Metroline Step: Data access and retrieval
status: RELEASED
This page is being moved. The latest version is available on our website and this Confluence page will no longer be updated.
‘Data locked away benefits no one, but when data is shared responsibly and carefully with bright minds everywhere, we get results that will give us all a healthier future.’ (UK Biobank is safely sharing health data to drive medical research)
Before data can be made FAIR, we first need to acquire it in the right way. That means finding trustworthy sources, making sure we are allowed to use the data, and keeping it safe and private. Done properly, this ensures that the data can be trusted, shared, and reused to support new research and innovation.
Short description
Data must be acquired responsibly and efficiently. This includes identifying how to access and retrieve the data and how to ensure the data meets legal, ethical, and technical requirements. This page outlines the steps and considerations involved in acquiring data.
This guidance is written from the perspective of a data user (data re-user), for example, a researcher aiming to find and obtain existing datasets for analysis or FAIRification. It complements the Metroline Step Define access conditions, which is written from the perspective of a data holder who determines how data can be shared.
Why is this step important
Correct and responsible data access and retrieval ensures you:
Comply with regulations. You confirm that the data can be used legally and ethically, meeting requirements such as GDPR.
Safeguard data integrity. Secure transfer methods and integrity checks (e.g. checksum) make sure the data are complete and unchanged.
Ensure controlled access. By following agreed procedures, you make it possible for authorised users to obtain the data under the correct conditions.
Enable reproducibility. Documenting where the data came from and how it was retrieved (e.g. the queries, APIs, permissions, or filtering steps used) allows others to repeat the retrieval process accurately if needed.
How to
Step 1 – Identify the data source
Locate a trusted source that holds the dataset of interest. This may include:
Certified or trusted data repositories (providing metadata and data) and data catalogues (providing metadata with references to data). These can be subject-specific (e.g. BBMRI-ERIC Sample and Data Portal), institutional (e.g. Radboud Data Repository), or national (e.g. National Health Data Portal)
Electronic health or patient record systems (e.g. HiX)
Biobanks or cohort studies with structured datasets (e.g. lifelines collection)
Institutional research data catalogues (e.g. UMCG research data catalogue)
When possible, choose sources that:
Provide persistent identifiers (PIDs) for the dataset
Supply rich metadata describing the dataset’s content, provenance and license
Are indexed in trusted registries or catalogues
Have a clear record of data quality (e.g. accuracy, completeness, consistency and timeliness, which indicate that the data are reliable for analysis). See e.g. Quality Control and Assessment – Multi-omics Toolbox (MOTBX)
Prefer sources where data and metadata meet verifiable FAIR characteristics, such as:
Use of open or widely adopted file formats (e.g. CSV, JSON, RDF, or domain standards)
Availability of machine-readable metadata
Use of controlled vocabularies or ontologies where applicable
Clear provenance and licensing information
Where possible, consult FAIR maturity indicators or assessment tools (see Pre-FAIR assessment) to evaluate how well a dataset aligns with FAIR principles.
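As a small illustration of the checks above, the sketch below inspects a machine-readable metadata record for FAIR-relevant fields (persistent identifier, license, provenance). The record and field names are invented for illustration; real repositories often expose comparable fields via DCAT or schema.org JSON-LD.

```python
# Minimal sketch: check a (hypothetical) metadata record for
# FAIR-relevant fields. The record below is invented for illustration.

record = {
    "identifier": "https://doi.org/10.1234/example-dataset",  # hypothetical PID
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": "Collected 2021-2023 by an example cohort study",
    "format": "text/csv",
}

REQUIRED_FIELDS = ("identifier", "license", "provenance")

def missing_fair_fields(metadata: dict) -> list:
    """Return the FAIR-relevant fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not metadata.get(f)]

print(missing_fair_fields(record))           # []
print(missing_fair_fields({"license": ""}))  # ['identifier', 'license', 'provenance']
```

A dataset that fails such a check is not necessarily unusable, but missing identifiers or licensing information should prompt you to contact the data holder before relying on it.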
Step 2 – Determine access requirements
Check if access is public, restricted, or non-public. The type of data often determines the access level: very sensitive or personally identifiable data are more likely to require restricted or non-public access.
Public. Data is openly available without restrictions, often under open license (e.g. CC-BY, CC0). Typically includes aggregated, anonymized, or non-sensitive datasets.
Restricted. Data access requires registration or agreement to conditions (e.g. academic/non-commercial use only).
Non-public. Data is not openly accessible due to privacy, ethical, legal, or commercial constraints. Access typically requires formal approval (e.g. via a data access committee).
Once you have found a dataset, check the license and usage rights to see whether you can actually reuse the data. For more information on data licenses, see here.
Depending on the access level and sensitivity, you may need:
Institutional approval
Ethics board or data access committee clearance
Legal agreements, such as data processing, transfer, usage, or sharing agreements. In addition, independent of licenses or agreements, data may be subject to applicable legislation (e.g. the GDPR for personal data), which must be assessed before access or transfer.
If you would like to know more about Access conditions and how they are defined, please refer to the Metroline Step: Define access conditions.
Step 3 – Choose a retrieval method
Select an appropriate method; the choice may depend on the dataset's size, sensitivity and intended use:
Application Programming Interfaces. Automated and scalable, ideal for frequent or programmatic retrieval. This can be REST, GraphQL, or other API types, often returning JSON, CSV, or RDF. APIs are ideal for automation and integrating retrieval into workflows.
Some data infrastructures support hybrid access, where metadata or filtered results are retrieved via an API, and the corresponding files or bulk data are downloaded separately (e.g. via HTTP(S), cloud storage, or (S)FTP). This approach is useful when querying large datasets to identify subsets before initiating bulk transfer, optimising both performance and bandwidth.
Query language access. Some repositories allow data to be retrieved by running queries directly against a database or knowledge graph (FAIR and Knowledge graphs: FCB070).
SQL - Used for structured data in relational databases.
SPARQL - Used for querying RDF/linked data endpoints, enabling retrieval of highly specific variables, entities, or relationships (Exploring data with SPARQL: FCB040).
Web interface. Manual downloads from portals, useful for small datasets or exploratory use. These direct downloads are typically suitable for small-to-medium datasets where scalability is not required. They offer a quick and intuitive way to inspect data structure and content before deciding whether a large-scale or automated retrieval is needed. Example portals which offer such an interface include:
GEO (Gene Expression Omnibus) for genomic data
BBMRI-ERIC Directory for biobank metadata
dbGaP for genomic + phenotype data
Zenodo for general research data
Secure file transfer. For large and sensitive datasets, use secure and authenticated transfer tools, e.g.:
SFTP (e.g. transferring data with SFTP: FCB014)
Aspera (Downloading data with Aspera: FCB015)
Secure processing environments (SPEs): In some cases, data cannot be downloaded due to sensitivity. Instead, access is provided within a secure remote environment where data can be analysed but not exported. This approach is common for highly sensitive data (e.g. health or population data) and represents a model where access is granted without full data transfer.
For datasets hosted on large-scale cloud platforms (e.g. Azure), data access may be provided through dedicated services like object storage (e.g. Azure Blob), cloud-hosted APIs, or cloud data warehouses. These platforms support scalable, high-performance data access and are often used for storing large genomics, imaging, or real-world data. Depending on the configuration, you may retrieve data using authenticated URLs, SDKs, or cloud-native tools (e.g. aws s3 cp, gsutil, or Azure CLI). Always ensure you understand the access permissions, egress costs, and security settings when retrieving data from cloud environments.
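Of the retrieval methods above, query language access is the easiest to sketch. The example below builds (but does not send) an HTTP GET request for a SPARQL endpoint; the endpoint URL is a placeholder, and in practice you would substitute the repository's documented endpoint and send the request with a library of your choice.

```python
from urllib.parse import urlencode

# Minimal sketch: construct an HTTP GET request for a SPARQL endpoint.
# The endpoint URL below is hypothetical; actually sending the request
# (e.g. with urllib.request.urlopen) is omitted here.

ENDPOINT = "https://example.org/sparql"  # placeholder endpoint

query = """
SELECT ?dataset ?title
WHERE {
  ?dataset a <http://www.w3.org/ns/dcat#Dataset> ;
           <http://purl.org/dc/terms/title> ?title .
}
LIMIT 10
"""

# SPARQL endpoints commonly accept the query as a URL parameter;
# the response format is requested alongside it.
params = urlencode({"query": query, "format": "application/sparql-results+json"})
request_url = f"{ENDPOINT}?{params}"
print(request_url[:60])
```

The same pattern applies to REST APIs more generally: parameters are encoded into the URL, and the response (often JSON) is parsed into your analysis workflow.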
Step 4 – Data retrieval and transfer considerations
Consider the following:
File format and structure. Determine whether files are in CSV, JSON, RDF, XML, HDF5, imaging formats or other, as this will affect processing. Packaging (e.g. many small files vs. one large archive, plain text vs. compressed formats such as .zip or .tar.gz) can affect download time, transfer reliability (see Step 5, Check integrity) and whether special tools are needed to retrieve or unpack the data.
Representation may also vary:
Flat tables (e.g. CSV, TSV) are commonly used for structured tabular data and are easy to process in spreadsheets or statistical tools.
Graph-based formats (e.g. RDF, JSON-LD) represent complex relationships between entities and are ideal for semantic data or knowledge graphs. The choice of representation will affect downstream integration and analysis.
Size and speed of transfer. Large files may require dedicated bandwidth or scheduled transfer.
Retry/resume capabilities. For multi-gigabyte datasets (e.g. imaging or omics data), use tools that can resume interrupted transfers instead of restarting (e.g. Aspera Connect, Globus, rsync, or wget/curl with resume flags). This saves time and reduces the risk of incomplete downloads.
Encryption in transit. Sensitive datasets should always be encrypted during transfer with access controls.
Remote access without transfer. In some cases, data cannot be moved due to size or sensitivity. Access may instead occur via secure remote environments (e.g. virtual machines, data enclaves, or cloud-based analysis platforms) where analysis is performed without downloading the data locally. Logging and documentation of the retrieval process still apply.
Keep logs of the retrieval process, including timestamps, tools used, and any transfer issues.
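A retrieval log can be as simple as an append-only file of structured entries. The sketch below records one entry per retrieval; the field names and file name are our own illustration, not a standard schema.

```python
import json
from datetime import datetime, timezone

# Minimal sketch: record each retrieval as a structured log entry.
# Field names and the log file name are illustrative, not a standard.

def log_retrieval(source: str, tool: str, issues: str = "none") -> dict:
    """Append a retrieval record (timestamp, source, tool, issues) to a log file."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "tool": tool,
        "issues": issues,
    }
    # One JSON object per line (JSON Lines) keeps the log easy to append and parse.
    with open("retrieval_log.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

entry = log_retrieval(
    source="https://example.org/dataset/123",  # hypothetical source URL
    tool="wget --continue",
)
print(entry["source"])
```

Keeping such a log alongside the data makes the retrieval reproducible and provides evidence of when and how the data were obtained.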
From a data user perspective, these technical characteristics should be evaluated in terms of whether the data is fit for your intended use. This includes whether:
The file formats can be opened and processed with available tools (including domain-specific or proprietary software if needed)
The structure and level of detail match your research question
Sufficient metadata and documentation are available to correctly interpret the data
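The retry/resume consideration mentioned above can be illustrated with a short sketch that derives the HTTP Range header a client would send to continue an interrupted download from the bytes already on disk. Tools such as wget -c and curl -C - do this automatically; the logic is shown here only to make the mechanism concrete.

```python
import os
import tempfile

# Minimal sketch: compute the HTTP Range header needed to resume an
# interrupted download, based on the size of the partial file on disk.

def resume_range_header(partial_path: str) -> dict:
    """Return the Range header requesting the remaining bytes, or {} if starting fresh."""
    offset = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={offset}-"} if offset else {}

# Simulate a partial download of 1024 bytes.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 1024)
    partial = f.name

print(resume_range_header(partial))  # {'Range': 'bytes=1024-'}
os.remove(partial)
```

The server must support range requests for this to work; otherwise the transfer restarts from the beginning, which is one reason dedicated transfer tools are preferred for very large datasets.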
Step 5 – Validate and store safely
Verify data integrity and store it in a secure, access-controlled environment with appropriate metadata.
After retrieval, you should:
✅ Verify integrity. Use checksums to confirm that the files match the originals (requires the original hash to be provided).
✅ Ensure privacy safeguards. When handling sensitive personal data, confirm that appropriate privacy-preserving techniques such as pseudonymisation (removal of direct identifiers) or anonymisation (irreversible data masking) have been applied in compliance with GDPR or local regulations. In case of unexpected/incidental findings (e.g. patient’s names), check policies for the appropriate response.
✅ Store securely. Place data in an access-controlled environment that complies with institutional and legal standards. Also check if you comply with data usage requirements and agreements if any were made. If using sensitive information, make sure you use proper encryption.
✅ Preserve metadata. Place dataset documentation, provenance records and retrieval notes alongside the data.
✅ Back up appropriately. Follow institutional or project-level backup policies.
✅ Register (meta)data in public registries. To support findability, consider registering the (meta)data in open platforms (e.g. GEO).
A checksum is a short digital code that works like a fingerprint for a file, letting you verify that the data hasn’t changed or been corrupted during transfer.
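For example, a SHA-256 checksum can be computed for the retrieved file and compared against the hash published by the data provider. In the sketch below the "downloaded" file and its expected hash are created locally for illustration; in practice the provider supplies the expected value.

```python
import hashlib

# Minimal sketch: verify a downloaded file against a published SHA-256
# checksum. File name and contents are illustrative.

def sha256sum(path: str) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Create a small file to stand in for a downloaded dataset.
with open("downloaded.csv", "wb") as f:
    f.write(b"id,value\n1,42\n")

# In practice, 'expected' comes from the data provider, not computed locally.
expected = hashlib.sha256(b"id,value\n1,42\n").hexdigest()

if sha256sum("downloaded.csv") == expected:
    print("integrity check passed")
else:
    print("checksum mismatch: re-download the file")
```

Reading in chunks keeps memory use constant, which matters when verifying multi-gigabyte files.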
Expertise requirements
You may need access to or support from:
Data Stewards. Ensure proper metadata and FAIR practices.
Legal and ethical advisors. Interpret the GDPR and other legal and ethical constraints.
IT professionals. Manage secure storage and encrypted transfers.
Domain experts. Assess data relevance and validity.
Refer to Metroline: Build the team for role descriptions and team structure advice.
Training
Training on data reuse and access is crucial to meet FAIR principles and legal requirements. To acquire and retrieve data responsibly, researchers need to understand both the technical processes (e.g. secure file transfer, APIs, data validation) and the legal/ethical frameworks (e.g. data licenses, consent, institutional approvals). Several training resources are available to build these competencies:
ELIXIR Luxembourg – Practicalities of Data Handling. A presentation covering key topics in secure data transfer, storage, encryption and checksums.
GOBLET – Bioinformatics Introductory Module. An introductory self-study course designed for life science students to learn about databases, sequence data, expression analysis, and basic bioinformatics tools.
FAIR Cookbook. Offers practical recipes on data access and secure transfer methods (e.g. FCB014: https://w3id.org/faircookbook/FCB014, FCB015: https://w3id.org/faircookbook/FCB015).
RDMkit – Data Transfer Guide. Covers the practical and legal aspects of transferring research data, with emphasis on GDPR compliance and technical integrity checks. See Your tasks: Data transfer.
Additionally, more specific training on regulations such as the GDPR in the European context is useful for gaining an overview of the legal requirements for data access and retrieval.
GDPR 4 Data Support - RDNL. The GDPR 4 Data Support (GDPR4DS) course is an introductory course designed for data supporters who want to understand more about the General Data Protection Regulation (GDPR) in the context of research and want to strengthen their role in protecting personal data.
Training should also cover:
Data quality assessment (e.g. completeness, consistency, bias)
Intellectual property and collaboration with private parties, including licensing and data ownership considerations
Emerging initiatives such as data quality labels (e.g. QUANTUM framework) may provide structured approaches to assessing and communicating dataset quality (note: still under development).
Suggestions
This page will be developed in the future. Learn more about the contributors here and explore the development process here. If you have any suggestions, visit our How to contribute page to get in touch.