Metroline Step: Query (use) over resources

STATUS: FUTURE WORK

Short description

Once machine-readable (meta)data is exposed, it can be used. Triple stores, such as GraphDB [GraphDB] and Blazegraph [Blazegraph], provide a SPARQL endpoint. This endpoint allows the (meta)data stored in the triple store to be queried using the SPARQL query language, provided the user has been granted access [De Novo]. Queries can be performed in a variety of ways, for example via a form provided by the endpoint or programmatically using, e.g., JavaScript, C#, Java, or Python. Query results can be returned in a variety of formats, such as HTML, JSON, XML and CSV. Furthermore, since the introduction of SPARQL 1.1, federated querying is supported [w3_sparql]. With federated querying, a user can direct a portion of a query to a particular SPARQL endpoint. The results are returned to the federated query processor, which combines them with the results from the rest of the query.
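As a minimal sketch of the programmatic route described above, the Python snippet below prepares a SPARQL 1.1 Protocol request using only the standard library and selects the result format through the Accept header. The endpoint URL, the query and the helper name build_sparql_request are assumptions made for illustration; a real triple store may require authentication or offer a different set of formats.

```python
import urllib.parse
import urllib.request

# Media types commonly used for SPARQL SELECT results.
RESULT_FORMATS = {
    "json": "application/sparql-results+json",
    "xml": "application/sparql-results+xml",
    "csv": "text/csv",
}


def build_sparql_request(endpoint_url, query, result_format="json"):
    """Prepare (but do not send) a SPARQL query as an HTTP POST request."""
    # The query is sent as a URL-encoded form parameter named "query".
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Accept": RESULT_FORMATS[result_format],
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )


# Hypothetical endpoint URL and query, for illustration only.
ENDPOINT = "https://example.org/repositories/my-registry"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"

request = build_sparql_request(ENDPOINT, QUERY, result_format="csv")

# To run the query against a real endpoint you have access to:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

The request is only built here, not sent, so the snippet can be tried without network access. Dedicated SPARQL client libraries offer the same functionality with more convenience.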

ToDo: I think this needs something about the FDP and triple store

--

FAIRopoly

The Virtual Platform (VP) aims to bring together data from Rare Diseases resources, which is by nature scattered, scarce, and usually isn’t connected to other data sources. The VP is a federated ecosystem, meaning that resources are made available through multiple query points that allow for the answering of questions that require information from various data sources. This way, it makes the data more FAIR while respecting privacy, consent, and access conditions, since the data stays at source level, but can be queried through the VP.

It is envisioned that each registry will constitute a node in the EJP RD VP network. It will be possible to send queries to all the nodes in the network via the VP. Each node stays in control of who can access the data and in which format (yes/no answers, aggregated, anonymised, pseudonymised).

De novo

Step 15 - Query over FAIR data point(s)

The machine-readable data is stored in a triple store and can, therefore, be queried using the query language SPARQL by users with access to the data (described in step 14). Query results can be displayed in multiple formats (e.g. JSON, XML, CSV or TSV). The SPARQL endpoint of the EDC system can be queried by using external SPARQL clients or by using a web-based version that is available in Castor EDC’s FAIR Data Point. Currently, the web-based version can only query within a single database. Federated queries, therefore, need to be performed with external clients. These (federated) queries allow researchers to ask questions to the FAIR VASCA registry as well as other FAIR RD registries and data resources (multi-source analysis of FAIR data).
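To illustrate what a federated query sent from an external client could look like, the sketch below builds a SPARQL 1.1 query string that combines datasets described in the local store with datasets described at a second endpoint, using the SERVICE keyword. The remote endpoint URL and the helper name build_federated_query are hypothetical, and the DCAT and Dublin Core terms are only an assumed modelling choice; the actual registries may use different vocabularies.

```python
def build_federated_query(remote_endpoint_url, limit=10):
    """Return a SPARQL 1.1 query that also consults a remote endpoint."""
    return f"""
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?dataset ?title WHERE {{
  {{
    # Evaluated by the endpoint that receives the query
    ?dataset a dcat:Dataset ;
             dct:title ?title .
  }}
  UNION
  {{
    # Forwarded to the remote endpoint by the query processor
    SERVICE <{remote_endpoint_url}> {{
      ?dataset a dcat:Dataset ;
               dct:title ?title .
    }}
  }}
}}
LIMIT {limit}
""".strip()


# Hypothetical remote endpoint, for illustration only.
REMOTE_ENDPOINT = "https://example.org/other-registry/sparql"
query = build_federated_query(REMOTE_ENDPOINT)
print(query)
```

The resulting string can be submitted to any endpoint whose query processor supports SPARQL 1.1 federated queries. Whether the remote part succeeds depends on the access conditions of the remote endpoint.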

Generic

If driving user question(s) were defined in Step 1, they should be “answered” in this step. The answers to these question(s) are gathered by processing the FAIR machine-readable data. If RDF is the machine-readable format used, then RDF data stores (triple stores) are used to store the machine-readable data, and SPARQL queries are used to retrieve the data required to answer the driving user question(s).
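As a rough sketch of how retrieved data can be processed into an answer, the Python snippet below reads a response in the SPARQL 1.1 Query Results JSON format and counts patients per diagnosis. The driving question, the variable names and the sample values are all invented for illustration and do not come from any real registry.

```python
import json
from collections import Counter

# Sample response in the SPARQL 1.1 Query Results JSON format.
# All identifiers and values below are made up for this example.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["patient", "diagnosis"]},
  "results": {"bindings": [
    {"patient": {"type": "uri", "value": "https://example.org/patient/1"},
     "diagnosis": {"type": "literal", "value": "Diagnosis A"}},
    {"patient": {"type": "uri", "value": "https://example.org/patient/2"},
     "diagnosis": {"type": "literal", "value": "Diagnosis B"}},
    {"patient": {"type": "uri", "value": "https://example.org/patient/3"},
     "diagnosis": {"type": "literal", "value": "Diagnosis A"}}
  ]}
}
"""


def bindings_to_rows(response_text):
    """Flatten SPARQL JSON results into a list of plain dictionaries."""
    document = json.loads(response_text)
    variables = document["head"]["vars"]
    rows = []
    for binding in document["results"]["bindings"]:
        # A variable can be unbound in a solution (e.g. with OPTIONAL).
        rows.append(
            {var: binding[var]["value"] if var in binding else None
             for var in variables}
        )
    return rows


rows = bindings_to_rows(SAMPLE_RESPONSE)

# Hypothetical driving question: how many patients per diagnosis?
patients_per_diagnosis = Counter(row["diagnosis"] for row in rows)
print(patients_per_diagnosis)
```

In practice the aggregation could also be done inside the SPARQL query itself with COUNT and GROUP BY. Post-processing in a script is shown here only because it makes the structure of the result format visible.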

Nice read: https://data.persee.fr/understanding/what-is-a-triplestore/?lang=en

[Blazegraph] https://blazegraph.com/

[eu] https://data.europa.eu/data/datasets/eu-open-data-portal-sparql-endpoint

[De Novo] https://ojrd.biomedcentral.com/articles/10.1186/s13023-021-02004-y

[FAIRopoly] https://www.ejprarediseases.org/fairopoly/

[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification

[GraphDB] http://graphdb.ontotext.com/

[w3_sparql] https://www.w3.org/TR/sparql11-federated-query/

Why is this step important

By completing this step you will know how your FAIR data can be used in practice.

How to

Some libraries / clients for programming languages

https://help.poolparty.biz/en/developer-guide/basic---advanced-server-apis/poolparty-s-sparql-endpoint/available-sparql-clients.html

Making an omics data matrix FAIR → FAIRifying Data Matrices - Step 3 - Exploring data with SPARQL

(Python)

https://faircookbook.elixir-europe.org/content/recipes/applied-examples/fair-data-matrix/2-rose-metabolites-Python-RDF-querying-analysis.html

The How to section should:

be split into easy to follow steps;
- Step 1
- Step 2
- etc.
help the reader to complete the step;
aspire to be readable for everyone, but, depending on the topic, may require specialised knowledge;
be a general, widely applicable approach;
if possible / applicable, add (links to) the solution necessary for onboarding in the Health-RI National Catalogue;
aim to be practical and simple, while keeping in mind: if I would come to this page looking for a solution to this problem, would this How-to actually help me solve this problem;
contain references to solutions such as those provided by FAIR Cookbook, RDMkit, The Turing Way and FAIRsharing;
contain custom recipes/best-practices written by/together with experts from the field if necessary.

Expertise requirements for this step

Describes the expertise that may be necessary for this step. Should be based on the expertise described in the Metroline: Build the team step.

Practical examples from the community

Examples of how this step is applied in a project (link to demonstrator projects).

Training

Add links to training resources relevant for this step. Since the training aspect is still under development, currently many steps have “Relevant training will be added in the future if available.”

Suggestions

Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.