Step 3. Domain analysis
status: in development
Short description
Before you start modeling, check which schemas, models, vocabularies and standards are already used within your domain and could be reused or built upon. Also try to identify whether similar initiatives are being carried out internationally. Combine your findings with the user requirements in a metadata inventory: a spreadsheet with all terms, definitions, mapped concepts, value ranges and controlled vocabularies. Compare this inventory with the Health-RI core and generic health metadata schema to see what is already covered there. Doing this before you model prevents reinventing the wheel and helps ensure that your metadata schema is efficient, follows established best practices and aligns with international standards. It also reduces complexity and improves interoperability across datasets and systems in your domain.
Deliverables
Deliverable | Description |
---|---|
Review report | A report summarizing existing standards and schemas reviewed. |
Inventory | Spreadsheet with all terms, definitions, mapped concepts, value ranges and controlled vocabularies. |
How
1. Searches
Perform a literature search
Before developing a new metadata schema for your domain, start by performing a literature search to see if there are any existing solutions or guidelines in your domain that you can incorporate and reuse. This includes looking for research papers, reviews, or technical reports that discuss metadata standards or best practices for data management in your specific field or domain. In domains like omics, a lot of effort has already gone into standardizing metadata, so it's likely that some relevant frameworks exist. Some of the most well-known platforms to search are Google Scholar, PubMed and arXiv. Keywords to use in your search might include "metadata schema," "ontology," "vocabulary," "metadata standards," along with your specific domain, such as "genomics," "imaging," "oncology," etc.
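The keyword combinations above can be turned into a boolean search string. The snippet below is a minimal sketch: the helper name `build_query` is hypothetical, and it assumes the search platform accepts quoted phrases combined with `AND`/`OR`. Exact syntax differs per platform, so check the search help of the one you use.

```python
def build_query(metadata_keywords, domain_terms):
    """Combine metadata keywords and domain terms into one boolean query.

    Hypothetical helper: terms within a group are joined with OR,
    and the two groups are joined with AND.
    """
    def group(terms):
        return "(" + " OR ".join(f'"{term}"' for term in terms) + ")"

    return group(metadata_keywords) + " AND " + group(domain_terms)


# Keywords taken from the text above; swap in your own domain terms.
query = build_query(
    ["metadata schema", "ontology", "vocabulary", "metadata standards"],
    ["genomics"],
)
print(query)
```

Paste the resulting string into the search box of the platform, and narrow or widen the domain terms depending on how many results you get.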
Involve experts from other domain projects, who may already have metadata schemas for your domain
Reaching out to domain experts and researchers who have worked on similar projects can be very beneficial. They may have insights into existing metadata schemas, vocabularies, or ontologies that aren't widely published but are used within specific communities. Collaboration helps you keep your schema compatible with existing standards and practices, and avoids reinventing the wheel when someone has already worked on a similar problem.
You could involve experts in different ways, for example:
Reach out to communities or working groups: Join communities such as the Research Data Alliance (RDA), where working groups focus on creating, maintaining, and updating domain-specific standards.
Consult with data stewards: Data stewards within your domain or organization often have experience implementing existing metadata schemas and can provide valuable advice. In the Netherlands, there is an active Data Stewards Interest Group (DSIG) that you can reach out to.
Engage with domain-specific initiatives: Projects like ELIXIR (a European intergovernmental organization for life sciences data) often collaborate on metadata standards across multiple (life science) disciplines. They offer access to domain experts and resources you may be able to leverage.
Examples of expert groups:
The Global Alliance for Genomics and Health (GA4GH): Works on developing standards and frameworks for sharing genomic and clinical data. They provide key guidelines like Phenopackets, a standard for sharing phenotypic data.
NIH Common Data Elements (CDE): Experts involved with NIH CDE projects focus on creating consistent data elements for clinical and translational research, which could overlap with your work.
You can find experts through:
Professional networks like LinkedIn, where you can search for professionals who work on metadata in your domain.
Conferences and workshops where metadata and data standardization are frequently discussed. For example, nationally these topics are often covered by events organized by Open Science NL or the Thematic Digital Competence Centers. Internationally you could think of the SWAT4HCLS Conference or ELIXIR’s BioHackathon Europe.
Search ontology lookup services for existing ontologies or standards
To find out whether existing ontologies or standards already describe your domain, you can browse different ontology lookup services, such as BioPortal, Ontobee, the Ontology Lookup Service (OLS) and the Open Biological and Biomedical Ontology (OBO) Foundry.
For more about choosing a lookup service, see: Selecting an ontology lookup service. For more about choosing ontologies and terminologies, see: Selecting terminologies and ontologies
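Lookup services can also be queried programmatically, which helps when you have many candidate terms to check. The snippet below is a minimal sketch for the Ontology Lookup Service (OLS). It assumes the public search endpoint at `https://www.ebi.ac.uk/ols4/api/search` and a response of the form `response.docs[]` with `label`, `iri` and `ontology_name` fields; verify both against the current OLS API documentation. The function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: public OLS search endpoint; check the OLS documentation before relying on it.
OLS_SEARCH_URL = "https://www.ebi.ac.uk/ols4/api/search"


def build_search_url(term, ontology=None, rows=5):
    """Build a search URL for a free-text term, optionally limited to one ontology."""
    params = {"q": term, "rows": rows}
    if ontology:
        params["ontology"] = ontology
    return OLS_SEARCH_URL + "?" + urllib.parse.urlencode(params)


def extract_candidates(payload):
    """Reduce a search response to label, IRI and ontology per hit.

    Assumes the response layout described in the lead-in; missing fields become None.
    """
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "label": doc.get("label"),
            "iri": doc.get("iri"),
            "ontology": doc.get("ontology_name"),
        }
        for doc in docs
    ]


def search_term(term, ontology=None):
    """Query the service over the network and return candidate concepts."""
    url = build_search_url(term, ontology)
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_candidates(json.load(response))
```

Calling `search_term("tumor grade")` requires network access and returns candidate concepts that you can review by hand before adding them to your inventory as mapped concepts.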
2. Review report and inventory
The review report and inventory go hand in hand. The report summarizes the results of the searches. The inventory is a spreadsheet with all terms, definitions, mapped concepts, value ranges and controlled vocabularies identified. It can also include a column that marks whether a specific term is covered by the generic Health-RI metadata schema. Similar terms can already be grouped together and prioritized in this spreadsheet, as it forms the main basis for the modeling part in the next step (for an example see here).
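The inventory layout described above can be sketched as a small script that writes a CSV file you can open as a spreadsheet. This is an illustration under assumptions, not a prescribed template: the column names, the helper functions and the example rows are hypothetical, and `covered_terms` is a placeholder that you would fill with the terms of the Health-RI schema you compare against.

```python
import csv
import io

# Hypothetical column layout based on the inventory description above.
COLUMNS = [
    "term",
    "definition",
    "mapped_concept",
    "value_range",
    "controlled_vocabulary",
    "in_health_ri_schema",
    "priority",
]


def build_inventory(rows, covered_terms):
    """Fill missing columns and mark which terms the reference schema covers."""
    covered = {term.strip().lower() for term in covered_terms}
    inventory = []
    for row in rows:
        entry = {column: row.get(column, "") for column in COLUMNS}
        is_covered = entry["term"].strip().lower() in covered
        entry["in_health_ri_schema"] = "yes" if is_covered else "no"
        inventory.append(entry)
    return inventory


def to_csv(inventory):
    """Serialize the inventory to CSV text that opens in a spreadsheet program."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(inventory)
    return buffer.getvalue()


# Illustrative rows and covered terms only; they are not taken from a real schema.
example_rows = [
    {"term": "title", "definition": "Name of the dataset", "priority": "high"},
    {"term": "sequencing platform", "definition": "Instrument used for sequencing"},
]
inventory = build_inventory(example_rows, covered_terms=["title"])
print(to_csv(inventory))
```

Matching on the exact term name is the simplest possible comparison. In practice, terms with different names may describe the same concept, so review the "no" rows by hand and compare on mapped concepts where you have them.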
HRI hub involvement in this step
Health-RI can help you connect with previous projects on similar themes.