Short description
"Metadata is the descriptor, and data is the thing being described" (https://doi.org/10.1162/dint_r_00024)
Metadata refers to the contextual information about a resource (e.g. a dataset), often described as “data about data”. Metadata can come in many different types and forms. The type of metadata you might be most familiar with is the generic descriptive metadata often collected in repositories such as Zenodo (see the example of how Zenodo describes the resources in its repository). This generic metadata includes details on what the resource is about (e.g. data from patient health records), who created it (e.g. a research team at Radboudumc) and when it was collected (e.g. 2023). Typically, it also discloses information about the possible uses of the resource (e.g. applicable licensing) and access restrictions (e.g. available for public use or restricted access). Other types of metadata commonly used are:
Provenance metadata: This refers to how the resource came to be, what protocols were followed, and what tools were used. The purpose of this metadata is to ensure that you, your colleagues or others can reproduce the initial research.
Structural metadata: Depending on the type of resource, this refers to a detailed description of your resource that goes beyond the generic information explained above. For instance, for a dataset containing data collected from a questionnaire, structural metadata could include the questions asked and the allowed range of values.
Codebooks: A detailed document that provides information about the structure, content and organization of a dataset. A codebook usually describes information such as variable names, measurement methods and units.
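To make the generic descriptive metadata above concrete, below is a minimal sketch (in Python, purely for illustration) of what such a record might look like. The field names and values are illustrative, not a prescribed schema; repositories such as Zenodo use their own forms and vocabularies.

```python
# Illustrative example only: a generic descriptive metadata record for a dataset.
# The field names are not a prescribed schema.
descriptive_metadata = {
    "title": "Patient health records study, Radboudumc",
    "description": "Data collected from patient health records.",
    "creator": "Research team, Radboudumc",
    "date_collected": "2023",
    "license": "CC-BY-4.0",
    "access": "restricted",  # e.g. public use vs. restricted access
    "keywords": ["health records", "patient data"],
}
```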
In this step, the focus will be on assessing the availability of your metadata. This step is a good starting point and a common first step for multiple objectives (see also the Metroline Step: Define FAIRification objectives), whether you aim to:
gain a clear view of what metadata currently describes your resource
expand your current metadata
ensure compliance with requirements to publish it in a metadata catalogue (see also the Metroline Step: Register resource level metadata)
follow a semantic model to describe your metadata
This step involves identifying and collecting all types of metadata gathered for your resource, checking their quality and ensuring they are as accurate and complete as possible.
Metadata of all these types [cb_metadata] helps people locate the data and allows it to be reused and cited [GoFAIR]. Furthermore, metadata can be machine-actionable, allowing for automation of data handling and validation [RDMKit_MachineActionable]. Findability, accessibility and reusability of data can be further improved by providing metadata with details about licensing and copyright, as well as a description of the conditions for use of and access to the data [Generic].
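As an illustration of machine-actionability, the sketch below (Python, illustrative only) shows how software could act on a metadata record without a human interpreting free text. The record structure, field names and licence list are assumptions for the example, not a standard.

```python
# Sketch of what "machine-actionable" means in practice: software can read the
# metadata and act on it without a human interpreting free text.
# The record structure and field names are illustrative, not a standard.

record = {
    "title": "Patient health records study, Radboudumc",
    "license": "CC-BY-4.0",
    "access": "restricted",
}

def may_reuse_automatically(metadata: dict) -> bool:
    """Decide, from the metadata alone, whether a pipeline may fetch and reuse the data."""
    open_licenses = {"CC0-1.0", "CC-BY-4.0"}  # placeholder list of licences considered open
    return metadata.get("access") == "public" and metadata.get("license") in open_licenses

print(may_reuse_automatically(record))  # False: access is restricted
```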
Check whether metadata regarding findability, accessibility and reusability is already available, and whether this metadata is already being collected using standardized vocabularies [Generic, FAIRopoly]. What metadata should be gathered may depend on the stakeholder community [Generic].
Why is this step important
With respect to Health-RI: Health-RI is in the process of defining a metadata scheme for onboarding in the Health-RI metadata portal. To allow for onboarding of a dataset, the minimal metadata set must be provided. It is therefore essential that you assess whether this minimal set is collected and available, or whether additional metadata needs to be collected.
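As a rough illustration of such an assessment, the sketch below (Python, illustrative only) checks a metadata record against a placeholder list of required fields. The actual Health-RI minimal metadata set is defined by Health-RI and is not reproduced here; the field names are assumptions.

```python
# Sketch of assessing whether a (hypothetical) minimal metadata set is available.
# REQUIRED_FIELDS is a placeholder; consult the Health-RI documentation for the
# actual minimal metadata set.

REQUIRED_FIELDS = ["title", "description", "creator", "license", "access"]

def missing_fields(metadata: dict) -> list[str]:
    """Return required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]

my_metadata = {"title": "Patient health records study", "creator": "Radboudumc research team"}
print(missing_fields(my_metadata))  # ['description', 'license', 'access']
```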
Furthermore, it is:
Beneficial for you and your team: Having comprehensive and detailed metadata ensures that anyone, including yourself, can understand and work with the data effectively, even when some time has passed since collection. This is good data management practice: it keeps data usable and meaningful over time and saves time when setting up new projects.
Beneficial for the organisation: Well-curated metadata increases the reuse of datasets and the interoperability between systems. Complete and error-free metadata also makes it easier for organisations to migrate information about their projects between systems, especially when newer software versions become available.
Promotes higher research impact: Good metadata records reflect well on the researcher’s outputs. Potential data reusers might be put off by documentation issues and may not be inclined to use the data.
Improves the quality of your data: Good metadata should describe the data accurately and unambiguously, which in turn improves the overall quality of the data and enhances transparency and reproducibility. This enables others to verify results and build upon them.
Helps with data discovery: Complete metadata improves the ability for you and your team to locate and retrieve data quickly. Additionally, if this metadata is published, it can boost reuse of the data and lead to new collaborations with others.
Supports semantic description of your resource: Terms from upper ontologies can be used to describe metadata. For example, dcat:Dataset from the Data Catalog Vocabulary (DCAT) [DCAT] can describe the type of a rare disease dataset, and dct:creator from DCMI Metadata Terms (DCT) can indicate the relationship between a dataset and its creator (see the sketch after this list).
Complies with funders’ and journals’ requirements: Many funding agencies and publishers now require metadata to be published to increase the efficiency and visibility of the research they support and to enhance recognition of existing work.
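As an illustration of the ontology terms mentioned in the list above, the sketch below builds such a description with the Python rdflib library. The dataset identifier and literal values are placeholders, and rdflib is only one of several tools you could use for this.

```python
# Sketch of describing a dataset with DCAT and DCMI Metadata Terms using rdflib.
# The IRI and literals are placeholders; rdflib is an arbitrary choice of tool.
from rdflib import Graph, URIRef, Literal
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
dataset = URIRef("https://example.org/dataset/rare-disease-registry")  # placeholder IRI

g.add((dataset, RDF.type, DCAT.Dataset))                       # dcat:Dataset
g.add((dataset, DCTERMS.creator, Literal("Eva, Radboudumc")))  # dct:creator
g.add((dataset, DCTERMS.title, Literal("Rare disease registry dataset")))

print(g.serialize(format="turtle"))
```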
Info: Regarding the National Health Data catalogue, Health-RI is in the process of defining a metadata scheme for adding metadata.
How to
Step 1: Identify where information about your resource is stored
Start by considering where information about your resource is already contained. Typically, institutions have systems that require a certain level of documentation. Investigate these systems.
Example: Eva, a researcher at Radboudumc, wants to assess what metadata is available about her project. She starts by consulting her Data Management Plan (DMP). She then remembers that she added metadata about her project to the PaNaMa registry and the Radboud Data Repository.
Step output: The systems and documents where metadata are stored have been identified (for instance the DMP, a research management system such as PaNaMa, and (local) data repositories).
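One optional way of recording this step output is a simple inventory of metadata sources, sketched below in Python for illustration; the systems listed are taken from the example above and the structure is an assumption, not a required format.

```python
# Sketch of a simple inventory of where metadata about a resource lives.
# Systems and fields are illustrative, based on the example above.
metadata_sources = [
    {"system": "Data Management Plan",     "location": "DMP tool",                  "last_checked": None},
    {"system": "PaNaMa registry",          "location": "research management system", "last_checked": None},
    {"system": "Radboud Data Repository",  "location": "institutional repository",   "last_checked": None},
]

for source in metadata_sources:
    print(source["system"], "->", source["location"])
```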
Step 2: Extract and evaluate your metadata
Once you've identified where your metadata might reside, it's time to extract and evaluate it. Errors and inconsistencies can naturally creep into your records over time, especially when many people are involved. Guidelines and project contexts can also change. This step helps ensure that the metadata is still understandable and accurate. Use these questions to guide you:
Are there typos in the metadata?
Is there missing information due to accidents or omissions?
Are vocabularies used properly? Is the language outdated or no longer accurate?
Are metadata terms used consistently? (e.g., Radboudumc vs rumc)
Example: After reviewing her metadata across various platforms, Eva realizes some information is outdated. The abstract of her Data Management Plan no longer aligns with her adjusted research question. Her data collection protocol has changed due to a new data collection system recently implemented by Radboudumc. She also notices that the PaNaMa entry has many blank recommended fields, and that the Radboud Data Repository keywords include terminology that might not facilitate discoverability of her resource (e.g., using the term "neoplasm" instead of the more widely searched keywords "cancer" or "tumor"). Additionally, terms like "gender" and "sex" are used interchangeably across the descriptions in all those systems.
Step output: A list of identified issues in the metadata to be resolved/updated.
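The sketch below (Python, illustrative only) shows how a few of the Step 2 checks, such as empty fields and inconsistent spelling of terms, could be automated; the record, field names and preferred terms are assumptions made for the example.

```python
# Sketch of automating a few of the Step 2 checks on a metadata record.
# The record, field names and preferred terms are illustrative.
record = {
    "publisher": "rumc",       # inconsistent spelling of Radboudumc
    "keywords": ["neoplasm"],  # may be less discoverable than "cancer"/"tumor"
    "abstract": "",            # missing information
}

PREFERRED_TERMS = {"rumc": "Radboudumc", "radboud umc": "Radboudumc"}

issues = []
for field, value in record.items():
    if value in ("", None, []):
        issues.append(f"{field}: missing value")
    if isinstance(value, str) and value.lower() in PREFERRED_TERMS:
        issues.append(f"{field}: use '{PREFERRED_TERMS[value.lower()]}' instead of '{value}'")

print(issues)
```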
Step 3: Make the necessary corrections
Tip: Prioritize the systems with the highest impact. While assessing metadata is beneficial, it might require organizational support and can be labor-intensive, especially if you're involved in multiple complex projects.
Example: Eva decides to update her Data Management Plan because it's crucial for her PhD thesis. She also updates and fills out missing fields in the Radboud Data Repository to make her dataset available for reuse by others.
Step output: Metadata is updated, based on step 2 output.
You are now ready to take the next step with your metadata:
Share or publish your metadata: Metroline Step: Register resource level metadata
Expand your metadata to include domain specific metadata: Domain-specific metadata schema development
Step 4 (Bonus Step!): Enhance Your Metadata
Consider what else might be missing from your metadata. Is it sufficient for others to understand the context of your resource and how to use it? The FAIR data principles suggest describing your resource with various attributes to help others find potential uses that you might not be aware of. Think about the questions your current metadata can't answer and consult your data steward for solutions, if needed.
Example: Eva collects a lot of data from questionnaires but doesn't know how to include them in the metadata. This information could help others discover her dataset based on specific questions (e.g., whether participants smoke) and understand the possible values and the presence or absence of missing data (e.g., incomplete diagnosis dates).
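As a sketch of what such enhanced, variable-level metadata could look like, the example below describes two questionnaire variables with their allowed values and missing-data handling. The structure and variable names are illustrative assumptions; your data steward may recommend an established codebook format instead.

```python
# Sketch of variable-level metadata for questionnaire data (Step 4).
# The structure and variable names are illustrative, not a standard.
variable_metadata = {
    "smoking_status": {
        "question": "Do you currently smoke?",
        "allowed_values": {1: "yes", 2: "no", 9: "prefer not to say"},
        "missing_code": 9,
    },
    "diagnosis_date": {
        "question": "Date of diagnosis",
        "format": "YYYY-MM-DD",
        "missing_allowed": True,  # e.g. incomplete diagnosis dates
    },
}
```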
Expertise requirements for this step
Experts that may need to be involved, as described in the Metroline Step: Build the Team (https://health-ri.atlassian.net/wiki/spaces/FSD/pages/273350662/Metroline+Step+Build+the+Team), are listed below.
Data manager, data steward or data librarian; researcher (scientist) or someone else who knows the context and content of the project.
Practical examples from the community
In the Radboudumc, researchers can put documentation about their project in the Radboud Data Repository (RDR) under a Research documentation collection; note that this collection is not meant to be shared with the public. Further examples of this step applied in real projects, including links to demonstrator projects, will be added in the future.
Training
Relevant training resources include:
https://carpentries-incubator.github.io/scientific-metadata/instructor/data-metadata.html#metadata
https://howtofair.dk/how-to-fair/metadata/#what-are-metadata
Suggestions
Visit our How to contribute page for information on how to get in touch if you have any suggestions about this page.