STATUS: IN DEVELOPMENT
Short Description
In this pre-FAIRification phase you assess whether your (meta)data already contains FAIR features, such as persistent unique identifiers for data elements and rich metadata, by using FAIRness assessment tooling [Generic].
By quantifying the level of FAIRness of the data based on its current characteristics and environment, the assessment outcomes can help shape the necessary steps and requirements needed to achieve the desired FAIRification objectives [FAIRInAction].
The how-to section describes a variety of assessment tools based on the FAIR principles.
[Mijke: RDMkit has a page on this → https://rdmkit.elixir-europe.org/compliance_monitoring#how-can-you-measure-and-document-data-management-capabilities ]
Why is this step important
This step will help you assess the current FAIRness level of your data. Comparing the current FAIRness level to the previously defined FAIRification objectives will help you shape the necessary steps and requirements needed to achieve your FAIRification goals [FAIRInAction].
Furthermore, the outcomes of this assessment can serve as a baseline for comparison in the Assess FAIRness step, to track the progress of your data towards FAIRness. [Hannah: copied from above]
Expertise requirements for this step
This section could describe the expertise required. Perhaps the Build Your Team step could then aggregate all the "Expertise requirements for this step" sections that someone needs in order to fulfil their FAIRification goals.
[Hannah: I would say expertise depends a bit on which tool you use; most checklists and questionnaires are pretty low effort and self-explanatory. But some of the more automated tools require some (programming) skills]
How to
There are many assessment tools for doing a pre-FAIR assessment of your (meta)data. FAIRassist holds a manually curated collection of tools, including manual questionnaires and checklists as well as automated tests, that help users understand how to achieve a state of "FAIRness" and how it can be measured and improved. Furthermore, a 2022 publication (FAIR assessment tools: evaluating use and performance) compared a number of these tools. Of these, and the tools listed on FAIRassist, we suggest that the following can be considered for your pre-FAIR assessment:
[Hannah: Fieke pointed out something important; there are basically two kinds of tools for FAIR assessment. One group assesses (often in a semi-automated way) the FAIRness of (meta)data that already has a persistent identifier (such as a DOI). The other group assesses the FAIRness (often via a survey, questionnaire or checklist) of (meta)data without a persistent identifier.]
Online self-assessment surveys
These tools allow you to fill in an online form. The result of the survey can be, for example, a score indicating the FAIRness of your (meta)data. Some tools additionally provide advice on how to improve FAIRness at different levels.
Tool | Description [check whether the paper has anything useful to add here]
---|---
FAIR Self-Assessment Tool (ARDC) | Provided by the Australian Research Data Commons, this 12-question online survey gives a visual indication of the FAIRness level of your (meta)data and provides resources on how to improve it.
FAIR-Aware (DANS) | Provided by DANS, this online survey gives a FAIRness score and provides advice on how to improve the FAIRness of your (meta)data. [Hannah: according to the review paper, this tool 'assesses the user's understanding of the FAIR principles rather than the FAIRness of his/her dataset. FAIR-Aware is not further considered in this paper'. Maybe throw it out as well?]
SATIFYD (DANS) | Provided by DANS, this online survey gives a FAIRness score and provides advice on how to improve the FAIRness of your (meta)data.
FAIRshake | Allows you to automatically assess digital objects as well as add a new project to their repository (??). [Hannah: I don't know how useful this is in the context of our metroline; the paper also describes it as quite a time investment.]
Online (semi-)automated tools
These tools do an automatic assessment by reading the metadata available at a certain URI (a minimal sketch of this kind of check is given after the list below).
Ammar, A. et al. [Hannah: this links to another page on the Confluence] and this one [Hannah: these are Jupyter notebooks for data from specific databases; they can be extended/adjusted with your own dataset, but it seems a rather large effort for a 'quick' FAIR assessment of your (meta)data]
FAIRchecker: this tool automatically provides a score for all aspects of FAIR, based on a URI.
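To illustrate the kind of check these tools automate, the Python sketch below tries to retrieve machine-readable (RDF) metadata for a given URI via HTTP content negotiation. This is an illustrative sketch only: it does not reproduce the internal logic of FAIRchecker or any other specific tool, and the URI in the usage comment is a placeholder.

```python
import requests

def fetch_machine_readable_metadata(uri: str):
    """Try to retrieve machine-readable (RDF) metadata for a resource URI
    via HTTP content negotiation, similar in spirit to the first step of
    many semi-automated FAIR assessment tools (illustrative only)."""
    accept = "application/ld+json, text/turtle, application/rdf+xml;q=0.9"
    response = requests.get(uri, headers={"Accept": accept}, timeout=30)
    response.raise_for_status()
    content_type = response.headers.get("Content-Type", "")
    is_rdf = any(fmt in content_type
                 for fmt in ("ld+json", "turtle", "rdf+xml"))
    return is_rdf, content_type, response.text

# Example usage with a hypothetical (placeholder) identifier:
# found, ctype, body = fetch_machine_readable_metadata("https://doi.org/10.1234/example")
# print("Machine-readable metadata found:", found, "| Content-Type:", ctype)
```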
Offline self-assessment
GARDIAN (link from paper is dead, could be somewhere around here, can’t find it though)
[Hannah: the 2022 paper does not recommend using offline tools, and I kind of agree. So maybe we don’t include this category at all? Especially because one of the links is dead anyway..]
FAIR assessment tools vary greatly in their outcomes. The FAIR Data Maturity Model (created by the Research Data Alliance, RDA) aims to harmonise the outcomes of FAIR assessment tools to make them comparable. Based on the FAIR principles and sub-principles, the RDA has created a list of universal 'maturity indicators'. Their work resulted in a checklist (with an extensive description of all maturity indicators), which can be used to assess the FAIRness of your (meta)data.
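As a minimal illustration of how such a checklist can be turned into comparable numbers, the Python sketch below aggregates maturity-indicator scores into an average level per FAIR principle. The indicator labels and scoring scale are placeholders, not the official RDA indicators or their prioritisation.

```python
# Illustrative sketch of aggregating FAIR Data Maturity Model checklist
# results into per-principle scores. Indicator labels and levels below are
# placeholders; use the official RDA checklist for the real indicators.
from collections import defaultdict

# 0 = not applicable, 1 = not yet, 2 = in progress, 3 = fully implemented
assessment = {
    ("F", "metadata-has-persistent-identifier"): 3,
    ("F", "rich-metadata-provided"): 2,
    ("A", "metadata-retrievable-by-identifier"): 3,
    ("I", "metadata-uses-knowledge-representation"): 1,
    ("R", "licence-specified"): 2,
}

scores = defaultdict(list)
for (principle, _indicator), level in assessment.items():
    scores[principle].append(level)

for principle in "FAIR":
    levels = scores.get(principle, [])
    avg = sum(levels) / len(levels) if levels else 0.0
    print(f"{principle}: average maturity level {avg:.1f} "
          f"({len(levels)} indicator(s) assessed)")
```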
FAIR maturity evaluation system
FAIR Implementation Profiles (FIPs)
Potentially: compare the community FIP with your own FIP ('fingerprint'). This gives an indication of whether you meet principle R1.3.
‘The FAIR Principle R1.3 states that “(Meta)data meet domain-relevant Community standards”. This is the only explicit reference in the FAIR Principles to the role played by domain-specific communities in FAIR. It is interesting to note that an advanced, online, automated, FAIR maturity evaluation system [22] did not attempt to implement a maturity indicator for FAIR Principle R1.3. It was not obvious during the development of the evaluator system how to test for “domain-relevant Community standards” as there exists, in general, no venue where communities publicly and in machine-readable formats declare data and metadata standards, and other FAIR practices. We propose the existence of a valid, machine-actionable FIP be adopted as a maturity indicator for FAIR Principle R1.3.’
[Hannah] There are also these tools [Mijke: these are the ones Nivel used in a recent project - have them write the community example?]:
FIP Mini Questionnaire from GO-FAIR: https://www.go-fair.org/how-to-go-fair/fair-implementation-profile/fip-mini-questionnaire/
Data Maturity Model: https://zenodo.org/records/3909563
[Sander]
FAIRCookbook
Assessment chapter in the FAIRCookbook. It currently has recipes for two tools (we have not yet looked into how they work):
FAIR assessment tools: evaluating use and performance paper (2022):
reviews ten FAIR assessment tools that have been evaluated and characterized using two datasets from the nanomaterials and microplastics risk assessment domain.
we evaluated FAIR assessment tools in terms of 1) the prerequisite knowledge needed to run the tools, 2) the ease and effort needed to use them and 3) the output of the tool, with respect to the information it contains and the consistency between tools. This should help users, e.g., in the nanosafety domain, to improve their methods on storing, publishing and providing research data. To do this we provide guidance for researchers to pick a tool for their needs and be aware of its strong points and weaknesses.
The selected tools were split up into four different sections, namely online self-assessment/survey, (semi-)automated, offline self-assessment and other types of tools. The tool selection was based on online searches in June 2020.
They compare:
Online self-assessment survey
Online (Semi-) automated
Offline self-assessment
GARDIAN (link from paper is dead, may be somewhere around here, can’t find it though)
Other
More Checklists and tools:
A checklist produced for use at the EUDAT summer school to discuss how FAIR the participants' research data were and what measures could be taken to improve FAIRness:
https://zenodo.org/records/1065991 [Hannah: this is also an offline checklist; not sure if we should recommend it. I also think it is rather limited compared to the rest of the tools/checklists]
[Sander]
Hannah mentions the Data Maturity Model. This is also described on FAIRplus. There is also a GitHub repository from FAIRplus, and the sheet for the actual assessment is here. Possibly worrying: the last update was last year.
[Hannah: I cannot really find a clear description of how to use it, only a huge Excel file (for which you have to dig quite deep into the GitHub repository). Maybe we can link to it here if we include it: https://github.com/FAIRplus/Data-Maturity/tree/master/docs/assessment ?]
Related: in the FAIRtoolkit they describe a Data Capability Maturity Model:
Most recently, CMM has been adapted by the FAIRplus IMI consortium [7] to improve an organisation’s life science data management process, which is the basis for the method described here.
The FAIR data CMM method identifies 1) important organisational aspects of FAIR data transformation and management, 2) a sequence of levels that form a desired path from an initial state to maturity and 3) a set of maturity indicators for measuring the maturation levels.
For example, Findability Maturity Indicators. It also describes some team requirements.
[Hannah; I think this is also more about assessing FAIR in an organization?]
Furthermore: FAIR Evaluator (FAIRopoly and FAIR Guidance) – text copied below.
FAIRopoly
As a task under the objectives of the EJP RD, we created a set of software packages – The FAIR Evaluator – that coded each Metric into an automatable software-based test, and created an engine that could automatically apply these tests to the metadata of any dataset, generating an objective, quantitative score for the ‘FAIRness’ of that resource, together with advice on what caused any failures (https://www.nature.com/articles/s41597-019-0184-5). With this information, a data owner would be able to create a strategy to improve their FAIRness by focusing on “priority failures”. The public version of The FAIR Evaluator (https://w3id.org/AmIFAIR) has been used to assess >5500 datasets. Within the domain of rare disease registries, a recent publication about the VASCA registry shows how the Evaluator was used to track their progress towards FAIRness (https://www.medrxiv.org/content/10.1101/2021.03.04.21250752v1.full.pdf). To date, no resource – public or private – has ever passed all 22 tests, showing that FAIR assessment is able to provide guidance to even highly-FAIR resources.
The FAIR evaluation results can serve as a pointer to where your FAIRness can be improved.
[Hannah: this tool is already included in the section above (online (semi-)automated tools)]
FAIR Guidance [https://www.ejprarediseases.org/fair_guidance/]
FAIR Assessment Tools
There is growing interest in the degree to which digital resources adhere to the goals of FAIR – that is, to be Findable, Accessible, Interoperable, and Reusable by both humans and, more importantly, by machines acting on behalf of their human operator. Unfortunately, the path to FAIRness was left undefined by the original FAIR Principles paper, which chose to remain agnostic about which technologies or approaches were appropriate. As such, until recently, it has been impossible to make objectively valid statements about the degree to which a data object exhibits “FAIRness”.
With the encouragement of journal editors and other stakeholders who have a need to evaluate author/researcher claims regarding the FAIRness of their outputs, a group consisting of FAIR experts, journal editors, data repository hosts, internet researchers, and software developers assembled to jointly define a set of formal metrics that could be applied to test the FAIRness of a resource. The first edition of these metrics was aimed at self-assessment, in the form of a questionnaire; however, upon review of the validity of several completed self-assessments by data owners, we determined that the questions were often answered inconsistently, or incorrectly (knowingly or unknowingly), and often the data provider did not know enough about the data publishing environment to answer the questions at all. As such, a smaller group of FAIR experts created a second generation of FAIR Metrics that aimed to be fully automatable. The result was a set of 22 Metrics spanning most FAIR principles and sub-principles, which explicitly describe what is being tested, which FAIR Principle it applies to, why it is important to test this (meta)data feature, exactly how the test will be conducted, and what will be considered a successful result.
As a task under the objectives of the EJP RD, we created a set of software packages – The FAIR Evaluator – that coded each Metric into an automatable software-based test, and created an engine that could automatically apply these tests to any dataset, generating an objective, quantitative score for the ‘FAIRness’ of that dataset, together with advice on what caused any failures (https://www.nature.com/articles/s41597-019-0184-5). With this information, a data owner would be able to create a strategy to improve their FAIRness by focusing on “priority failures”. The public version of The FAIR Evaluator (https://w3id.org/AmIFAIR) has been used to assess >5500 datasets. Within the domain of rare disease registries, a recent publication about the VASCA registry shows how the Evaluator was used to track their progress towards fairness (https://www.medrxiv.org/content/10.1101/2021.03.04.21250752v1.full.pdf). To date, no resource – public or private – has ever passed all 22 tests, showing that FAIR assessment is able to provide guidance to even highly-FAIR resources.
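To give a feel for what such an automatable metric test might look like, the Python sketch below checks whether an identifier follows a recognised persistent-identifier scheme and returns a pass/fail result with advice. It is a simplified illustration, not the actual implementation of any of the 22 FAIR Evaluator tests, and the identifiers in the usage lines are hypothetical.

```python
import re

# Illustrative sketch of a single automatable "metric test" in the spirit of
# the FAIR Evaluator: does the (meta)data identifier use a recognised
# persistent identifier scheme? NOT the actual FAIR Evaluator code; the
# patterns below are a simplified, incomplete approximation.
PID_PATTERNS = {
    "DOI": re.compile(r"^https?://(dx\.)?doi\.org/10\.\d{4,9}/\S+$", re.I),
    "Handle": re.compile(r"^https?://hdl\.handle\.net/\S+$", re.I),
    "PURL": re.compile(r"^https?://purl\.org/\S+$", re.I),
    "W3ID": re.compile(r"^https?://w3id\.org/\S+$", re.I),
}

def test_persistent_identifier(identifier: str) -> dict:
    """Return a pass/fail result with an explanation, mimicking the structure
    of an automated metric test (what is tested, outcome, advice)."""
    for scheme, pattern in PID_PATTERNS.items():
        if pattern.match(identifier):
            return {"pass": True,
                    "comment": f"Identifier uses the {scheme} scheme."}
    return {"pass": False,
            "comment": "Identifier does not match a recognised persistent "
                       "identifier scheme; consider registering a DOI or handle."}

print(test_persistent_identifier("https://doi.org/10.1234/example"))  # hypothetical DOI
print(test_persistent_identifier("https://example.org/my-dataset"))   # plain URL
```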
Generic
If driving user question(s) were defined in Step 1, they should be “answered” in this step. The answers to these question(s) are obtained by processing the FAIR, machine-readable data. If RDF is the machine-readable format used, RDF data stores (triple stores) are used to store the machine-readable data, and SPARQL queries are used to retrieve the data required to answer the driving user question(s).
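As a minimal sketch of this approach, the Python example below (using rdflib) loads a small, invented DCAT/Dublin Core metadata record into an in-memory graph and answers an example driving user question with a SPARQL query. The metadata, URIs and query are placeholders; a real project would run such queries against its own triple store or SPARQL endpoint.

```python
# Minimal sketch of answering a driving user question with SPARQL over RDF,
# as described in the generic FAIRification workflow. The metadata and the
# DCAT-based query below are invented for illustration.
from rdflib import Graph

metadata = """
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<https://example.org/dataset/1> a dcat:Dataset ;
    dct:title "Example patient registry" ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> .
"""

g = Graph()
g.parse(data=metadata, format="turtle")

# Driving user question (illustrative): which datasets declare a licence?
query = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title ?license WHERE {
    ?dataset a dcat:Dataset ;
             dct:title ?title ;
             dct:license ?license .
}
"""

for row in g.query(query):
    print(f"{row.dataset} | {row.title} | {row.license}")
```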
FAIRCOOKBOOK recipe: [https://faircookbook.elixir-europe.org/content/recipes/introduction/fairification-process.html]
Phase 3: assess, design, implement, repeat
Following the selection of the “action” team, an iterative cycle of assessment, design and implementation is put in place.
Assessment: Prior to starting the work, the FAIRification goals are assessed to ensure that the individuals in the action team are up to date on, and clear about, the goals formulated by the data owners. This assessment is carried out by a review team, which can be an independent team or individuals from the technical team who are not involved in the action team. The assessment results in a binary decision of “GO” or “NO GO”, based on the FAIRification goals and the catalog provided. At this stage, the reviewers can also provide suggestions based on their experience with the resources, tools or goals.
Design: Once the action team receives a “GO” decision from the review team, it starts by listing the steps that need to be performed to achieve the goal. For each task, the resources, an estimated duration and the responsible person are selected.
Implementation: Once the tasks have been selected and assigned, the actual work begins. To ensure that the action team works smoothly, weekly or bi-weekly meetings are recommended so that the team is aware of the progress.
Once the tasks listed in the design phase have been implemented, the action team assesses the work done and checks whether it is aligned with the FAIRification goal. If more tasks are needed to achieve the goal, a second round of the assess-design-implement cycle takes place as described above, with the FAIRification goals, the completed tasks and the proposed tasks as the starting point.
This phase is usually run in short sprints of three months.
Practical Examples from the Community
This section should show the step applied in a real project. Links to demonstrator projects.
[Mijke: Nivel has done a pre-assessment in a recent project - have them write the community example? The ZonMw program has written FAIR Improvement Plans; we can contact some of those and ask for an example]
[Hannah - copied from the Define FAIR objectives Metroline step]
Amsterdam University of Applied Sciences has a “FAIR enough checklist”. They describe it as follows:
The first checklist describes the minimum effort for Urban Vitality (UV) research projects and can be applied by researchers with minimal assistance from a data steward. Following this checklist makes the research data quite FAIR to people and somewhat FAIR to machines (computers). The checklist should be used immediately after obtaining research funding.
Source: https://www.amsterdamuas.com/uv-openscience/toolkit/open-science/fair/fair-data.html
References & Further reading
[FAIRopoly] https://www.ejprarediseases.org/fairopoly/
[FAIRInAction] https://www.nature.com/articles/s41597-023-02167-2
[Generic] https://direct.mit.edu/dint/article/2/1-2/56/9988/A-Generic-Workflow-for-the-Data-FAIRification
Authors / Contributors
Experts who contributed to this step and whom you can contact for further information