FAIR Data Train general architecture

Working Draft,

This version:
https://specs.fairdatatrain.org/v0.1
Latest version:
https://specs.fairdatatrain.org
Previous Versions:
Date of the first draft:
14 December 2021
Feedback:
l.o.boninodasilvasantos@utwente.nl
Reference Implementation:
https://github.com/FAIRDataTeam/FAIRDataTrain
Issue Tracking:
GitHub
Editor:
Luiz Olavo Bonino (University of Twente, Leiden University Medical Center)
Contributors:
Kees Burger (Health RI)
Rajaram Kaliyaperumal (Janssen Pharmaceuticals)

Abstract

This document describes the general architecture of the FAIR Data Train (FDT). The architecture covers the FDT's core components, metadata structure and schema, API and validation rules.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by any organization. Don’t cite this document other than as work in progress.

This document was created by the FAIR Data Team.

1 Introduction

The FAIR Data Train (FDT) is a FAIR-based platform framework that aims to guarantee a specific level of interoperability among its participating components. It defines a set of application types and their expected behaviours to support the findability, accessibility, interoperability and reusability of data, following the FAIR principles [FAIR-principles]. The FDT follows the analogy of a train system with two main elements: stations and trains. FAIR Stations are responsible for making data (or other types of digital objects) available and for providing metadata about themselves (the Stations) and their content (data or other types of digital objects). FAIR Trains represent analysis/processing algorithms that are sent to FAIR Stations to process and/or analyze data. Additionally, a Station Directory is a special kind of FAIR Station that indexes the metadata of other stations and provides search capabilities. Station Directories can therefore be consulted to find out which station provides which data.

In this document, the term data covers not only the content of artefacts such as databases, tables and graphs, but also other types of digital objects such as controlled vocabularies, ontologies and models. From now on, unless stated otherwise, we use the terms data and other types of digital objects interchangeably. The FAIR Data Train architecture defines a number of key capabilities:

The main goal of the FDT general architecture is to define a set of behaviours, interfaces and protocols that improve interoperability among data sources and data processing services. To fulfil this goal, this document contains a set of specifications that help developers build new applications, or extend their existing applications, so that these applications can also be part of the FAIR Data Train ecosystem. The envisioned scenario is a multitude of independently created trains, stations and client applications that can interact with one another because they all follow the same base guidelines (interfaces, protocols, representation formats, etc.).
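The train analogy described in this section can be illustrated with a minimal, non-normative sketch. All class, method and field names below (Station, Train, StationDirectory, metadata, run, search, keywords) are assumptions made for illustration only and are not defined by this specification. In particular, metadata is modelled as a plain dictionary rather than RDF, and search is a simple keyword match.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Train:
    """A FAIR Train: an analysis algorithm that is sent to stations."""
    algorithm: Callable[[List[dict]], object]


@dataclass
class Station:
    """A FAIR Station: makes data available and describes itself."""
    name: str
    records: List[dict]
    keywords: List[str] = field(default_factory=list)

    def metadata(self) -> dict:
        # Metadata about the station itself and about its content.
        return {
            "name": self.name,
            "keywords": list(self.keywords),
            "record_count": len(self.records),
        }

    def run(self, train: Train) -> object:
        # The train's algorithm runs at the station, next to the data.
        return train.algorithm(self.records)


@dataclass
class StationDirectory(Station):
    """A special kind of station that indexes other stations' metadata."""
    index: Dict[str, dict] = field(default_factory=dict)

    def register(self, station: Station) -> None:
        self.index[station.name] = station.metadata()

    def search(self, keyword: str) -> List[str]:
        # Return the names of stations whose metadata mentions the keyword.
        return [n for n, m in self.index.items() if keyword in m["keywords"]]


# Hypothetical stations and a directory that indexes them.
stations = {
    "clinic-a": Station("clinic-a", [{"age": 40}, {"age": 60}], ["patients"]),
    "vocab-b": Station("vocab-b", [], ["ontologies"]),
}
directory = StationDirectory(name="directory", records=[])
for s in stations.values():
    directory.register(s)

# A train that computes the mean age, sent only to matching stations.
mean_age = Train(lambda rows: sum(r["age"] for r in rows) / len(rows))
results = {n: stations[n].run(mean_age) for n in directory.search("patients")}
```

In this sketch the client first consults the directory to find which stations hold relevant data and only then dispatches the train, so the data itself never leaves the station.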

1.1 Purpose

The purpose of this document is to present the general architecture of the FAIR Data Train. It covers the requirements, architecture, design principles and design of the FDT. This architecture is primarily intended as a reference for developers who want to add FDT functionality to their existing applications or develop their own FAIR Data Train implementation. Understanding this specification requires knowledge of RDF, LDP, SHACL and REST APIs.

1.2 Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

2 Main components

As depicted in Figure 2.1, the main elements (applications, roles and interaction mechanisms) of the FAIR Data Train are:

Figure 2.1 High-level architecture of the FAIR Data Train

2.1 FAIR Data Station

Since the purpose of the FAIR Data Train architecture is to define a set of desired behaviours that applications should expose and support, the definition of the FAIR Data Station (FDS) should specify the station's interface. For simplicity, we start by dividing the FDS API into three major groups, namely the Metadata Interface, the Station Interface and the Content Interaction Interface, as depicted in Figure 2.2. Each of these interface groups is intended to expose the interfaces of a number of services available at the FDS.

Figure 2.2 Interface groups of the FAIR Data Station

Figure 2.2 also makes explicit that interaction with the data happens through the FDS's Interaction Component, which connects to the actual Data Storage component.
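The grouping described above can be sketched, non-normatively, as a station object that exposes three named interface groups, each holding a number of services. The names InterfaceGroup, InteractionComponent, DataStorage, FairDataStation, describe and count_rows are hypothetical. Routing the Content Interaction Interface through the Interaction Component is an assumption of this sketch, and the Station Interface is left without services because this draft does not yet define them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class InterfaceGroup:
    """A named group of services exposed by the station (hypothetical)."""
    name: str
    services: Dict[str, Callable[..., object]] = field(default_factory=dict)


class DataStorage:
    """Stand-in for the actual data storage behind the station."""
    def __init__(self, rows: List[dict]):
        self._rows = list(rows)

    def read(self) -> List[dict]:
        return list(self._rows)


class InteractionComponent:
    """The only component that talks to the data storage."""
    def __init__(self, storage: DataStorage):
        self._storage = storage

    def count_rows(self) -> int:
        return len(self._storage.read())


class FairDataStation:
    def __init__(self, title: str, storage: DataStorage):
        interaction = InteractionComponent(storage)
        self.groups = {
            "metadata": InterfaceGroup(
                "Metadata Interface", {"describe": lambda: {"title": title}}
            ),
            # Assumption: no services listed, since the draft leaves them open.
            "station": InterfaceGroup("Station Interface"),
            # Assumption: content services are served via the Interaction Component.
            "content": InterfaceGroup(
                "Content Interaction Interface",
                {"count_rows": interaction.count_rows},
            ),
        }

    def call(self, group: str, service: str, *args: object) -> object:
        # Dispatch a request to a service within one interface group.
        return self.groups[group].services[service](*args)


station = FairDataStation("demo station", DataStorage([{"id": 1}, {"id": 2}]))
row_count = station.call("content", "count_rows")
description = station.call("metadata", "describe")
```

Keeping the storage reachable only through the Interaction Component mirrors the separation shown in the figure: callers see interface groups, never the storage itself.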

2.1.1 Metadata Interface

The Metadata Interface complies with the FAIR Data Point specifications.
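As a non-normative illustration, a client could ask a station for its metadata with an ordinary HTTP GET that requests an RDF serialization through content negotiation. The station URL and the helper name build_metadata_request below are hypothetical, and the choice of Turtle (text/turtle) is an assumption of this sketch rather than a requirement stated in this document. The request is only constructed, not sent.

```python
from urllib.request import Request


def build_metadata_request(station_url: str) -> Request:
    """Build (but do not send) a GET request for a station's RDF metadata."""
    # Assumption: the station serves its metadata as Turtle on request.
    return Request(station_url, headers={"Accept": "text/turtle"}, method="GET")


# Hypothetical station address, used for illustration only.
request = build_metadata_request("https://station.example.org/")
```

Passing the request to urllib.request.urlopen would perform the call; the response body could then be parsed with any RDF library.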

TODO: add details from the FDP specs.

2.1.2 Station Interface

The Station Interface is repos

2.1.3 Content Interaction Interface

2.2 Station Directory

2.3 Personal Gateway

2.4 Train Handler