FAIR Data Train general architecture

Working Draft,

This version:
https://specs.fairdatatrain.org/v1.0
Latest version:
https://specs.fairdatatrain.org
Previous Versions:
Date of the first draft:
14 December 2021
Feedback:
l.o.boninodasilvasantos@utwente.nl
Reference Implementation:
https://github.com/FAIRDataTeam/FAIRDataTrain
Issue Tracking:
GitHub
Editors:
Luiz Olavo Bonino (University of Twente, Leiden University Medical Center, GO FAIR International Support and Coordination Office)
Kees Burger (Leiden University Medical Center)
Rajaram Kaliyaperumal (Leiden University Medical Center)

Abstract

This document describes the general architecture of the FAIR Data Train (FDT). The architecture includes the FDT's core components, metadata structure and schema, API and validation rules.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by any organization. Don’t cite this document other than as work in progress.

This document was created by the FAIR Data Team.

1 Introduction

The FAIR Data Train (FDT) comprises a set of application types that support the findability, accessibility, interoperability and reusability of data following the FAIR principles [FAIR-principles]. The FDT follows the analogy of a train system with two main elements, stations and trains. Stations are responsible for making data (or other types of digital objects) available and for providing metadata about themselves and about their content (the data or other types of digital objects). Trains represent analysis or processing algorithms that visit stations to process and analyse data. Additionally, a Station Directory can be consulted to find out which stations provide which data. The Station Directory is a special kind of Station that indexes the metadata of other stations and provides search capabilities.
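The following non-normative sketch illustrates the train analogy described above. It is a minimal in-memory model, not part of the specification: the class names mirror the terms used in this document, while every method name (metadata, register, search, visit) and the dictionary-based data layout are assumptions made only for illustration.

```python
from typing import Callable, Dict, List


class Station:
    """Makes data available and provides metadata about itself and its content."""

    def __init__(self, name: str, datasets: Dict[str, List[float]]) -> None:
        self.name = name
        self.datasets = dict(datasets)  # hypothetical layout: dataset id -> values

    def metadata(self) -> Dict[str, object]:
        # Metadata about the station itself and about the content it holds.
        return {"station": self.name, "datasets": sorted(self.datasets)}


class StationDirectory(Station):
    """A special kind of Station that indexes the metadata of other stations."""

    def __init__(self, name: str) -> None:
        super().__init__(name, {})
        self._index: Dict[str, Dict[str, object]] = {}

    def register(self, station: Station) -> None:
        # Only the metadata is indexed, never the data itself.
        self._index[station.name] = station.metadata()

    def search(self, dataset: str) -> List[str]:
        # Names of the stations whose metadata lists the requested dataset.
        return sorted(
            name for name, meta in self._index.items() if dataset in meta["datasets"]
        )


class Train:
    """An analysis/processing algorithm that visits stations."""

    def __init__(self, algorithm: Callable[[List[float]], float]) -> None:
        self.algorithm = algorithm

    def visit(self, station: Station, dataset: str) -> float:
        # The algorithm runs on the data held by the visited station.
        return self.algorithm(station.datasets[dataset])
```

In this sketch, a client would first call search on the directory to learn which stations hold a dataset and then send a Train to each of them. How metadata is actually represented and exchanged is defined by the specification itself, not by this example.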

In this document, we take data to include not only the content of artefacts such as databases, tables and graphs, but also other types of digital objects such as controlled vocabularies, ontologies and models. From now on, unless stated otherwise, we use the terms data and other types of digital objects interchangeably. The FAIR Data Train architecture defines a number of key elements to support:

The main goal of the FDT general architecture is to define a set of behaviours, interfaces and protocols that improve interoperability among data sources and data processing services. To fulfil this goal, this document contains a set of specifications that help developers build new applications, or extend the functionality of their existing applications, so that these applications can become part of the FAIR Data Train ecosystem. The envisioned scenario is one in which a multitude of trains, stations and client applications are created independently yet are able to interact with one another because they all follow the same base guidelines (interfaces, protocols, representation formats, etc.).

1.1 Purpose

The purpose of this document is to present the general architecture of the FAIR Data Train. It covers the requirements, architecture, design principles and design of the FDT. The architecture is primarily intended as a reference for developers who want to add FDT functionality to their existing applications or to develop their own FAIR Data Train implementation. Understanding this specification requires knowledge of RDF, LDP, SHACL and REST APIs.

1.2 Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.

2 Main components

As depicted in Figure 2.1, the main elements (applications, roles and interaction mechanisms) of the FAIR Data Train are:

Figure 2.1 High-level architecture of the FAIR Data Train

2.1 FAIR Data Station

Since the purpose of the FAIR Data Train architecture is to define a set of desired behaviours that applications should expose and support, the definition of the FAIR Data Station (FDS) should specify the station's interface. For simplicity, we start by dividing the FDS API into three major groups, namely the Metadata Interface, the Station Interface and the Content Interaction Interface, as depicted in Figure 2.2. Each of these interface groups is intended to expose the interfaces of a number of services available at the FDS.

Figure 2.2 Interface groups of the FAIR Data Station

Figure 2.2 also makes explicit that interaction with the data happens through the FDS's Interaction Component, which connects to the actual Data Storage component.
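The following non-normative sketch shows one way the three interface groups and the Interaction Component could be arranged in code. It assumes a toy in-memory storage backend. All method names (describe, get, fetch, read) and the access log are hypothetical and chosen only for illustration. The operations of the Station Interface are not defined in this section, so that class is left as a placeholder.

```python
from typing import Dict, List, Protocol


class DataStorage(Protocol):
    """Anything that can return the records stored under a key."""

    def read(self, key: str) -> List[float]: ...


class InMemoryStorage:
    """Toy storage backend, assumed here only for illustration."""

    def __init__(self, records: Dict[str, List[float]]) -> None:
        self._records = dict(records)

    def read(self, key: str) -> List[float]:
        return list(self._records[key])


class InteractionComponent:
    """Sits between the station interfaces and the Data Storage component."""

    def __init__(self, storage: DataStorage) -> None:
        self._storage = storage
        self.access_log: List[str] = []  # hypothetical: records each access

    def fetch(self, key: str) -> List[float]:
        self.access_log.append(key)
        return self._storage.read(key)


class MetadataInterface:
    """Exposes metadata; it never touches the data storage."""

    def __init__(self, metadata: Dict[str, str]) -> None:
        self._metadata = dict(metadata)

    def describe(self) -> Dict[str, str]:
        return dict(self._metadata)


class StationInterface:
    """Placeholder: the operations of this group are not defined in this section."""

    def __init__(self, name: str) -> None:
        self.name = name


class ContentInteractionInterface:
    """Reaches the data only through the Interaction Component."""

    def __init__(self, interaction: InteractionComponent) -> None:
        self._interaction = interaction

    def get(self, key: str) -> List[float]:
        return self._interaction.fetch(key)


class FairDataStation:
    """Groups the three interface groups of a FAIR Data Station."""

    def __init__(
        self, name: str, metadata: Dict[str, str], storage: DataStorage
    ) -> None:
        self.interaction = InteractionComponent(storage)
        self.metadata = MetadataInterface(metadata)
        self.station = StationInterface(name)
        self.content = ContentInteractionInterface(self.interaction)
```

Because the storage object is held only by the Interaction Component, every data access made through the Content Interaction Interface passes through a single point, which is the property Figure 2.2 makes explicit.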

2.1.1 Metadata Interface

2.2 Station Directory

2.3 Personal Gateway

2.4 Train Handler