Skip to content

⚙️ Digital Infrastructure Components

The CARBO-DIOL 2.0 digital infrastructure is constructed upon a secure and standardized framework, vital for managing sensitive industrial and academic data. This section details the four foundational components that ensure the data is Findable, Accessible, Interoperable, and Reusable (FAIR) across the partnership.

1. Data Sharing Platform: Security and Functionality

The platform is the operational cornerstone, designed to manage both the confidentiality of intellectual property (IP) and the practical aspects of data exchange.

Security Requirements

Given the participation of industry partners like BASF SE, security is paramount. The platform must adhere to stringent industrial security protocols, including robust encryption standards, strong user authentication, and advanced Data Loss Prevention (DLP) capabilities to protect sensitive information. The chosen solution, BASF SE Sharepoint, provides this required level of security, ensuring all collaborators have access under strictly controlled, auditable conditions. This commitment to security is non-negotiable for building trust within the industry-academia alliance.

Functional Requirements

Beyond security, the platform must facilitate efficient research. Key functional requirements include:

  • User and Access Management: Precise control over who can view, upload, or modify specific data sets.
  • Data Ingestion and Metadata Capture: Streamlined tools for uploading data while immediately capturing mandatory metadata.
  • Discoverability and Querying: Features that allow researchers to easily find relevant data through standardized metadata fields.
  • Interoperability: The ability to link with external NFDI4Cat services and analytics tools.

2. Standardized Terminology: Voc4Cat

To enable true machine-readability and seamless data integration, human language ambiguity must be eliminated.

  • Solution: Voc4Cat (Vocabulary for Catalysis)
  • Purpose: Voc4Cat is the first comprehensive, community-developed vocabulary specifically for catalysis research. It serves as an authoritative dictionary to unambiguously define concepts, parameters, and entities (e.g., reaction conditions, catalyst morphology). By enforcing the use of standardized terms from Voc4Cat, we ensure that data generated at different institutions is semantically aligned, directly addressing the Interoperability principle of FAIR data. This effort is crucial for validating and combining disparate datasets.

3. Metadata Standards: CatCore

Metadata—or "data about data"—is what makes scientific information discoverable and reusable.

  • Solution: CatCore
  • Definition: CatCore is a standardized collection of relevant metadata fields, directly based on the terms defined in Voc4Cat. It outlines the minimum information required for any catalysis experiment to be considered complete and scientifically sound.
  • Impact: Utilizing CatCore enables automatic data validation upon upload, simplifying the process of querying the database, and ensuring the overall high quality of the data pool. Researchers can swiftly identify key experimental parameters without needing to review the raw data file, vastly improving efficiency.

4. Data Framework: LinkML and DCAT-AP4Cat

This component handles the complex, technical steps required to move from structured data (like CSV or JSON) to semantically rich formats.

  • Role as Mediator: The framework, utilizing tools like LinkML, acts as a mediator, allowing us to define a unified schema (the precursor to CatCore) that can accept input from various file types.
  • Semantification: It is responsible for semantification, the process of transforming non-semantic formats (like spreadsheets) into RDF (Resource Description Framework) data, which is essential for advanced data linkage and ontology mapping.
  • DCAT-AP4Cat: The framework also extends DCAT-AP4Cat, ensuring the system’s metadata catalogue aligns with national and European standards for public data repositories, thereby boosting the Findability of the research output.

Image Suggestion: I recommend inserting a four-part diagram illustrating the Digital Infrastructure Stack. The diagram should show the flow from the base layer up: 1. Voc4Cat (Terminology) $\rightarrow$ 2. CatCore (Metadata Schema) $\rightarrow$ 3. LinkML/Framework (Mediator) $\rightarrow$ 4. BASF Sharepoint (Platform). This clearly shows the dependency of each component.