Data Repositories & Metadata

Fields, Standards & Minimal Requirements

Maja Magel
Charlie Pauvert

2024-07-18

Learning objectives

  • recall at least two mandatory metadata fields
  • name at least one metadata standard used in the life sciences
  • identify a relevant data repository

Metadata fields

  • A column header expects one or more cell values
  • A metadata field expects one or more values

Metadata field    Type of constraint
description       free-text
geolocation       coordinates
biome             ontology term
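
To make the three constraint types above concrete, here is a minimal Python sketch that checks one value per field. The function names, the "latitude, longitude" decimal-degree format, and the PREFIX:digits pattern for ontology terms are assumptions chosen for illustration; a real standard defines its own formats.

```python
import re

# Assumed shape of an ontology term identifier: PREFIX:digits (e.g. an ENVO-style ID).
ONTOLOGY_TERM = re.compile(r"[A-Za-z]+:\d+")


def is_free_text(value: str) -> bool:
    """Free text: anything that is not empty or whitespace only."""
    return bool(value.strip())


def is_coordinates(value: str) -> bool:
    """Coordinates: assumed to be 'latitude, longitude' in decimal degrees."""
    parts = value.split(",")
    if len(parts) != 2:
        return False
    try:
        lat, lon = (float(part) for part in parts)
    except ValueError:
        return False
    return -90 <= lat <= 90 and -180 <= lon <= 180


def is_ontology_term(value: str) -> bool:
    """Ontology term: only the identifier shape is checked, not its existence."""
    return ONTOLOGY_TERM.fullmatch(value) is not None


# Hypothetical mapping from metadata field to its type of constraint.
CONSTRAINTS = {
    "description": is_free_text,
    "geolocation": is_coordinates,
    "biome": is_ontology_term,
}


def validate(record: dict) -> dict:
    """Return, for each known field in the record, whether its value passes."""
    return {
        field: CONSTRAINTS[field](value)
        for field, value in record.items()
        if field in CONSTRAINTS
    }
```

A free-text field accepts almost anything, whereas coordinates and ontology terms can be checked automatically, which is one reason constrained fields are easier to reuse.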

Metadata standards

  • Fields are organized into a coherent metadata standard for a given type of data

    • e.g., genomes, soil samples
  • They are built and maintained by a combination of stakeholders

    • e.g., user communities, data repositories

Standards expectations

Metadata standards (should) indicate for each field:

  • the description of the metadata field

  • the level of requirements (mandatory, recommended, optional)

  • the cardinality, that is, the number of values expected for the metadata field (e.g., exactly one, or one or more)

  • a persistent identifier for the field
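
The four expectations above can be pictured as one record per field. The following Python sketch is only an illustration: the class names, attribute names, and the example.org identifier are hypothetical and do not come from any actual standard.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Requirement(Enum):
    MANDATORY = "mandatory"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass(frozen=True)
class FieldSpec:
    """What a standard should state about a single metadata field."""

    name: str
    description: str
    requirement: Requirement
    min_values: int             # cardinality: lower bound
    max_values: Optional[int]   # cardinality: upper bound, None means unbounded
    identifier: str             # persistent identifier for the field

    def accepts(self, values: List[str]) -> bool:
        """Check that the number of supplied values fits the cardinality."""
        count = len(values)
        if count < self.min_values:
            return False
        return self.max_values is None or count <= self.max_values


# Hypothetical field specification, for illustration only.
geolocation = FieldSpec(
    name="geolocation",
    description="Where the sample was collected",
    requirement=Requirement.MANDATORY,
    min_values=1,
    max_values=1,
    identifier="https://example.org/fields/geolocation",  # placeholder, not a real PID
)
```

With this specification, `geolocation.accepts(["48.1, 11.6"])` holds, while an empty list or two values would be rejected.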

Exercise: Metadata standards

Task:

Minimal requirements

  • The set of mandatory fields is sometimes referred to as the minimal requirements.

  • Filling out these requirements and all the optional metadata fields would be ideal, where possible, but it is time-consuming
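
Checking a record against the minimal requirements amounts to listing the mandatory fields that are absent or empty. The Python sketch below assumes a record is a plain dictionary; the field names in the usage note are hypothetical and not taken from any real checklist.

```python
from typing import Dict, List, Optional, Set


def missing_mandatory(record: Dict[str, Optional[str]], mandatory: Set[str]) -> List[str]:
    """Return the mandatory fields that are absent or empty, in sorted order."""
    return sorted(
        field
        for field in mandatory
        if not (record.get(field) or "").strip()
    )
```

For example, with the hypothetical mandatory set `{"collection_date", "geolocation", "host"}`, a record that only provides `collection_date` would report `geolocation` and `host` as missing.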

Exercise: Requirements

Task:

  • Given only the mandatory fields, do you think you could recontextualise the data properly?
  • List your arguments in the pad.

Figure: Overview of mandatory fields for the ERC000013 metadata standard

ENA Browser for host-associated metadata requirements (ERC000013)

As much as possible, as little as necessary

Working FAIRly takes time and effort¹

How much metadata is necessary to understand your research and to enable (inter-) disciplinary research?

Data repositories

  • Curated and well-described data that stays on your hard drive is of limited use to the scientific community.

  • National and international efforts exist to create and maintain data repositories for the life sciences.

  • For nucleotide sequence data, the International Nucleotide Sequence Database Collaboration (INSDC) integrates and mirrors data across repositories from three regions:

    • USA with the NCBI

    • Europe with the EMBL-EBI

    • Japan with the DDBJ

Exercise: Data repositories

Task:

  • Find out which repository could suit your data using the wizard

  • Report the repository in the pad

A wizard LEGO piece