Ontologies for biological data

Maja Magel
Charlie Pauvert

2024-07-18

Learning objectives

  • define an ontology and explain its features
  • name at least two widely used ontologies in the life sciences
  • navigate an ontology service and find specific ontology terms

Why bother with ontologies?

  • increase findability of your dataset
  • improve machine-readability of your datasets
  • help others correctly categorize and re-use your datasets (recontextualization)
  • required by data repositories
  • a first step towards linked open data and knowledge graph representations?

Having fun with ontologies

Let’s talk about…

Ontology definition

  • List of terms, usually taken from the scientific literature

  • Ontology terms:

    • have curated textual definitions and synonyms

    • are arranged in a hierarchy from general to specific

    • have defined relationships with other terms (e.g., is_a, has_condition)

    • have persistent identifiers

    • can be cross-referenced with other resources (ontology or not)

  • Ontology should reflect existing knowledge
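The features listed above can be pictured as a small data structure. The following is a minimal sketch, not how any real ontology is stored: the `Term` class, its field names, the `ancestors` helper, and the `EX:` identifiers with their definitions are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One ontology term; the field names are illustrative, not a standard."""
    identifier: str                      # persistent identifier
    label: str
    definition: str = ""                 # curated textual definition
    synonyms: list = field(default_factory=list)
    # relationship name -> identifiers of the related terms
    relations: dict = field(default_factory=dict)
    xrefs: list = field(default_factory=list)   # cross-references


def ancestors(term_id, ontology):
    """Follow is_a links from specific to general and return the labels."""
    found = []
    queue = list(ontology[term_id].relations.get("is_a", []))
    while queue:
        current = queue.pop(0)
        label = ontology[current].label
        if label not in found:
            found.append(label)
            queue.extend(ontology[current].relations.get("is_a", []))
    return found


# Toy ontology: the EX identifiers and definitions are hypothetical.
toy = {
    t.identifier: t
    for t in [
        Term("EX:0000001", "water body", "An accumulation of water."),
        Term("EX:0000002", "lake", "A large inland water body.",
             relations={"is_a": ["EX:0000001"]}),
        Term("EX:0000003", "pond", "A small lake.",
             synonyms=["pool"], relations={"is_a": ["EX:0000002"]}),
    ]
}

print(ancestors("EX:0000003", toy))
```

Walking the is_a links from "pond" returns the increasingly general terms above it, which is the hierarchy an ontology browser shows as a tree.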

Exercise: an ontology?

Task: Choose one of the definitions below and, if necessary, add any missing features.

An ontology is:

  • A dictionary with persistent identifiers

  • A network of scientific definitions

  • A controlled vocabulary with synonyms

Search ontology terms

Search terms in ontologies

You can find terms in ontologies using the search bar of:

  • Ontology Lookup Service (OLS)
  • NCBO BioPortal

Ontology Lookup Service

OLS is the official ontology service of the EMBL-EBI

https://www.ebi.ac.uk/ols4

  • EMBL: European Molecular Biology Laboratory
  • EMBL-EBI: EMBL’s European Bioinformatics Institute

Demonstration: navigating OLS for the term “intestine” in UBERON and ENVO

Ontology definition as demonstrated with the term “intestine”

Result: we found the term

intestine [UBERON:0000160]

in both UBERON and ENVO (in ENVO, as a term imported from UBERON)

The correct writing convention for ontology terms and their term identifiers is:

term [ontology-acronym:sequence-number], e.g. intestine [UBERON:0000160]
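The writing convention above is regular enough to check by machine. Here is a minimal sketch that splits such a string into its term and its identifier; `parse_term` and `format_term` are hypothetical helper names, and the pattern is a simplification that assumes a letters-only acronym followed by digits.

```python
import re

# term, then [ACRONYM:digits]; a simplification of real identifier syntax.
TERM_PATTERN = re.compile(
    r"^(?P<label>.+?)\s*\[(?P<prefix>[A-Za-z]+):(?P<number>\d+)\]$"
)


def parse_term(text):
    """Split 'term [ACRONYM:number]' into (term, 'ACRONYM:number')."""
    match = TERM_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not in 'term [ACRONYM:number]' form: {text!r}")
    return match["label"], f"{match['prefix']}:{match['number']}"


def format_term(label, identifier):
    """Write a term and its identifier using the convention."""
    return f"{label} [{identifier}]"


label, identifier = parse_term("intestine [UBERON:0000160]")
print(label, identifier)
print(format_term(label, identifier))
```

The sequence number is kept as text so that leading zeros, which are part of the identifier, are not lost.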

Exercise: Ontology browser

Task: using Ontology Lookup Service v4

  • Look up the following keywords, in this order, via the search bar: pond, ear and leaf

  • Select a term for each using these ontologies:

    • Uber-anatomy ontology (UBERON)

    • Plant Ontology (PO)

    • Environment Ontology (ENVO)

  • Report the term and the term identifier in the pad

Painting of a pond: La Grenouillère by Auguste Renoir

Exercise: visualize features

What questions do you have?

Learning objectives achieved?

  • define an ontology and explain its features
  • navigate an ontology service and find specific ontology terms
  • name at least two widely used ontologies in the life sciences

Exercise: Find your ontology terms & connect the dots

Can we start the next lesson on selecting ontology terms for YOUR dataset descriptions?

Task: Pick any 3 terms from your dataset description.

  • Note them in the pad.
  • Browse the OLS and find 1-2 suitable ontology terms for each.
  • Add the ontology term identifiers to the pad.
  • You have 5 minutes.
  • Share your experiences.
  • Who used an ontology besides UBERON, ENVO and PO? And why?

Exercise: Find your ontology terms & connect the dots 2

Connect the dots to repository metadata fields

  • Select a suitable ENA checklist for your dataset and match your terms to one of the metadata fields.
  • Write down the checklist identifier & metadata field.
  • Do your terms fit the metadata field requirements (field format, restrictions)?
  • Does the info box tell you to use specific ontologies?
  • 10 more minutes

Environmental metadata according to established metadata standards

MIxS lists three mandatory environmental metadata fields that expect ontology terms.

Metadata field                       Abbreviation      Definition
broad-scale environmental context    env_broad_scale   global correlation; ecosystem
local environmental context          env_local_scale   in local vicinity; causal influences
environmental medium                 env_medium        immediate surroundings of your sample during sampling

Metadata field                       Abbreviation      Recommended use of subclasses from
broad-scale environmental context    env_broad_scale   biome [ENVO:00000428]
local environmental context          env_local_scale   deeper hierarchy than broad-scale (UBERON terms accepted)
environmental medium                 env_medium        environmental material [ENVO:00010483]
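Before submitting, the three fields can be checked with a few lines of code. This is a minimal sketch under stated assumptions: `check_record` is a hypothetical helper, the record is a plain dictionary, and the check only looks at whether each value is present and written as term [ACRONYM:number]. It does not verify that a term really is a subclass of the recommended class. The example values reuse terms from this lesson as placeholders; a real record would use more specific terms.

```python
import re

MANDATORY_FIELDS = ("env_broad_scale", "env_local_scale", "env_medium")

# Same simplified 'term [ACRONYM:number]' shape used throughout the lesson.
TERM = re.compile(r"^.+ \[[A-Za-z]+:\d+\]$")


def check_record(record):
    """Return a list of problems found in the three mandatory fields."""
    problems = []
    for name in MANDATORY_FIELDS:
        value = record.get(name, "").strip()
        if not value:
            problems.append(f"{name}: missing")
        elif not TERM.match(value):
            problems.append(f"{name}: not in 'term [ACRONYM:number]' form")
    return problems


# Placeholder record; env_medium deliberately lacks its identifier.
record = {
    "env_broad_scale": "biome [ENVO:00000428]",
    "env_local_scale": "intestine [UBERON:0000160]",
    "env_medium": "environmental material",
}

for problem in check_record(record):
    print(problem)
```

An empty list means all three fields are filled in and follow the writing convention; whether the chosen terms describe the sample well remains a judgment call.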

Exercise: Env* metadata

Broad-scale vs. local environmental context

Task (alone or in pairs):

  • Browse ENVO or UBERON (see previous table)

  • List ontology terms fitting your data

  • Fill out the following template on the pad:

    • env_broad_scale

    • env_local_scale

    • env_medium

  • Trouble finding an appropriate term downstream of the recommended class? See the instructions on using other ontologies with the MIxS standard.

References

Leonelli, Sabina. 2016. Data-Centric Biology: A Philosophical Study. Chicago; London: The University of Chicago Press.
Osumi-Sutherland, David, Nicole Vasilevsky, Alex Diehl, Nico Matentzoglu, Matt Brush, Matt Yoder, Carlo Torniai, et al. 2023. “Introduction to Ontologies.” https://oboacademy.github.io/obook/explanation/intro-to-ontologies/#key-features-of-well-structured-ontologies.