What is curation?

In the study of genomes, curation is the process of assigning new alleles to a gene and annotating features of the sequence such as start and stop codons.

The PubMLST database has efficient built-in computer programmes that automatically annotate genes and assign alleles, based on their similarity to sequences that have been ‘seen’ in the database before. As a result, 97% of the genomic data uploaded to the PubMLST database is curated automatically. However, alleles that have never been identified before are discovered all the time, and these are not detected automatically if their similarity to known sequences falls outside carefully chosen thresholds. Important information could be lost if these new gene variants were not identified, so manual curation is required.
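
For readers who like to see ideas as code, here is a minimal Python sketch of the threshold idea described above. It is an illustration only and not PubMLST's actual software: the function names, the example sequences, the similarity measure (Python's built-in `difflib`) and the 0.98 threshold are all assumptions made for this example.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity between two DNA sequences, from 0.0 to 1.0.

    Assumption: a real pipeline would use a proper sequence-alignment
    tool; difflib is used here only to keep the sketch self-contained.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify(query: str, known_alleles: dict, threshold: float = 0.98):
    """Decide how a query sequence should be handled.

    Returns a tuple (status, closest_allele_id, score), where status is:
      "exact"           - identical to an allele already in the database
      "auto_new_allele" - not identical, but similar enough to a known
                          allele to be handled automatically
      "manual_curation" - too different from anything known; a curator
                          should take a look
    The threshold value is hypothetical.
    """
    # An exact match means the allele has been seen before.
    for allele_id, seq in known_alleles.items():
        if seq == query:
            return ("exact", allele_id, 1.0)

    # Otherwise find the most similar known allele.
    best_id, best_score = None, 0.0
    for allele_id, seq in known_alleles.items():
        score = similarity(query, seq)
        if score > best_score:
            best_id, best_score = allele_id, score

    if best_id is not None and best_score >= threshold:
        return ("auto_new_allele", best_id, best_score)
    return ("manual_curation", best_id, best_score)


# Made-up example data: two known alleles of an imaginary gene.
known = {1: "ATGGCTAAATAG", 2: "ATGGCTCAATAG"}

print(classify("ATGGCTAAATAG", known))  # identical to allele 1
print(classify("ATGGCTAAGTAG", known))  # one letter different
print(classify("TTTTTTTTTTTT", known))  # nothing like the known alleles
```

In this sketch, a sequence that falls below the threshold is not discarded. It is flagged for a human to examine, which is the step where curators come in.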

Curators look at new DNA sequences, compare them to existing sequences, and make a judgement on whether a new allele of the gene is present.

With thousands of bacterial genomes being sequenced worldwide every day, each of which contains thousands of genes, there is a very large amount of data to crunch! Scientists can only analyse so much of it on their own, which is where Genome Detectives comes in. A ‘community curation’ approach, where interested and enthusiastic people beyond the scientific community help with curation, will massively increase the amount of genome data we can explore. The work of the Genome Detectives community will contribute to exciting scientific discoveries that will help improve people’s health worldwide!