Glossary

Allele

A given gene always encodes the same protein, but the DNA sequence of that gene can vary slightly between different bacteria, and these variations can change the protein produced. Each of these alternative forms of the gene is known as an allele.

Amino acid

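Amino acids are the building blocks of proteins. Each amino acid is coded for by one or several codons. Twenty amino acids are encoded by the standard genetic code. Two more, selenocysteine and pyrrolysine, are found in some bacteria, which makes twenty-two in total.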

Bacteria

(singular: a bacterium)

Bacteria are a large group of single-celled microorganisms. They were among the first life forms to appear on Earth, more than 3 billion years ago, and are found in most habitats. In the Genome Detectives project, you are helping us investigate bacteria that cause human disease!

Bases

See Nucleotides.

Chromosome

A chromosome is a long string of DNA containing most, or all, of the genes required to build a given organism. Most bacteria have a single chromosome with a circular shape. 

Codon

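A codon is a group of three consecutive bases that codes for an amino acid (or, in the case of a stop codon, marks the end of a protein). An example is ATG. At the start of a bacterial gene, ATG codes for an amino acid called N-formylmethionine, which is why this amino acid is commonly found at the start of bacterial proteins. Elsewhere in a gene, ATG codes for methionine.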

See Start codon and Stop codon.

Curator

A curator looks at DNA sequences and works out where in the sequence a gene starts and ends, and whether a new allele is present that has not been found before. They act as ‘Genome Detectives’!

Defined allele

A specific sequence of bases that a curator has assigned as a variant (an allele) of a certain gene.
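As a rough illustration of the curator's check, the Python sketch below compares a gene sequence against a small table of defined alleles and reports a match, or nothing if the sequence is new. The allele numbers and sequences are invented toy values (real genes are far longer), and the names `DEFINED_ALLELES` and `assign_allele` are hypothetical, not part of any Genome Detectives tool.

```python
# Hypothetical table of defined alleles for one gene: allele number -> sequence.
# These short sequences are made-up toy values for illustration only.
DEFINED_ALLELES = {
    1: "ATGAAACGTTAA",
    2: "ATGAAGCGTTAA",
}

def assign_allele(sequence):
    """Return the allele number if the sequence exactly matches a defined allele,
    otherwise None (a possible new allele for a curator to check)."""
    sequence = sequence.upper()
    for number, known in DEFINED_ALLELES.items():
        if sequence == known:
            return number
    return None

print(assign_allele("ATGAAGCGTTAA"))  # 2
print(assign_allele("ATGAAACGCTAA"))  # None: one base differs, so not yet defined
```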

DNA

DNA (short for deoxyribonucleic acid) is a long string-like molecule made up of subunits called nucleotides. DNA is a ‘code’ containing the instructions to make molecules such as proteins, which come together to make an organism.  

Gene 

A gene is a length of DNA that encodes a specific protein. You may come across the term 'hypothetical' gene or protein when studying bacterial genomics, meaning the function of a gene or protein has not been confirmed experimentally.

See Locus.

Genome

A genome is all the genetic information (the DNA) found in an organism. 

Genomic data

Genomic data is data related to the content, structure, and function of an organism’s genome.

In vivo

In vivo (Latin for ‘within the living’) testing means studying something inside a living organism, in this case living bacteria, rather than in a test tube.

Isolate

A pure sample of a single disease-causing bacterium (or virus) that has been obtained from an infected patient.

Locus

(plural: loci)

A locus is the specific place on a chromosome where a given gene is located. You may also see ‘locus’ and ‘gene’ used as interchangeable terms.

Metadata

Metadata is “data about data”. In the context of Genome Detectives, the key data is the genome sequence of a bacterial isolate and the metadata is associated information such as the country and year the isolate was collected in.

Nucleotides

Nucleotides are the building blocks of DNA. Each nucleotide is made up of three parts: a phosphate group, a sugar called deoxyribose (a different type of sugar to the one we have in tea!), and a base. The base is the part that differs from one nucleotide to another, so you may see nucleotides referred to simply as ‘bases’. There are four different bases that a nucleotide can have (adenine, thymine, cytosine, and guanine), which are referred to by the letters A, T, C, and G.
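Because each base is written as a single letter, a DNA sequence can be stored on a computer as a string of A, T, C and G characters. The short Python sketch below counts each base and rejects any other letter. The function name `count_bases` is our own, for illustration. It assumes a clean sequence: real sequencing data can also contain other symbols, such as N for an unknown base, which this strict version would reject.

```python
from collections import Counter

VALID_BASES = set("ATCG")

def count_bases(sequence):
    """Count how many times each base (A, T, C, G) appears in a DNA sequence.

    Raises ValueError if the sequence contains any other character."""
    sequence = sequence.upper()
    invalid = set(sequence) - VALID_BASES
    if invalid:
        raise ValueError(f"Unexpected characters: {sorted(invalid)}")
    counts = Counter(sequence)
    # Report all four bases, including any that do not appear (count 0).
    return {base: counts[base] for base in "ATCG"}

print(count_bases("ATGGCGTAA"))  # {'A': 3, 'T': 2, 'C': 1, 'G': 3}
```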

Protein

A biological molecule made up of amino acids. Proteins perform a huge range of essential functions in living things.

Reading frame

A reading frame is a way of dividing the sequence of DNA bases into consecutive non-overlapping triplets (codons), in order to translate the DNA into a protein. If you start reading the DNA at the first position, you are looking at ‘Reading Frame 1’. If you start reading the DNA from the second base along, you are looking at ‘Reading Frame 2’. If you start from the third base along, you are looking at ‘Reading Frame 3’. In general, only one of the reading frames will result in a functional protein when the DNA is translated. 
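To make reading frames concrete, here is a minimal Python sketch that splits a sequence into codons starting at the first, second or third base. The function name `reading_frame` and the example sequence are illustrative only. Any bases left over at the end that do not make a full codon are simply dropped.

```python
def reading_frame(sequence, frame):
    """Split a DNA sequence into codons using reading frame 1, 2 or 3.

    Frame 1 starts at the first base, frame 2 at the second, frame 3 at the third.
    Any incomplete codon left over at the end is dropped."""
    if frame not in (1, 2, 3):
        raise ValueError("frame must be 1, 2 or 3")
    start = frame - 1
    return [sequence[i:i + 3] for i in range(start, len(sequence) - 2, 3)]

seq = "ATGGCGTAA"
for frame in (1, 2, 3):
    print(frame, reading_frame(seq, frame))
# 1 ['ATG', 'GCG', 'TAA']
# 2 ['TGG', 'CGT']
# 3 ['GGC', 'GTA']
```

Only frame 1 here begins with a start codon (ATG) and ends with a stop codon (TAA), which is the kind of pattern that points to the frame a real gene is read in.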

Sequence

The specific order of the bases on a strand of DNA.

Start codon

A start codon is the first three bases of a gene. The bacterial cell machinery recognises this codon and starts translating from this point onwards in order to make a protein. In bacteria, the most common start codon is ATG. Less common alternatives are GTG and TTG, and, more rarely, CTG.

Stop codon

A stop codon is the three bases found at the end of a gene. The bacterial cell machinery recognises this codon, which causes it to stop reading and translating the DNA. The stop codons found in bacteria are TAA, TAG, and TGA.
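Putting start and stop codons together, the Python sketch below shows one simple way to look for where a gene might begin and end: find a start codon, then read codon by codon in that frame until a stop codon appears. This is a simplified illustration with hypothetical names (`find_gene`, `START_CODONS`, `STOP_CODONS`). It only reads one strand of DNA, and real gene-finding tools weigh much more evidence than this.

```python
# Bacterial start and stop codons, as listed in this glossary.
START_CODONS = {"ATG", "GTG", "TTG", "CTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_gene(sequence):
    """Return (start, end) of the first stretch that begins with a start codon
    and ends with an in-frame stop codon, or None if there is none.

    Positions are 0-based and `end` is just past the stop codon, so
    sequence[start:end] is the candidate gene including its stop codon."""
    sequence = sequence.upper()
    for start in range(len(sequence) - 2):
        if sequence[start:start + 3] not in START_CODONS:
            continue
        # Read codon by codon in the reading frame set by this start codon.
        for pos in range(start + 3, len(sequence) - 2, 3):
            if sequence[pos:pos + 3] in STOP_CODONS:
                return start, pos + 3
    return None

seq = "CCATGAAACGTTAAGG"
print(find_gene(seq))  # (2, 14)
start, end = find_gene(seq)
print(seq[start:end])  # ATGAAACGTTAA
```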

Translation

The process of reading a gene sequence and following the ‘instructions’ encoded by the DNA in order to build a protein.
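As an illustration of translation, the Python sketch below reads a gene one codon at a time and looks each codon up in the standard genetic code, writing amino acids as one-letter codes (for example M for methionine, K for lysine and R for arginine, with * marking a stop codon). Bacteria use a version of the code (NCBI translation table 11) that assigns the same amino acid to every codon. One simplification is assumed here: a bacterial cell reads the start codon of a gene as N-formylmethionine even when it is GTG or TTG, whereas this sketch translates every codon literally. The names `CODON_TABLE` and `translate` are our own.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), listed in TCAG order;
# bacterial table 11 assigns the same amino acids. "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build a lookup from each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(gene):
    """Translate a gene (read in reading frame 1) into one-letter amino acid
    codes, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(gene) - 2, 3):
        amino_acid = CODON_TABLE[gene[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGAAACGTTAA"))  # MKR
```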