Brussels, 25 Oct 2005
The recently launched METAFUNCTIONS project is intended to develop a data-mining system that correlates genetic patterns in genomes and metagenomes with contextual environmental data.
Deciphering the genetic blueprint of all microbial species may tell scientists which species are present in a certain environment and how they work together, allowing them to unearth thousands of previously unknown micro-organisms, as well as new enzymes and proteins for medical and industrial use and ways of harnessing bacteria to fight pollution.
Determining the complete DNA sequence of a single species has become common - it has been achieved for humans, mice, rice plants and many microorganisms. In the last seven years, more than 260 microbial genomes have been successfully sequenced, while over 600 are currently in progress.
Researchers know that up to 99 per cent of microorganisms cannot be studied using traditional methods, which require each species to be isolated and grown before its DNA is extracted. It is, however, possible to extract DNA directly from a sample of soil or seawater, so new techniques are needed to access this hidden diversity.
These techniques seek to read all the DNA in the bacterial communities found in a patch of soil or seawater, or even the lining of the human gut. The sequences obtained from such samples are known as metagenomes - not the genome of a single organism, but the genetic blueprint of a particular environment. The essence of these methods is to extract the DNA from a given environment and then insert fragments of this DNA into expression plasmids, creating a 'metagenomic' library. The final step is to functionally assay the expressed DNA for a multitude of activities.
A wealth of metagenome information is emerging - but the tools to analyse it are seriously lacking. The task is not easy: there can be thousands of different microbial species in a spoonful of soil or seawater, meaning that, in a genetic sense, such a sample can be more complex than the human genome. Besides, research on metagenomics has so far largely focused on bacteria that are medically important, whereas 'environmentally important' organisms (e.g. those involved in methane production and consumption) have not received the same attention.
Funded by the European Commission under the NEST (New and Emerging Science and Technology) programme of the Sixth Framework Programme (FP6), the METAFUNCTIONS project aims to address these challenges by developing a novel data-mining system that can identify relationships between sequenced genes and their environmental and ecological context.
Coordinated by the Max Planck Institute for Marine Microbiology, Germany, the project integrates a diverse range of expertise in bioinformatics, computer science, geographical information systems and marine sciences from four European research centres in Germany, Switzerland and Poland.
The ultimate aim of this project, whose official title is 'environmental- and metagenomics - a bioinformatics system to detect and assign functions to habitat-specific gene patterns', is to determine the function of as yet unknown genes, known as 'hypothetical genes'. The innovative combination of expertise has the potential to produce a technology with broad application and high potential pay-off. The project is building a 'Genomes MapServer' that will soon allow scientists around the world to access integrated genomic and ecological data and clearly visualise the results of their analyses.
METAFUNCTIONS is using natural language processing techniques to collate literature data and convert them into a structured, database format. The project also relies heavily on data-mining techniques to identify novel or interesting patterns in genomic data.
Another innovative aspect of this project is the use of geographic information systems (GIS). GIS tools allow events to be simulated and analysed from a geographical or spatial perspective. Novel patterns - for example, the physical clustering of genes within a genome - will be correlated with the contextual habitat data. For instance, a particular cluster of genes may be found in a number of genomes and metagenomes, all taken from high-temperature environments. It would then be reasonable to infer that these genes play some role in enabling survival in extreme heat.
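The kind of inference described here, that a gene cluster seen mainly in hot habitats is probably linked to heat survival, can be illustrated with a simple enrichment test. The sketch below is not the METAFUNCTIONS system; it is a minimal illustration that assumes a one-sided hypergeometric (Fisher-style) test, and every sample name, temperature, cluster label and the 60 degree threshold is made up for the purpose.

```python
from math import comb

# Hypothetical samples: habitat temperature (deg C) and the gene clusters detected.
# All names and values are illustrative assumptions, not project data.
SAMPLES = [
    {"name": "hot_spring_A", "temp_c": 78, "clusters": {"cluster_1", "cluster_2"}},
    {"name": "hydrothermal_vent_B", "temp_c": 95, "clusters": {"cluster_1"}},
    {"name": "hot_spring_C", "temp_c": 70, "clusters": {"cluster_1", "cluster_3"}},
    {"name": "coastal_seawater_D", "temp_c": 12, "clusters": {"cluster_2"}},
    {"name": "garden_soil_E", "temp_c": 15, "clusters": {"cluster_3"}},
    {"name": "deep_sea_sediment_F", "temp_c": 4, "clusters": {"cluster_2", "cluster_3"}},
    {"name": "lake_water_G", "temp_c": 18, "clusters": {"cluster_2"}},
    {"name": "forest_soil_H", "temp_c": 10, "clusters": {"cluster_3"}},
]


def enrichment_p_value(total, hot, with_cluster, hot_with_cluster):
    """Chance of seeing at least `hot_with_cluster` cluster-carrying samples
    among the `hot` ones if the cluster were unrelated to temperature."""
    upper = min(hot, with_cluster)
    tail = sum(
        comb(hot, k) * comb(total - hot, with_cluster - k)
        for k in range(hot_with_cluster, upper + 1)
    )
    return tail / comb(total, with_cluster)


def cluster_enrichment(samples, cluster, hot_threshold_c=60):
    """Test whether `cluster` is over-represented in samples at or above
    `hot_threshold_c` (an assumed cut-off for 'high temperature')."""
    hot = [s for s in samples if s["temp_c"] >= hot_threshold_c]
    carriers = [s for s in samples if cluster in s["clusters"]]
    hot_carriers = [s for s in hot if cluster in s["clusters"]]
    return enrichment_p_value(
        len(samples), len(hot), len(carriers), len(hot_carriers)
    )


if __name__ == "__main__":
    for cluster in ("cluster_1", "cluster_2", "cluster_3"):
        print(cluster, round(cluster_enrichment(SAMPLES, cluster), 4))
```

In this toy data set, cluster_1 occurs only in the three hot samples and receives a small p-value, whereas cluster_2 and cluster_3 are spread across habitats and do not. A real analysis would need many more samples, correction for multiple testing and far richer habitat descriptions than a single temperature value.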
In particular, the METAFUNCTIONS project will help to break through the current backlog in assigning function to the vast number of conserved hypothetical genes that high-throughput genomic sequencing has produced. Marine ecology, biotechnology, medicine and many industrial sectors could all benefit from the map of ecological genomics that METAFUNCTIONS will provide.
The potential pay-off of the project is huge: bacteria constitute more than half of the living matter on Earth, and play essential roles in numerous environmental cycles. They turn nitrogen in the air into a form usable by plants, produce about half the oxygen on the planet, break down minerals and clean up pollution.