Number crunching the very stuff of life

Introduction to Bioinformatics. Second Edition - Bioinformatics: Sequence and Genome Analysis. Second Edition - Bioinformatics and Molecular Evolution. First Edition

February 24, 2006

These three textbooks concern bioinformatics. Since the subject is still relatively new, some introduction may be worthwhile.

Sixteen years ago, I was asked by my then boss, Barry Holland, to organise a public meeting to celebrate the 25th anniversary of Leicester University's genetics department. I hired Leicester's stately De Montfort Hall, invited some of the top speakers from the world of genetics, including our own Alec Jeffreys and Baroness Warnock, and got our chancellor, Lord Porter, to co-chair. However, pride of place went to James Watson, co-discoverer of the structure of DNA. Come the day, I hoped for the best. Remarkably, the public turned up, about 1,500 of them, my boss was happy, and my bacon was saved.

Watson talked about the (then) fledgling Human Genome Project and all the politics and logistics that surrounded this massive enterprise. As with the space programme and its technological spin-offs, the genome project would spur, said Watson, the development of faster, cheaper DNA sequencing methods as well as the new computational methods, algorithms and data structures required to access and analyse such vast amounts of information, and these advances would stimulate molecular research worldwide. To decipher the sequence of the 3 billion letters that encode a human being would, he argued, herald the dawn of a new age in biology.

I must say that some of us professional geneticists were a bit sceptical. Perhaps the zillions of dollars being spent on this project would be better used on more hypothesis-driven research, we thought. What did we know?

In 2001, the draft human genome was published simultaneously by two consortia - one commercial, led by Craig Venter of Celera Genomics Corporation, and the other headed by John Sulston, who organised a conglomerate of international laboratories funded from the public purse - in the journals Science and Nature respectively. It would be an understatement to say that there was some tension between the groups. Nevertheless, this epic feat was almost complete, and the data could now be mined. However, the human genome was neither the first to be sequenced - that honour belongs to Haemophilus influenzae, which had its genome sequenced in 1995 - nor was it the last: new genomes are being added to the archives every couple of months or so. Genome sequencing projects have revolutionised the work of molecular geneticists, who can now map human, fly, mouse or worm genes more easily, or compare DNA, RNA and protein sequences between species to gain remarkable insights into the evolutionary process.

So how do we deal with, and make sense of, some 100 billion base pairs that now exist in databases, or the protein and RNA sequences that this DNA encodes? There are clearly major job opportunities here for computer geeks, mathematicians and statisticians; enough to keep them going for decades. Universities are seldom slow to spot a gap in the market, and MSc courses (as well as undergraduate courses) in "bioinformatics" have popped up all over the UK and elsewhere.

To service these courses, a large number of specialised texts have hit the shelves over the past ten years or so. Two are in their second editions now: Introduction to Bioinformatics by Arthur Lesk and Bioinformatics: Sequence and Genome Analysis by David Mount. They are both nicely presented and cover much the same areas, although the latter, with almost twice as many pages (and larger, glossier ones at that), is clearly the more substantial. Briefly, these areas include an historical introduction, DNA, RNA and protein multiple alignments, plus structure predictions, phylogenies, proteomics, transcriptomics, systems biology, databases, computer languages and statistical analyses. Both have a companion website, although I must say that the one associated with Lesk's text seems more useful, both for the student (it gives hints for answering the problem sets) and for the lecturer (it provides all the figures in the book, so no scanning is required to make presentations); it also offers web links and recommended reading.

While in my view neither book is suitable for a first-year undergraduate, both would serve for MSc and higher-level honours students. Both juxtapose the computing and statistical/mathematical work with the biology nicely, without intimidating the biology student, who, let's face it, is studying biology because he or she hates mathematics and couldn't get into medical school. Computing students, however, may struggle if they do not have some background in genetics and/or biochemistry.

The third text, Bioinformatics and Molecular Evolution by Paul Higgs and Teresa Attwood, while covering most of the basics mentioned above, focuses more on molecular evolution and population genetics. Thus there are additional chapters on models of sequence evolution that are rather more statistical, and the whole book relies on a more stringent mathematical framework than the other two. It is more suitable for postgraduates or keener final-year undergraduates and has a very good website for both student and lecturer.

I would recommend either of the first two books for my second-year bioinformatics course for genetics honours students. Lesk's is cheaper and more compact, but both go very easy on the mathematically/computationally incompetent. As for Higgs and Attwood's book, this would be fine for my MSc bioinformatics course, particularly as the MSc has a strong evolution/population genetics flavour. So I am spoilt for choice.

Charalambos Kyriacou is professor of behavioural genetics, Leicester University.

Introduction to Bioinformatics. Second Edition

Author - Arthur M. Lesk
Publisher - Oxford University Press
Pages - 360
Price - £24.99
ISBN - 0 19 9787 7
