The Journal of Machine Learning Research was founded in 2001, in its words "as an international forum for the electronic and paper publication of high-quality scholarly articles in all areas of machine learning". Machine learning is a fascinating and important area of computer science: basically, can computers learn? It is both a theoretical and a practical field, in which computer programs are developed and run on data to see what they can learn. Machine learning has enormous and widespread practical applications, from labour negotiation, medical treatment and agriculture to generally making computers more "intelligent". The slant of the journal will make it interesting to theoreticians, and to biologists and psychologists interested in how animals and humans learn and in what the theoretical limits to learning are. There are even applications of machine learning in countering terrorism. And as the world wide web fills up with vast amounts of unstructured information, we need all the help we can get to learn how to use it effectively. It says something about the relevance of machine learning that one managing editor of this journal works at Google and the other has published papers on financial markets.
Machine learning makes a difference, and makes a lot of money worldwide. Yet the JMLR has a free website, and an annual individual print subscription costs only $75 (£47), or $111 outside the US - a fraction of the cost of subscribing to a conventional science journal. The journal runs like a collective, with MIT Press taking just the paper print rights, so costs are minimised. Turnaround time for authors is dramatically reduced. For anybody, say, in the third world who wants something up to date and rigorous about machine learning, it is the definitive place to look. This is an excellent model for all journals to copy, especially in science. As one of the editors says: "What is the role of the scientist in academic publishing? Doing the publishing!"
The bulk of the journal's papers are devoted to discussing and evaluating learning methods. I was interested to see how ideas talked about in the journal actually worked, because that is really the whole point. So, as the journal is available online, I looked at every paper and then emailed the authors to ask them about their ideas. After a few weeks I had more than 100 replies. I drafted this review, and then bounced it off the editorial board and the authors again. The enthusiasm of authors for their work was impressive: I had replies covering every paper published.
I asked whether the system described in each paper was available. Of course, some papers were purely theoretical; I had a few replies saying my question was irrelevant. Of the remainder, about a third specifically said their systems were unavailable: private, commercially confidential or incomplete in some way. Consider some of the replies I got:
"Unfortunately, I do not have the system in a state where I can give it away right now" and "we don't have the data ready to be published". Other quotes are revealing about authors' attitudes. "The system is a research prototype developed in my group, and is not appropriate for public dissemination" and "the implementations we had were very much 'research code', and not suitable for public consumption".
My survey suggests some authors have a relaxed regard for scientific virtues: reproducibility, testability and availability of data, methods and programs - the openness and attention to detail that support other researchers. It is a widespread problem in computer science generally. I am guilty, too. We programmers tend not to keep the equivalent of lab books, and reconstructing what we have done is often unnecessarily hard. As I have written elsewhere (see www.uclic.ucl.ac.uk/harold/warp), there can be real problems with publishing work that is not rigorously supported. It is the computer-science equivalent of fudging experimental data - whether this really matters for scientific progress is another, controversial, debate.
Then there is the problem of who owns the work. As one author puts it: "We have not had the time to turn our experimental code into something other people can use (and anyway our employers wouldn't like to see things given away)." Certainly there needs to be a balance between science and protecting intellectual property. It is a big problem, as turning research ideas into code that really works might involve a company that then owns it. On the other hand, there is no reason why open-source code cannot be made freely and immediately available, at least to the depth to which the ideas are discussed in the papers. And it is possible: look at sites such as the GNU-licensed open-source Weka machine-learning project (www.cs.waikato.ac.nz/ml), which provides a framework in which people can share work. Many other sites have papers, code, demos and data too.
The Journal of Machine Learning Research does try to encourage authors to add electronic appendices with source code, data, demonstrations: anything, as the journal puts it, that will make life easier or more interesting for readers and researchers who follow in the authors' footsteps. Some authors do an excellent job, but spreading the good practice is an uphill struggle.
Machine learning will change our uses of computers dramatically, so let us hope the journal achieves its goals with more and more success.
Harold Thimbleby is director of the University College London Interaction Centre.
Journal of Machine Learning Research
Editor - Leslie Pack Kaelbling
ISSN - 1532-4435; E-ISSN - 1533-7928
Publisher - MIT Press
Price - Institutions $375 (electronic $337); individuals $75 (electronic $67); add $36 outside the US
Frequency - eight times a year