Effective use of machine learning to empower your research
Artificial intelligence, or machine learning, can support complex analysis and advance quality research, but only when used carefully. John F. Wu shares advice on how machine learning can empower researchers
Artificial intelligence (AI) – or machine learning – seems to be everywhere these days. If you’re a researcher, you’ve probably seen these terms pop up more and more in your field’s academic literature. But how much of this is actually useful? Should you also be leveraging machine learning?
In this article, I'll describe a few cases of when machine learning is useful for research – and also when it isn’t – by drawing inspiration from my own field in astronomy.
Machine learning delivers the most value for “data-driven” research problems: when you have so much data that you can’t inspect it manually. In these scenarios, machine learning can lighten your workload and allow you to focus on your area of research. However, adopting machine learning is not without its pitfalls and hidden costs.
- Applying machine learning without thinking can result in some dangerous analyses. For example, deep neural networks are able to memorise the data they’ve seen, causing unpredictable behaviour when handling new data. In a similar vein, many machine learning algorithms underperform or completely fail when applied to new domains. Machine learning models are also susceptible to the biases and selection effects inherent in their training data. Finally, machine learning may not be able to distinguish important features from confounding variables. Your expertise in a specialised field can help you recognise and avoid these common pitfalls.
- Some machine learning algorithms have a steep learning curve. You’re already conducting research in one field, so it can seem like a lot to learn an entirely new discipline. Simply learning the machine learning jargon can be a big hurdle, but fortunately there are many resources for getting started in this field (eg, Fastai). Many concepts in machine learning have analogues in other fields – for example, model optimisation can be reframed in the language of thermodynamics and statistical physics. Also, there are many subdisciplines within machine learning, and you probably don’t want to spend all of your time exploring these different rabbit holes.
- Just because it can be done with machine learning doesn’t mean that it should be. When fancy new algorithms appear, it is always exciting to see them applied to your favourite research problems. But at some point, we need to move on from the proof-of-concept phase to the value-adding phase. In other words, you can ask yourself, “If I didn't use machine learning, would this result still be interesting?”
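The memorisation pitfall in the first point above is easy to check for by holding back some data. The sketch below is only an illustration under stated assumptions: it uses a 1-nearest-neighbour classifier as a stand-in for an extreme memoriser (not a deep neural network), synthetic points with 20 per cent of labels flipped, and hypothetical helper names such as `make_points` and `predict_1nn`.

```python
import math
import random

random.seed(0)

def make_points(n, label_noise=0.2):
    """Synthetic points in the unit square; the true label is 1 when x > 0.5."""
    data = []
    for _ in range(n):
        x, y = random.random(), random.random()
        label = int(x > 0.5)
        if random.random() < label_noise:
            label = 1 - label  # flip the label to mimic measurement error
        data.append(((x, y), label))
    return data

def predict_1nn(train, point):
    """Return the label of the closest training point (pure memorisation)."""
    nearest = min(train, key=lambda item: math.dist(item[0], point))
    return nearest[1]

def accuracy(train, data):
    """Fraction of points in `data` whose label the memoriser gets right."""
    hits = sum(predict_1nn(train, point) == label for point, label in data)
    return hits / len(data)

train = make_points(200)
test = make_points(200)  # new data the model has never seen

# On the training set every point is its own nearest neighbour,
# so the score looks perfect even though the noisy labels were memorised.
train_accuracy = accuracy(train, train)
test_accuracy = accuracy(train, test)
print(f"train accuracy: {train_accuracy:.2f}, test accuracy: {test_accuracy:.2f}")
```

A large gap between the two numbers is the warning sign: the score on data the model has already seen says little about how it will behave on new data.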
When applied carefully, through a sceptic’s lens, machine learning can enable research programs that would otherwise be infeasible. Broadly speaking, machine learning can empower researchers in four ways.
1. Make predictions based on trends
Sometimes you want to know if your dataset can be used to determine something else. For example, you may have heard about how machine learning in medicine can help doctors screen for cancer. In my field of astronomy, it is fairly simple to take images of millions of galaxies, but we have traditionally needed to take and analyse specialised observations in order to understand the details of how galaxies evolve. By using machine learning, my collaborators and I found that we could actually study these galaxies solely using images.
It’s easy to create new models of how things should behave, but the real test of any model is whether it has any predictive power. By identifying connections within your data, you can formulate a model – and machine learning can too. Scientists have used machine learning to summarise these connections into the language of mathematics and uncover a new formula that explains the distribution of matter on cosmic scales.
2. Spot outliers
If machine learning can be used to find the typical trends, then perhaps it’s not surprising that machine learning is also great at detecting anomalous things. Many research fields can benefit from a thorough investigation of rare phenomena, and machine learning can help you spot the “needle in the haystack”. In astronomy, machine learning has been used to detect rare phenomena such as gravitational wave events, supernovae and gravitationally lensed galaxies, as well as incorrectly processed data and much more. One analysis of outlier galaxies found many interesting phenomena (including many “galaxies” that weren’t galaxies at all).
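As a rough illustration of outlier spotting, the sketch below scores each point by the distance to its k-th nearest neighbour, so isolated points get high scores. It is a minimal example on synthetic data, not the method used in the astronomy studies mentioned above; the function name `knn_outlier_scores`, the choice of k = 5 and the three planted outliers are all assumptions made for the example.

```python
import math
import random

random.seed(1)

def knn_outlier_scores(points, k=5):
    """Score each point by the distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# 200 "typical" objects clustered around the origin...
typical = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(200)]
# ...plus three planted oddities far from the bulk (indices 200, 201, 202).
planted = [(8.0, 8.0), (-9.0, 7.0), (10.0, -10.0)]
points = typical + planted

scores = knn_outlier_scores(points, k=5)

# Rank from most to least isolated and keep the three strongest candidates.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
top_three = sorted(ranked[:3])
print(top_three)
```

The highest-scoring objects are only candidates. As with the “galaxies” that weren’t galaxies, it still takes domain knowledge to decide whether each one is a discovery or a data-processing error.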
3. Save time
Let’s be honest: some aspects of research are boring and time-consuming. In radio astronomy, vast computational resources and lots of time are required to remove artificial signals and corrupted data. Machine learning can perform these tasks at a fraction of the cost and time.
By speeding up the boring parts of research, machine learning can also enable new kinds of analyses that would otherwise not be possible. Many research projects try to answer the following question: given an observed outcome, what are the parameters of a model that would have produced it? These so-called inverse problems can be tackled efficiently using machine learning. For more details, read up on simulation-based inference.
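To make the idea of an inverse problem concrete, here is a minimal sketch of rejection sampling (often called approximate Bayesian computation), one of the simplest forms of simulation-based inference. It uses no neural networks, and the toy simulator, the uniform prior, the tolerance and the names `simulate` and `abc_rejection` are all assumptions chosen for the example.

```python
import random
import statistics

random.seed(2)

def simulate(mu, n=50):
    """Forward model: n noisy measurements centred on the parameter mu."""
    return [random.gauss(mu, 1.0) for _ in range(n)]

# "Observed" data, generated here from a known value so the answer can be checked.
TRUE_MU = 1.5
observed_summary = statistics.fmean(simulate(TRUE_MU))

def abc_rejection(observed_summary, n_draws=10000, tolerance=0.1):
    """Keep the parameter draws whose simulated summary lands close to the observation."""
    accepted = []
    for _ in range(n_draws):
        mu = random.uniform(-5.0, 5.0)  # draw a candidate from the prior
        simulated_summary = statistics.fmean(simulate(mu))
        if abs(simulated_summary - observed_summary) < tolerance:
            accepted.append(mu)
    return accepted

posterior_samples = abc_rejection(observed_summary)
posterior_mean = statistics.fmean(posterior_samples)
posterior_spread = statistics.stdev(posterior_samples)
print(f"accepted {len(posterior_samples)} draws")
print(f"estimated mu = {posterior_mean:.2f} +/- {posterior_spread:.2f}")
```

Most of the simulations here are thrown away, which becomes very wasteful when each one is expensive. That waste is the cost that machine learning approaches to simulation-based inference aim to reduce.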
4. Visualise and prioritise complex data
Datasets are growing bigger and bigger, but there are many ways to combine features into condensed versions. Dimensionality reduction methods include classical approaches like Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), as well as machine learning techniques that use pre-trained neural networks or similar algorithms to transform the data into summarised versions.
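The sketch below shows the idea behind PCA on a small synthetic dataset: five measured features that are really driven by two underlying quantities. It is written with the standard library only so that each step is visible; in practice you would more likely reach for an established numerical library. The helper names (`top_eigenpair`, `pca`) and the synthetic data are assumptions for the example.

```python
import math
import random

random.seed(3)

def make_rows(n=300):
    """Five noisy features driven by two hidden quantities, u and v."""
    rows = []
    for _ in range(n):
        u, v = random.gauss(0, 3.0), random.gauss(0, 1.0)
        signal = [u, u, v, v, 0.0]
        rows.append([s + random.gauss(0, 0.1) for s in signal])
    return rows

def covariance_matrix(rows):
    """Centre the data and return (centred rows, sample covariance matrix)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
           for a in range(d)]
    return centred, cov

def top_eigenpair(matrix, iterations=200):
    """Power iteration: find the largest eigenvalue and its eigenvector."""
    d = len(matrix)
    vec = [random.gauss(0, 1) for _ in range(d)]
    for _ in range(iterations):
        product = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in product))
        vec = [x / norm for x in product]
    product = [sum(matrix[a][b] * vec[b] for b in range(d)) for a in range(d)]
    value = sum(vec[a] * product[a] for a in range(d))
    return value, vec

def pca(rows, n_components=2):
    """Return the projected data and the fraction of variance per component."""
    centred, cov = covariance_matrix(rows)
    d = len(cov)
    total_variance = sum(cov[j][j] for j in range(d))
    components, explained = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        explained.append(value / total_variance)
        # Deflate: remove the variance along the component just found.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(d)]
               for a in range(d)]
    projected = [[sum(c[j] * comp[j] for j in range(d)) for comp in components]
                 for c in centred]
    return projected, explained

projected, explained = pca(make_rows(), n_components=2)
print("variance explained by each component:", [round(e, 3) for e in explained])
```

If two components capture nearly all of the variance, as they should for this synthetic data, the two-column `projected` table can be plotted in place of the original five features.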
It’s also useful to understand which inputs (or features) are most important for making predictions. Different machine learning algorithms reveal the most important features in different ways; for instance, random forests can automatically rank features by importance. For neural network models, saliency mapping enables you to pinpoint which pixels in an image are most essential for making a prediction (eg, Gradient-weighted Class Activation Mapping, or Grad-CAM). These algorithms provide some level of machine learning interpretability that can benefit your research program.
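One simple, model-agnostic way to rank features is permutation importance: shuffle one input column at a time and see how much the prediction error grows. The sketch below applies it to a small nearest-neighbour model on synthetic data. Note that this is a swapped-in technique for illustration, not the built-in ranking of a random forest or Grad-CAM, and the names `knn_predict` and `permutation_importance` are assumptions for the example.

```python
import math
import random

random.seed(4)

def make_rows(n):
    """Three features: the first matters a lot, the second a little, the third not at all."""
    rows = []
    for _ in range(n):
        features = [random.random() for _ in range(3)]
        target = 3.0 * features[0] + 0.5 * features[1] + random.gauss(0, 0.1)
        rows.append((features, target))
    return rows

def knn_predict(train, features, k=5):
    """Predict by averaging the targets of the k closest training rows."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    return sum(target for _, target in nearest) / k

def mean_squared_error(train, rows):
    return sum((knn_predict(train, f) - t) ** 2 for f, t in rows) / len(rows)

def permutation_importance(train, test, n_features=3):
    """Increase in test error when each feature column is shuffled in turn."""
    baseline = mean_squared_error(train, test)
    importances = []
    for j in range(n_features):
        column = [f[j] for f, _ in test]
        random.shuffle(column)  # break the link between feature j and the target
        shuffled = [(f[:j] + [value] + f[j + 1:], t)
                    for (f, t), value in zip(test, column)]
        importances.append(mean_squared_error(train, shuffled) - baseline)
    return importances

train, test = make_rows(300), make_rows(100)
importances = permutation_importance(train, test)
ranking = sorted(range(3), key=lambda j: importances[j], reverse=True)
print("importance per feature:", [round(v, 3) for v in importances])
print("most important feature index:", ranking[0])
```

Shuffling the first feature should hurt the model far more than shuffling the third, which is pure noise. Scores for weak features can be small or even slightly negative, so they are best read as a rough ranking rather than a precise measurement.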
Remember that not every problem can be – or should be – tackled using machine learning methods. Machine learning simply provides a different set of tools that you can add to your toolkit. Hopefully, by combining these novel tools with domain-specific expertise, you’ll be able to discern which tools are best for the problems you’re trying to solve. Machine learning may be particularly useful when you have lots of data and your research would benefit from finding trends or outliers, speeding up time-consuming tasks, visualising data or ranking feature importance. In the coming years, clever applications of machine learning can potentially transform the way that research is done.
John F. Wu is an assistant astronomer at Space Telescope Science Institute and an associate research scientist at Johns Hopkins University.