The “big data” revolution has already transformed fields such as biology, astronomy and physics, but its impact has been far patchier in the social sciences.
To explore why this might be, the University of Essex has teamed up with Sage Publishing to produce a Sage white paper titled Who Is Doing Computational Social Science?: Trends in Big Data Research.
In the natural sciences, the authors point out, big data research relies on “high-throughput instruments” such as particle accelerators and genome sequencers that have been “designed specifically for analysis by scientists”.
By contrast, while social scientists may deal with equally large amounts of data, these “derive overwhelmingly from mixed sources (e.g., social media, unstructured text, digital sensors, financial and administrative transactions) not designed to produce valid and reliable data for social scientific analysis”. This leads to “the challenge of harmonizing and extracting meaningful features from a variety of data streams”.
To learn more about how researchers are meeting this and other challenges, Sage Publishing surveyed social scientists around the world and received 9,412 responses, mainly from the US (3,302 respondents) and the UK (728), followed by India (405) and Canada (353). Thirty-three per cent had already been involved in big data research, while 49 per cent of those who had not said they were “definitely planning on doing so in the future” or “might do so in the future”.
Along with “an appetite to engage with big data research”, the white paper identifies a number of barriers to entry, with “finding collaborators with the right skills” and the “time required to learn a new field” flagged up as the most significant. Some of those already using big data, meanwhile, said they had “a big problem” in “getting funding” (42 per cent) or “getting access to commercial or proprietary data” (32 per cent), while 61 per cent found “choosing an appropriate journal” a “big problem” or “something of a problem”. (One free-text response noted that “Several of the top journals in business school disciplines have not yet embraced Big Data Analytics.”)
Researchers’ own skills gaps were another significant problem, with 40 per cent of respondents wanting either “basic introductory training on big data analytics or data science” or a better understanding of “specific topics, such as text mining and R and Python programming [languages]”.
So how does this broad picture play out in the lives of individual researchers?
Laura Nelson, assistant professor of sociology and anthropology at Northeastern University, has worked on “analyz[ing] feminist movements in Chicago and New York City from 1865 to 1975”. After visiting archives across the country, collecting large quantities of text and then digitising them, she “started to incorporate techniques developed in computer science and computational linguistics to make the entire content analysis process more reliable, reproducible and scalable”.
Yet she encountered a number of obstacles. While she was at graduate school at the University of California, Berkeley, from 2006 to 2014, “there were few classes teaching the specific skills I needed. [Either classes] were taught in computer science and applied math departments and were largely impenetrable to me…or they focused on skills that were marginal to what I really needed”. A “Python bootcamp” at another university “would regularly have 10 to 15 sociologists, and every year there was a 100 per cent drop-out rate for these sociologists in the first few days of the 5-day workshop”.
If we really want to see social scientists embracing big data more effectively, Professor Nelson said, such courses should be “taught or co-taught by social scientists and humanists”, and some basic knowledge of the field must be “incorporated into the standard methods courses in every graduate programme”. To overcome issues of computing power, she would also like to see every graduate student getting “access to a server capable of scaling up to do big data research from the moment they start their programme”, in order to “move away from the model of everyone working on their own machines”.
Eric Meyer, professor of social informatics at the University of Oxford, said he believed that big data research in the social sciences began “picking up a massive amount of steam around 2011”. Yet the kind of interdisciplinary collaborations required can prove tricky: “Unless all participants get something out of the relationship, there is a risk that one or more will be viewed simply as a service to the other.
“This can go both ways: the technical experts might be seen just to be doing the computing work for the domain expert, but equally the domain expert might just be seen as a handy source of real-world data against which to test some new computational methods. The best projects are able to identify challenges that excite all the participants in different ways.” It is also essential, therefore, to train young researchers in the crucial “bridging skills”, he said.
Giuseppe Veltri, associate professor in social psychology of communication at the University of Leicester, noted that “most universities have developed research infrastructures around the natural sciences” and have only introduced “high performance computers clusters” for that purpose. This means that “IT services are used to satisfy the requests from natural scientists and are very much unprepared for those from the social sciences”.
James Allen-Robertson, lecturer in media and communication at the University of Essex, would like to see “more pre-built code directed at social scientists”.
“There’s a whole world of tools and packages that can do some amazing things,” he explained, “but unless you’re aware they’re there, it’s difficult to know what to look for. Really that knowledge can only come from immersing yourself in the communities producing these things, knowing what is being developed, what packages have been superseded or outmoded, what methods are more efficient; it’s a whole discipline in itself.”
Luke Sloan, deputy director of the Social Data Science Lab at Cardiff University, said now was “an exciting time for people interested in new ways of doing research”. His own work uses Twitter data to try to predict the results of elections. This raises familiar questions about “sampling, representation and authenticity”, which he suggested have been “largely solved for other research methods such as surveys but haven’t been worked out yet for Twitter”. Such challenges are also stimulating, however, since “being exposed to new data makes us rethink the questions we can ask”.
Using big data in their research, concludes Who Is Doing Computational Social Science?, offers “huge potential for social scientists” but also requires “new skills, new collaborations, new research methods and new computational tools”. Although researchers still face many obstacles, they are actively seeking ways of overcoming them.