Big data struggles to show its value for social sciences

Failure to improve predictions compounds concerns over effectiveness, accuracy and racial bias

June 9, 2020

Social scientists are seeing new red flags in their field’s predicted big-data future, finding computerised analyses not just vulnerable to bias but perhaps fundamentally limited in their predictive value.

Concern is rising after a large-scale study in which 160 academic research teams, organised by Princeton University sociologists, tried machine-learning methods to predict the life pathways of disadvantaged children.

“The best predictions were not very accurate and were only slightly better” than those developed in traditional models using far fewer data inputs, the Princeton team reported in PNAS.

That result is a major warning sign for the quickly expanding ranks of computer-heavy approaches to the social sciences, said Filiz Garip, a professor of sociology at Cornell University who was not part of the Princeton study.

At Cornell, for instance, between a third and half of graduate students in the social sciences are already taking classes in machine learning, said Professor Garip, who assessed the Princeton experiment for a subsequent PNAS article.

“Everybody feels like they need to learn this, they need to gain these skills, to find any kind of job,” she said in an interview. Yet so far, as the Princeton study showed, “we’re not gaining a whole lot by using these methods”, she said.

The findings come as social scientists are already on the defensive over indications that using large databases and sophisticated computer programmes to guide political and legal decisions may be reinforcing and institutionalising human biases.

Long-recognised examples include predictive algorithms that identify black defendants as posing a greater risk of future crime because their community histories often show relatively high levels of police attention.

Advocates of such data-driven assessments have argued that problems within algorithms can eventually be identified and eliminated, thereby making them less biased than decisions that rely on humans alone.

The Princeton study, meanwhile, raises the question of whether the teaching of basic skills and perspectives in the social sciences may be getting pushed aside by an overriding desire to amass and analyse the vast troves of data that can be found on almost any human these days.

Such volumes of data may be adding more confusion than clarity by outstripping the capacity of social scientists to meaningfully understand what value each individual piece of data is really contributing to a necessary answer, Professor Garip said.

For the Princeton study, the participating research teams were given nearly 13,000 pieces of data on each of 4,200 families with a child who was born in a large US city around the year 2000, derived largely from visits, assessments and questionnaires over the following years with the child, parents, caregivers and teachers.

Given that information for those children up to age 9, the teams were asked to predict various outcomes for the child and family at age 15, including child school grades and parent job success.

In painting a picture of how societal conditions affect people’s lives, the teams broadly failed to create computer-aided models that worked any better than traditional social science analyses that use far less data on each subject, the Princeton team wrote.

The Princeton authors, led by sociology professors Matthew Salganik and Sara McLanahan, said they expected that their social science colleagues would, in the coming years, keep improving their methods of big-data computer analysis.

Further experimentation, they said, should also help their field better understand what types of societal problems may justify scientists pursuing individual-level predictions, rather than being content with broader understandings of how policies affect people.

Professor Garip said she agreed with such perspectives. But in the meantime, she cautioned, large numbers of younger social scientists and their universities may be betting too heavily on data-intensive training.

“We have to be careful,” she said, “of jumping on this trend or hype.”


Reader's comments (3)

Bit of a no-brainer really. How can machines ever analyse and understand emotive and empathic human beings better than humans do? How can machines understand the individuality and unpredictability of being human? Big data is for the dangerous Orwellian realm of ubiquitous and nicely packaged responses that fit an economic model, not the human being that is being human.
Algorithms that do not recognise nuance that a human being from the same cultural background would find unproblematic. Bizarre that so much effort is being put into using technology which is at the intellectual level of a primary school colour chart. Why?
Congrats, this means the traditional social scientists were as good as machine learning algorithms in uncovering the principles governing those phenomena. I would hope so given that they had years to work on these theories. My reading is that both approaches were not terribly successful. One reason may be that grades and similar life events could in fact be relatively inherently random. Welcome to the big challenge of the social sciences - complex social systems are often more complex than physical systems, and this is further impacted by imprecise measures such as grades as a proxy for educational attainment. So of course neither the humans nor machines can do a lot better than what they did.