Since the advent of so-called big data, much has been written about the possibilities and challenges of making the most of the multiple digital traces created online.
Even though research in this area is still emerging, enough has now been done across a wide range of disciplines to form the basis for this hugely ambitious book. Matthew Salganik has made it his mission to sketch out an emerging landscape by looking in turn at “observing behaviour”, “asking questions”, “running experiments”, “creating mass collaborations” and “ethics”. There is also a short final chapter on the future of social research.
Throughout, the book relies on a recurring narrative device: imagining how a social scientist and a data scientist might approach the same research opportunity. Salganik suggests that whereas data scientists are glass-half-full people who see opportunities, social scientists are quicker to highlight problems (the glass-half-empty camp). He is also upfront about how he has chosen to write the book, adopting the more optimistic view of the data scientist while holding on to the caution expressed by social scientists.
Salganik argues that data scientists most often work with “readymades” and social scientists with “custommades”, illustrating the point through art. Data scientists are more like Marcel Duchamp, using existing objects to make art, while social scientists work in the custom-made style of Michelangelo, which offers a neat fit between research questions and data but does not scale well. The book is thus a call to arms, encouraging more interdisciplinary research and urging both sides to recognise the potential merits and drawbacks of each approach. It will be particularly welcome to researchers who have already started to think along similar lines, of whom I suspect there are many.
To anchor the enormous amount of research included, Bit by Bit starts and ends with a specific piece of social research that combined detailed data on phone calls from about 1.5 million people in Rwanda with survey data in order to estimate the geographic distribution of wealth in the country. The researchers' results were similar to those of the Demographic and Health Survey, the gold-standard survey in developing countries, but their method was 10 times faster and 50 times cheaper. Salganik cites this study as a great example of combining what researchers have done well in the past (traditional surveys) with what we can do in the present (using new digital sources to gain similar insights). He foresees a world in which research capabilities will continue to increase, but to make the most of them, we need to combine ideas from social science and data science, developing hybrid models of social research.
Each chapter’s main section draws on extensive historical examples to situate the current phase of social research and to map out continuities and differences in specific methods and approaches. For example, chapter 3 (“Asking questions”) covers the history of survey research, explaining how the telephone heralded a second phase of such research before digital opportunities opened up a third. Such context setting is highly valuable for students, and also for fellow researchers. Historical precedents are not always thought about carefully when it comes to understanding the challenges and opportunities of new research.
Certain important claims are perhaps slightly buried in places. For instance, the argument frequently repeated at the start of the uncritical big data era, that with enough data there would no longer be any need for theory, is dismissed only briefly, in order to stress that what has opened up instead is an opportunity to create new theories.
Chapter 3, however, also highlights what is left out by focusing on a particular approach. “Asking questions” can form the basis for a range of quantitative and qualitative approaches, many of which, of course, remain relevant in a digital setting. Although some, such as ethnographic work, are briefly mentioned, Salganik has clearly made choices about which approaches and methods he wishes to highlight. Given the structure of the book and of its individual chapters, this is understandable, but it would have been good to make it more explicit that surveys are only one of many methods based on asking questions.
There are two chapters in particular that I suspect will be incredibly helpful, in different ways, to both social and data scientists. In chapter 2, on “Observing behaviour”, Salganik includes what I predict will be one of the most cited sections of this book: a thoughtful list of 10 common characteristics of big data, noting which are generally good for social research and which are generally bad. The characteristics he suggests are helpful are “big”, “always-on” and “nonreactive”. Those that are generally problematic for research, and also far more numerous, are “incomplete”, “inaccessible”, “nonrepresentative”, “drifting” (population drift, usage drift and system drift can all make it hard to use big data sources to study long-term trends), “algorithmically confounded” (behaviour in big data systems is not neutral; it is driven by the engineering goals of the system), “dirty” and “sensitive”. I have no doubt that many researchers, including myself, will return time and time again to the roughly 25 pages of the book where this is set out. Salganik has provided a hugely useful and accessible checklist that will help researchers better articulate the opportunities and trade-offs of the data they work with.
I imagine that chapter 6, which deals comprehensively with ethics, will prove particularly valuable for data scientists. Ethical issues are addressed throughout the book, but this chapter offers a clear overview, one that will be far more familiar to social scientists, especially those who have been working with digital data for some time. If Bit by Bit is read widely, as I am sure it will be (it has already been covered by Wired), then its greatest value will lie in its ability to bridge these research communities and to suggest productive routes to future ethical research endeavours.
Throughout the book, Salganik draws on historical and recent examples to offer detailed insights into key studies such as the emotional contagion experiment on Facebook. Here researchers worked with Facebook to manipulate the news feeds of nearly 700,000 users and found that the emotions those users expressed in their own posts shifted in a more negative or positive direction, depending on the emotions expressed by others in their feeds. As each chapter includes extensive further-reading sections (the most I have ever seen in a textbook) as well as incredibly well-thought-out activities (around 20 per chapter), this is a book to return to again and again. For those of us teaching in this area, much of it, and the activities in particular (all tested in a classroom setting in 2017), will no doubt find its way into our pedagogy. Clearly the result of years of research and teaching as well as a deep passion for carving out opportunities for social research in the digital age, Bit by Bit should be widely read by those engaged in social research and beyond.
Farida Vis is professor of digital media at Manchester School of Art, Manchester Metropolitan University.
Bit by Bit: Social Research in the Digital Age
By Matthew J. Salganik
Princeton University Press, 448pp, £27.95
Published 13 December 2017
Matthew J. Salganik, professor of sociology at Princeton University, was born in Providence, Rhode Island, and grew up in and around Baltimore, Maryland. He studied for a BA at Emory University, an MA at Cornell University and a PhD at Columbia University, with an additional year as a visiting graduate student at the University of Groningen. He credits Emory as the place where he “first began to get curious about the world” and the other institutions as giving him “the broad and deep training that I needed to become a researcher”.
From early in his research career, Salganik has been interested in “new ways to create data”. His dissertation, for example, explored “social fads in a way that could never have been done in a traditional lab experiment on campus”. To accomplish this, he and his team “creat[ed] an artificial ‘music market’ in which 14,341 participants downloaded previously unknown songs either with or without knowledge of previous participants’ choices”. A current project, he says, uses “scientific mass collaboration…to improve the lives of disadvantaged children in the United States”. Bit by Bit builds on such experiences to “help social scientists and data scientists take advantage of the many new opportunities that have been created by the digital age”.
Asked about the methods and ethics of big data research, Salganik suggests that “when many people think about big data, they think about online data from places like Twitter and Facebook. But this is only part of the story. Going forward, more and more of our offline behaviour will be recorded and amenable to research because of the increasing prevalence of sensors in the built environment”.
Yet “the ubiquitous measurement and experimentation created by the digital age”, Salganik continues, “also raise important ethical questions”. His book should “help researchers think about both what we can do and what we should do”.