As a researcher, I attend a lot of meetings about data. Also present, typically, are institutional representatives charged with informing researchers about the necessity of “protecting” the privacy and rights of people whose data are collected. That necessity is often used as a reason for not collecting the data in the first place or for not allowing access to them.
In no way do I suggest that we must loosen our ethical responsibilities to our respondents. But I do object to Canada’s hyper-bureaucratic emphasis on the evidence-free assumption that all researchers are evil schemers with no regard for privacy beyond that which is imposed on them by oversight committees.
Let me offer some context. After completing my master’s degree in Canada, I moved to the UK to undertake doctoral studies at the University of Essex’s Institute for Social and Economic Research. From 1991 to 2009, the institute ran the British Household Panel Survey, a longitudinal study of more than 5,500 households (and more than 10,000 individuals). These data were on our shared drive. I could analyse them at my leisure (and as part of my job). At no point was my integrity ever questioned.
From Canada, I still have access to the UK Data Archive (also physically housed on the Essex campus). By filling out some online forms, I can download and analyse many rich longitudinal datasets. But when I returned to Canada in 2004, the limitations on accessing domestic data began to reveal themselves.
When I began searching for longitudinal datasets comparable to those that I had used in the UK, I learned that they were collected almost exclusively by Statistics Canada, a government agency. To access them, I needed to apply to one of the Statistics Canada Research Data Centres set up at numerous post-secondary institutions in the 1990s under a federal programme, the Data Liberation Initiative, that aimed at improving access to federal data resources. After my application was approved and I had sworn an oath to the Queen (really!), I would be able to access data – but only within the centre between the hours of nine and five. And strictly no USB devices.
The UK’s Office for National Statistics is not noticeably less bureaucratic. Only UK-based “approved researchers” may use its data, and they must complete an application process designed to ensure that confidentiality is not compromised. The difference is that the UK longitudinal studies that I use, although publicly funded, are not collected by the ONS, so its restrictions do not apply to them.
Of course, respondent confidentiality is important, but it is a red herring. I have worked with scores of datasets without ever feeling compelled to try to identify individuals. So have thousands of other researchers across the globe. It isn’t what we are interested in. Nor is it usually even possible once identifying features such as names and addresses are removed. Yes, if there were one millionaire surveyed in a small town, that person would be identifiable in raw data that revealed reported income and town name. But that problem could be solved by swapping precise incomes for salary ranges, and town names for population size categories.
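For readers who want to see how little is involved, here is a minimal sketch of that kind of coarsening in Python. The band edges, labels, field names and the `town_populations` lookup are all my own illustrative assumptions, not the practice of Statistics Canada, the ONS or any data archive.

```python
from bisect import bisect_right

# Hypothetical band edges and labels, chosen only for illustration.
INCOME_EDGES = [25_000, 50_000, 100_000, 250_000]
INCOME_LABELS = ["under 25k", "25k-50k", "50k-100k", "100k-250k", "250k and over"]

POPULATION_EDGES = [10_000, 100_000, 1_000_000]
POPULATION_LABELS = ["under 10k", "10k-100k", "100k-1m", "1m and over"]


def to_band(value, edges, labels):
    """Return the label of the band containing value (lower edges inclusive)."""
    return labels[bisect_right(edges, value)]


def coarsen(record, town_populations):
    """Replace exact income and town name with broad categories.

    `record` is assumed to hold "income" and "town" keys; any other
    fields (names, addresses) are simply not copied to the output.
    `town_populations` is an assumed lookup from town name to population.
    """
    population = town_populations[record["town"]]
    return {
        "income_band": to_band(record["income"], INCOME_EDGES, INCOME_LABELS),
        "town_size": to_band(population, POPULATION_EDGES, POPULATION_LABELS),
    }
```

Applied to the lone millionaire, `coarsen({"income": 1_200_000, "town": "Smallville"}, {"Smallville": 3_400})` keeps only a top income band and a small-town size category; neither the exact income nor the town name survives. How wide the bands need to be for a given dataset is a judgement that this sketch does not attempt to make.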
So I don’t dispute that some data cleaning is required to preserve the confidentiality of study respondents. However, Canada’s entire data infrastructure has been created around this perceived threat of outing respondents. A giant bureaucracy has blossomed because some imagined statistical Bond villain might have the freakish means and inclination to dismantle someone’s life or steal someone’s identity via a dataset.
The only way for non-government researchers to get around the red tape is to collect large-scale data themselves. This is difficult to do without the resources that governments can command, but it is a task that I am involved in – not least because I have no patience for the alternative.
Karen Robson is an associate professor and Ontario research chair in educational achievement and at-risk youth in the department of sociology at McMaster University.