Like many social scientists, I spend my working life collecting, coding and analysing data. If I were asked to play word-association games, "data" would therefore tend to elicit "deadline", "missing", "delay", "mis-code" and other life-enhancing terms, these being the reality of most days on the job.
In the past few weeks, however, I have been thinking more generally about data, especially those on human beings. This is partly a response to re-reading a book and partly to the recent death of a colleague, Neville Butler. He was the prime mover in the creation of the 1958 and 1970 "cohort studies", internationally renowned longitudinal studies that track all the individuals born in Britain in a given week of 1958 and 1970 respectively. The participants have been followed up at intervals, and a wealth of information has been collected on their school attainments, birth families, work, earnings, health, families and children, and participation in society. The data are used by researchers to shed light on a host of social developments and the dynamics of individual lives.
Obituaries of Neville have, rightly, emphasised his vision in seeing the need to relate medical, social and economic variables at individual level, and the extraordinary determination and fundraising flair that kept these studies alive. However, two other points are worth making.
First, there was nothing random about the information collected. Neville and his colleagues had some very clear interests, objectives and hypotheses; he was an eminent paediatrician with a specific interest in perinatal medicine. The questions asked in the early surveys reflected both established findings and new hypotheses about which aspects of family and home environment might affect foetal and child development, and about the impact of newborn babies' health and care on their later physical and cognitive growth.
Of course, some questions were included on a more speculative basis than others and, looking back, there are absent variables one would dearly like to have had included. But, overall, the studies bear out a general rule.
Well-designed data collection exercises have to be based on some underlying theories and models. The amount of information out there is, for practical purposes, all but infinite. If you collect it just because it might be useful one day or because someone important might want to know about it, you will end up wasting money on data that are largely unused and largely unusable.
A second crucial point about cohort studies is that participants are all volunteers giving time because they recognise the value of the study. This helps to ensure high-quality data. But while a large proportion of the original respondents remain, there has been attrition. Not everyone is willing to share the details of their lives, even with the most respected and reliable of research teams.
This could well be a growing problem. At the Office for National Statistics they will tell you that, back in the 1950s, people who collected information for national surveys could count on close to 100 per cent co-operation when they knocked at a door. Not any more. It is not only individuals who are refusing to help, either. In a recent multinational survey of pupil attainment, England had to be dropped from the final report because not enough schools had participated.
Which brings me to the book I have just re-read. It is written by David Hand of Imperial College London and is called Information Generation: How Data Rule our World. I should declare an interest, since I read and endorsed it in manuscript. It is, in large part, a paean of praise to data and measurement, providing an excellent discussion of their place in the development of modern science. It also has a chapter on "Big Brother's eyes" that helps explain our growing unease about data collection, which goes well beyond resenting the time spent on yet another questionnaire. We know about closed-circuit TV surveillance, but how about radio frequency identification, used to track objects and carry out stock control at a distance? Buy one of those objects and you too can be tracked by its signal and have the item linked to your credit card record. Use such a card, and more than 70 pieces of information about you go into the system as well.
As a researcher, I love individual-level data, the more detailed the better. As a citizen, I feel less and less inclined to sign up to huge integrated databases that give government agencies access to all my records. Will the Big Brother age make people less willing to share information with researchers? Or will they conclude that with no privacy left to lose, they might as well share everything? I truly have no idea.
Alison Wolf is Sir Roy Griffiths professor of public sector management at King's College London.