A decade ago, technology marketing materials and faculty meetings alike were set alight with excitement about the potential of big data. Digitalisation and more powerful processors would enable new insights from overlooked archives, as unprecedented scale enabled more robust conclusions.
“Big data did not necessarily mean you have big data, but this awareness that there’s a lot of data out there, and we finally need to do something with that,” Harald Binder, head of the Institute for Medical Biometry and Statistics at the University of Freiburg, told Times Higher Education, reminiscing about the giddy times.
Investments from central and local governments followed, but recent years have been somewhat anticlimactic, in part because the largest datasets are the walled gardens of major US technology companies, Professor Binder said.
In Germany, political interest and public capital are beginning to shift towards the tidy new label of “small data” – datasets that are too small in themselves to reach statistical significance and must therefore be combined with others.
Professor Binder and colleagues recently won funding from the German Research Foundation to set up a collaborative research centre for small data that will bring together computer science, mathematics and statistics. A major project will be to pool techniques that individual disciplines have developed and named independently, but which should be applicable beyond their fields of origin. “One of the challenges is that each subdiscipline came up with their own solutions,” he said.
Germany, with its “specific and very strict interpretation” of the European Union’s personal data protection rules, is a good place to develop such methods, Professor Binder added. Politicians may also see the benefit, because its struggling economy depends on small and medium-sized companies, which each hold limited data and need to combine it to compete with larger foreign rivals.
Applications of the techniques the centre is working on range from rare diseases and the highly specific outcomes of surgeries and other treatments to scant diagnostic or forensic data, such as that used to build up the DNA profile of a criminal suspect.
“Statistics has always been on small data, because historically we didn’t have large datasets, but what has changed is we are now in a setting where we are able to connect many small datasets,” Professor Binder said.
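The statistical idea behind connecting small datasets can be illustrated with a classic pooling technique. The sketch below is not the centre’s method, just a minimal, hypothetical example of fixed-effect inverse-variance pooling: three small studies, none individually significant, yield a pooled estimate whose standard error is smaller than any single study’s.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Combine per-study effect estimates with inverse-variance weights.

    Each study alone may be too small to reach significance, but the
    pooled estimate has a smaller standard error than any single study.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Three hypothetical small studies: similar effects, large standard errors.
# Individually, each z-statistic is below 1.96 (not significant at 5%).
est, se = pool_fixed_effect([0.30, 0.25, 0.35], [0.20, 0.25, 0.18])
z = est / se  # the pooled z-statistic exceeds 1.96
```

The study effects and standard errors here are invented for illustration; real pooling across institutions would also have to address heterogeneity and data-protection constraints of the kind Professor Binder describes.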