In the era of big data and machine learning, everybody is trying to extract the best quality information from their data. It’s a complicated task, but one statisticians have been working on for more than a century, long before anyone could even conceive of Facebook or Google. This is the field in which Professor Nancy Reid OC, FRS of the University of Toronto has made her career.
Professor Reid started off her studies in Computer Science, but soon found herself attracted to Statistics instead.
At the time, programming was done on punch cards. So you would write a program and stand in queue with your box of punch cards. When you got to the front you would hand your box over to the computer operator, who would run it. If there were any errors in your program you would get your box back, find the error, fix the code, and then go back to the end of the queue. I did not have the patience for that.
As a statistician, Professor Reid is concerned with the analysis of data, and with teasing out cause and effect in complicated situations. Whenever you read a news story about the latest diet fad, cancer-causing agent, or anti-ageing exercise which seems too good to be true, you’re seeing the end product of a lot of statistical analysis. While it is possible to do this sort of analysis well, it is often quite difficult, and the standards of many of these studies aren’t quite up to snuff.
Professor Reid makes a point of dissecting these articles in her teaching, and unearthing the underlying statistical reasoning. Seeing through the hype to what the studies actually found convinces her students of the need for careful thinking.
I would say that over my career the situation has improved quite a lot. Journalists are better trained now in statistical science than they were, and realise they need some consultation in the things they don’t understand. The things I see in respectable newspapers are much better than they were twenty or thirty years ago. Groups like @callin_bull or @justsayinmice on Twitter are also doing this sort of work with a much wider societal impact.
During her Australian visit, Professor Reid is meeting collaborators at institutions in Sydney, Melbourne, and Canberra. Among her projects is an exploration of the intersection between statistical inference – knowing how to carefully draw conclusions from your data – and data visualisation. With ever-larger data sets permeating our world, finding effective visualisations for data is increasingly important. However, the question of how to draw conclusions from these visualisations remains tricky. Traditional methods for drawing inferences rely on numerical summaries, and are not easily applied to a graph or a plot.
We’re trying to create a blend of inference where the basic ingredient is not a set of numbers, but a picture. That’s a lot of fun.
When looking for research projects, much of Professor Reid’s motivation comes from the world of the practical. However, instead of looking for ad-hoc solutions to specific problems, she tries to find the underlying issue that drives a larger set of problems.
For example, she’s currently studying an issue that originally arose in genomics, where you have huge amounts of data measuring thousands of gene expression levels, but may only have a small sample of people. The small number of independent observations makes it quite difficult to draw conclusions with any confidence. Situations like this, where you have many measurements on relatively few subjects, are called ‘big p, small n’ problems (p being the number of variables measured and n the number of subjects), and they come up in many contexts beyond genomics. Professor Reid’s expertise in asymptotic theory, which concerns the properties of statistical analysis in very large data sets, offers a unique perspective on this problem of inference.
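To see why ‘big p, small n’ data are so treacherous, here is a minimal sketch in Python. It is an illustration only, not Professor Reid’s method: the function names, the sample sizes, and the choice of pure random noise as data are all assumptions made for the example. It generates an outcome for a handful of subjects, then checks how strongly the best of many unrelated noise variables happens to correlate with it.

```python
import random


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def max_spurious_correlation(n, p, seed=0):
    """Largest |correlation| between a noise outcome and p noise variables.

    Hypothetical helper: n plays the role of the number of subjects and
    p the number of measured variables. Nothing here is truly related
    to anything else, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n)]
    best = 0.0
    for _ in range(p):
        variable = [rng.gauss(0, 1) for _ in range(n)]
        best = max(best, abs(pearson(variable, outcome)))
    return best


# Ten subjects in both cases; only the number of variables changes.
few_variables = max_spurious_correlation(n=10, p=5)
many_variables = max_spurious_correlation(n=10, p=5000)

print(f"best of 5 noise variables:    {few_variables:.2f}")
print(f"best of 5000 noise variables: {many_variables:.2f}")
```

With only ten subjects, searching through thousands of variables will almost always turn up one that looks strongly related to the outcome by chance alone, which is why conclusions in this setting need such careful justification.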
Despite the difficulties involved in doing statistical analysis well, Professor Reid is optimistic about the future of the discipline, and is excited to see what a younger generation of statisticians will produce.
It’s an exciting time for the statistical sciences. There’s just so much interest in things that we learn from data. I’ve not seen this much enthusiasm for statistical ideas so widely spread in my whole career. I think it’s a wonderful time for the discipline.
I am delighted to have the opportunity to come to SMRI. It’s a wonderful initiative, and I’ve always wanted to come to Australia, for as long as I can remember. It probably wouldn’t have happened if it weren’t for SMRI’s help.