Statistics Variety

“Statistics is good for people with short attention spans,” says Professor Thomas Lumley, deadpan. “In pure maths you might work on the same problem for 20 years. I admire that greatly but I would find it difficult. In statistics, what problems will be important tend to vary with the data.”

His recent work illustrates this variety; it includes the R statistical language; air pollution; interactions between genes and environments; flow cytometry data about cancer-related particles; gene locations for blood-clotting proteins, kidney disease and heart function; HIV testing; and DNA resequencing.

Lumley’s accent reflects his career moves – he grew up in Melbourne, studied in Oxford, and lived in Seattle for 12 years before moving to Auckland in October to be Professor of Biostatistics at the University of Auckland. His link with Auckland started in his student days in the mid-90s, when he first sent bug fixes to Ross Ihaka and Rob Gentleman, developers of R. He has since written a book on analysis of complex surveys using R. He says there’s a demand for biostatisticians in New Zealand, “but that’s true everywhere”.

He enjoys solving the problems that come up in large medical studies. “In genetics studies where you’re doing millions of statistical tests, you are working far out in the distribution curve - on the tails of the bell. If you’re doing two million tests, you have to produce false positives at a very low rate, one in two million.”

“You need to work with p values of 0.00000005 instead of 0.05, which is more usual. The difficulty is working out distribution theory for statistical tests with p values that small and several thousand people in the sample. And on the computational side, if a test can go wrong one in a million times, when you’re doing two million tests there is a 90 percent probability that it will go wrong!”
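The arithmetic behind these figures can be checked with a short Python sketch. It assumes the tests are independent, which real genetic tests only approximate, and the function names are illustrative rather than taken from any library. The Bonferroni-style threshold uses a family-wise level of 0.05 spread over one million tests, which is an assumption chosen here because it reproduces the 0.00000005 figure. Under the independence assumption the chance of at least one failure comes out at roughly 86 percent, in line with the round figure quoted.

```python
import math


def expected_false_positives(per_test_rate: float, n_tests: int) -> float:
    """Expected number of false positives across n_tests tests."""
    return per_test_rate * n_tests


def prob_at_least_one(per_test_rate: float, n_tests: int) -> float:
    """Probability that at least one of n_tests independent tests fails.

    Equivalent to 1 - (1 - p)**n, computed with log1p/expm1 so that
    very small per-test rates do not lose precision.
    """
    return -math.expm1(n_tests * math.log1p(-per_test_rate))


def bonferroni_threshold(family_wise_level: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the overall level fixed."""
    return family_wise_level / n_tests


n_tests = 2_000_000

# A false-positive rate of one in two million gives about one false
# positive in total across two million tests.
print(expected_false_positives(1 / n_tests, n_tests))

# A one-in-a-million failure repeated two million times
# (independence is an assumption of this sketch).
print(round(prob_at_least_one(1e-6, n_tests), 4))

# Assumed setting: overall level 0.05 shared across one million tests.
print(bonferroni_threshold(0.05, 1_000_000))
```

The `log1p`/`expm1` form is used because evaluating `(1 - 1e-6) ** 2_000_000` directly accumulates rounding error when the per-test rate is this small.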

“Genetic technologies are getting cheaper and cheaper,” he says, “so we will be able to use genetic data in more and more research. It’s a very mathematical or computational area: DNA is a sequence of letters - closer to software than wetware.”

Lumley says there is a lot of hype about genetics: “Treatment customised to your unique genetic makeup is not going to happen any time soon”. Massive efforts like the Human Genome Project, which set out to identify the 25,000 or so genes in human DNA, and the HapMap project, which is cataloguing human genetic similarities and differences, have enabled researchers to find millions of genetic variants. Lumley was involved in one research project built on that information, which discovered a group of proteins that help regulate our heart rhythm. But the results from the big projects aren’t useful for predicting individual disease, he says.

One statistical research area in which New Zealand punches above its weight is two-phase study design. “If you have a large sample of people, and you want to measure expensive additional information for a subset, New Zealand researchers are world leaders in finding the best way to design and analyse studies that get the most information for the money available.”

Lumley is one of the contributors to StatsChat, a blog site commenting on stats in the news. Recent posts have discussed whether women take more sick leave, and the risks of anti-smoking drug Champix.

See also
StatsChat blog
UK blog about evidence-based medicine and statistics

Photo: Steve Barker, Barker Photography