John MacInnes’ guide to online stuff

Online resources come and go, so any guide to them is out of date as soon as it is compiled. There is a good guide to several sources on the quantitativemethods.ac.uk site. I have a poor memory, and rote learning of formulae is not my speciality. The web is very useful for locating them quickly. Search for terms like ‘Chi-square’ or ‘standard deviation’ and you’ll find dozens of pages with not only the formulae, but often online calculators for them. Be a little cautious of online resources. While there is some wonderful material to be discovered, there is also a lot of dubious stuff out there, and, especially as a beginner, it is not always easy to tell the difference. I find that I use the following sites a great deal.

The UK ESRC’s Quantitative Methods Initiative.

Understanding Uncertainty: David Spiegelhalter’s site on probability.

UK Data Service: As well as the data archive, this site has a number of excellent learning resources.

Full Fact: This is much more than just a ‘fact-checking site’; browsing its pages will give you a good insight into the uses and abuses of data.

Khan Academy: If you ever need to brush up on anything from addition to calculus, this is where to do it.

Gapminder: Even if you are an ‘expert’, Hans Rosling’s site is full of inspiration when it comes to putting across complex data stories in simple language and graphics.

Some good places to start if you want to explore further

There are any number of books and online resources you can use to build your data analysis skills. They tend to divide into three groups. First, and in my view the most valuable yet most difficult to find, are those that deal with quantitative reasoning or what I think of as a ‘statistical imagination’: a capacity to see and interpret the world using patterns and numbers. The best of these by far is Dilnot and Blastland’s The Tiger That Isn’t. It is short, easy to read and entertaining, but contains a wealth of valuable insights that even the best statisticians can sometimes forget. In a sane world every decision maker and anyone who works with data should be required to read it: often.

I also like Jordan Ellenberg’s How Not to Be Wrong. Daniel Kahneman’s Thinking, Fast and Slow is a great guide to how our brains aren’t wired for statistical analysis, so that our intuitions can often lead us astray. It’s also useful to balance Kahneman’s ideas with those of Gerd Gigerenzer. Among his most accessible books are Reckoning with Risk, Risk Savvy and Calculated Risks. David Spiegelhalter is the grandly titled Winton Professor of the Public Understanding of Risk at the University of Cambridge. He is one of the best writers on probability. Both The Norm Chronicles and Sex By Numbers are quirky, disarming and funny, but behind the jokes, very rigorous. A classic, written in the style of its day but with insights that remain relevant sixty years after its initial publication, is Darrell Huff’s How to Lie with Statistics. Finally, a more technical presentation of a similar range of ideas is An Introduction to Probability and Inductive Logic, by Ian Hacking.

The second group are books that are guides to using SPSS or other statistical analysis software. My personal favourite is Exploring Data by Cathie Marsh and Jane Elliott, but also good are Discovering Statistics Using IBM SPSS Statistics by Andy Field, and Understanding Social Statistics by Jane Fielding and Nigel Gilbert.

Finally, there are books on statistics. There are hundreds of these, but in my view, few rise above a dry presentation of statistical techniques and formulae without enough attention to context or discussion of why and when to use the techniques they describe. Among those that manage to do the latter, make sense of statistics and properly discuss the relevance, advantages and limitations of what statistical analysis can achieve, I’d highlight Peter Diggle and A.G. Chetwynd, Statistics and Scientific Method: An Introduction for Students and Researchers; David Freedman, Robert Pisani and Roger Purves, Statistics; and Alan Agresti, Statistics: The Art and Science of Learning From Data.