10.1 Visualizing grouped data

The first interactive graph shows the relationship between global temperature change and CO2 emissions over time. It is a smoothed scatter plot split into three periods: 1880-1929, 1930-1979 and 1980-2010. We can see that the relationship between CO2 emissions and global temperature is not constant over time. In the first period, when emissions were relatively low, there is a slight negative association between the two variables, with quite a bit of unexplained variation around the regression line. In the second period the relationship turns marginally positive, though again with a lot of unexplained variation. Lastly, in the most recent period the association becomes visibly positive and sizeable, with much less unexplained variance. This change coincides with a significant increase in CO2 emissions.
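The period-by-period comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`ols_slope`, `slopes_by_period`) are hypothetical, a straight-line least-squares fit stands in for the smoother used in the graph, and the `toy` values are made up for the example rather than taken from the actual temperature and emission records.

```python
from statistics import mean


def ols_slope(x, y):
    """Least-squares slope of y regressed on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def slopes_by_period(records, periods):
    """Fit one slope per period.

    records: iterable of (year, co2, temp_change) tuples.
    periods: list of (start_year, end_year) pairs, both ends inclusive.
    """
    slopes = {}
    for start, end in periods:
        subset = [(co2, temp) for year, co2, temp in records if start <= year <= end]
        co2_values, temp_values = zip(*subset)
        slopes[(start, end)] = ols_slope(co2_values, temp_values)
    return slopes


# Made-up numbers, chosen only to mimic the pattern described in the text.
toy = [
    (1880, 1.0, 0.10), (1900, 2.0, 0.05), (1920, 3.0, 0.00),
    (1940, 4.0, 0.00), (1960, 5.0, 0.02), (1975, 6.0, 0.04),
    (1985, 8.0, 0.20), (1995, 10.0, 0.50), (2010, 12.0, 0.80),
]
periods = [(1880, 1929), (1930, 1979), (1980, 2010)]

for period, slope in slopes_by_period(toy, periods).items():
    print(period, round(slope, 3))
```

With these toy values the first slope is negative, the second slightly positive and the third clearly positive, which is the pattern the graph displays. A slope alone says nothing about the amount of unexplained variation, so it complements rather than replaces the plot.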

The second graph shows trends over time in the incidence of crime across the boroughs of London. The thick black line is the average trend when all boroughs are taken into account. Although the overall trend line is fairly flat, suggesting that the number of registered crimes is relatively stable, there are considerable differences between boroughs, with some registering considerably more crime than others. Furthermore, in some boroughs we can detect a slight positive trend over time. The graph thus allows us to see important variation in crime levels, but it makes it almost impossible to assess what is going on in individual boroughs.

Neither graph can tell us why the observed patterns and trends occur or whether they are representative of the population. Visualization techniques, however important, are only the first step in data analysis.

10.2 Research hypotheses

The exercise asks students to look at two graphs introduced earlier in the chapter and formulate research hypotheses as well as null and alternative hypotheses.

1.   Global temperature change – CO2

A brief look at the scatter plot below suggests a strong positive relationship between CO2 emissions and global temperature change. With this in mind, our research hypothesis can be formulated as follows: increases in carbon dioxide emissions cause an accelerating rise in global temperature. The specific wording can vary, but two points should feature prominently. First, the assumption of causality between the two variables, which is one of the fundamental differences between a research hypothesis and statistical hypothesis testing. Second, the notion of an accelerating rise, which matters because we are looking at the change in temperature rather than its absolute value.

The null hypothesis then will be as follows: there is no association between CO2 emissions and the global temperature change.

The alternative hypothesis: the true association between CO2 emissions and the global temperature change is not equal to 0. We could be more specific and hypothesize that the true association is above 0, which would imply a one-tailed test. However, we do not go into such detail in the book.
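To make the difference between the two-tailed and the one-tailed alternative concrete, here is a minimal sketch that uses an exact permutation test of the Pearson correlation. The permutation test is a choice made for this illustration (it needs only the standard library), not necessarily the test used in the book. The function names and the values of `x` and `y` are hypothetical and do not come from the real data.

```python
from itertools import permutations
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_values(x, y):
    """Exact permutation test of H0: no association between x and y.

    Returns (observed r, two-tailed p, one-tailed p for r > 0).
    Enumerates every ordering of y, so it is only practical for small samples.
    """
    r_obs = pearson_r(x, y)
    perm_rs = [pearson_r(x, shuffled) for shuffled in permutations(y)]
    tol = 1e-12  # guards against floating-point noise in tied values
    two_tailed = sum(abs(r) >= abs(r_obs) - tol for r in perm_rs) / len(perm_rs)
    one_tailed = sum(r >= r_obs - tol for r in perm_rs) / len(perm_rs)
    return r_obs, two_tailed, one_tailed


# Made-up values standing in for emissions (x) and temperature change (y).
x = [1, 2, 3, 4, 5, 6, 7]
y = [0.1, 0.3, 0.2, 0.5, 0.6, 0.8, 0.9]

r_obs, p_two, p_one = permutation_p_values(x, y)
print("r =", round(r_obs, 3))
print("two-tailed p =", p_two)
print("one-tailed p =", p_one)
```

The two-tailed p-value counts permutations whose correlation is at least as extreme in either direction, while the one-tailed p-value counts only those at least as large as the observed positive correlation. For an observed positive correlation the one-tailed value can therefore never exceed the two-tailed one.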

2.   London crime data

The violin plot below shows that crime incidence varies across the boroughs of London. We can thus say that our main research hypothesis will test the assumption that crime incidence is unequally distributed among local authorities.

The null hypothesis: the average number of registered crimes is equal for all local authorities.

The alternative hypothesis: the average number of crimes is not the same for all local authorities (i.e. at least one authority is different from the rest of the data).
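A null hypothesis of equal means across several groups is conventionally assessed with a one-way analysis of variance. The sketch below computes the F statistic by hand so that its logic is visible. It assumes hypothetical boroughs labelled A, B and C with made-up crime counts, and the function name `one_way_anova_f` is ours.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: dict mapping a group label to a list of observations.
    Returns (F, df_between, df_within).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2
        for values in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (v - mean(values)) ** 2
        for values in groups.values()
        for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up counts for three hypothetical boroughs.
crimes = {
    "A": [10, 12, 14],
    "B": [20, 22, 24],
    "C": [30, 32, 34],
}

f_stat, df_b, df_w = one_way_anova_f(crimes)
print("F =", f_stat, "with", df_b, "and", df_w, "degrees of freedom")
```

A large F means the group means differ by more than the within-group variation would lead us to expect. Turning F into a p-value requires the F distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) would normally be used for that step.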

It is worth noting that while statistical hypothesis testing in this case is about comparing the means, the research hypothesis is more complicated than that. Underlying the research hypothesis are causal mechanisms that explain the variation of crime incidence (poverty level within local authorities, police density, etc.). We should emphasize that such assumptions fall outside the remit of the null and alternative hypotheses.