Answers to Exercises and Questions for Discussion
This chapter examined the bivariate association between seeing alcohol ads on more channels (and therefore having higher awareness) and placing more stress on the importance of well-known alcohol brands, controlling for gender. A three-way crosstabulation showed that the association was a little stronger for males than for females. Using three-way crosstabulation in SPSS on the alcohol marketing dataset, check whether the same is true of the four hypotheses listed in Chapter 5 under the implications of that chapter for the alcohol marketing dataset. Then pick out one or two categorical variables other than gender that you think might affect these relationships, and run a three-way crosstabulation for all four hypotheses.
The four hypotheses listed in Chapter 5 are:
H1 The more aware young people are of alcohol marketing, the more likely they are to have consumed alcohol.
H2 The more young people are involved in alcohol marketing, the more likely they are to have consumed alcohol.
H3 The more aware young people are of alcohol marketing, the more likely they are to drink alcohol in the next year.
H4 The more young people are involved in alcohol marketing, the more likely they are to drink alcohol in the next year.
These are bivariate hypotheses. However, once they are controlled for a third variable such as gender, they become multivariate. In fact, for all four hypotheses the bivariate association is slightly stronger for females than for males. This is the opposite of the pattern found for awareness and brand importance, where the association was slightly stronger for males. Other variables that might be used as controls are whether or not the respondents' brothers, sisters or parents also drink alcohol, and whether or not the respondents themselves smoke.
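A three-way crosstabulation is the bivariate table of one variable by another, repeated within each category of the control variable (in SPSS Crosstabs the control variable goes in the Layer box). The sketch below shows the same idea in plain Python. It assumes records stored as dictionaries with hypothetical keys such as `aware`, `drank` and `gender`, and the toy data are invented for illustration, not taken from the alcohol marketing dataset. Each layer is summarized with Cramér's V. That measure ignores the direction of an association, so for directional hypotheses like H1 to H4 an ordinal measure such as gamma may be the better choice.

```python
from collections import Counter, defaultdict
from math import sqrt


def cramers_v(pairs):
    """Cramér's V for the two-way table built from a list of (x, y) pairs."""
    n = len(pairs)
    cells = Counter(pairs)
    row_tot = Counter(x for x, _ in pairs)
    col_tot = Counter(y for _, y in pairs)
    k = min(len(row_tot), len(col_tot)) - 1
    if k < 1:
        return None  # undefined: one of the variables has a single category here
    chi2 = 0.0
    for r, r_n in row_tot.items():
        for c, c_n in col_tot.items():
            expected = r_n * c_n / n  # expected count under independence
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return sqrt(chi2 / (n * k))


def three_way(records, x, y, control):
    """Association between x and y within each category (layer) of control."""
    layers = defaultdict(list)
    for rec in records:
        layers[rec[control]].append((rec[x], rec[y]))
    return {level: cramers_v(pairs) for level, pairs in layers.items()}


# Hypothetical toy records, NOT the alcohol marketing dataset.
records = [
    {"gender": "F", "aware": "high", "drank": "yes"},
    {"gender": "F", "aware": "low", "drank": "no"},
    {"gender": "M", "aware": "high", "drank": "yes"},
    {"gender": "M", "aware": "high", "drank": "no"},
    {"gender": "M", "aware": "low", "drank": "yes"},
    {"gender": "M", "aware": "low", "drank": "no"},
]
print(three_way(records, "aware", "drank", "gender"))  # {'F': 1.0, 'M': 0.0}
```

The comparison that matters for the exercise is between layers. If the measure is clearly larger in one category of the control variable than in another, that control variable modifies the bivariate relationship.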
This chapter ran a multiple regression of total number of alcohol units last consumed against total importance of brands, total number of channels seen, total involvement and age at which the first alcoholic drink was taken. The resulting multiple R² was very low. Using SPSS, examine the bivariate correlations between total units consumed and each of the other variables.
Figure 6.6 shows the SPSS results of a multiple regression of total units of alcohol last consumed against total importance of brands, total number of channels seen, total involvement and age at first alcoholic drink. The adjusted R² is very low, at 0.095. The bivariate correlations between total units consumed and the other variables are:
Total importance of brands, r = 0.11
Total number of channels seen, r = 0.16
Total involvement, r = 0.29
Age at first alcoholic drink, r = 0.03
Only total involvement has any notable degree of correlation, although total number of channels seen would also be statistically significant if the cases were a random sample.
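The bivariate correlations requested here are Pearson coefficients (in SPSS, Analyze > Correlate > Bivariate). The sketch below computes them in plain Python. It also computes the usual t statistic for testing whether r differs from zero, t = r√(n−2)/√(1−r²) with n−2 degrees of freedom, which is the test behind the remark about statistical significance. The variable names and values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 df) for H0: population correlation = 0; needs |r| < 1."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def correlations_with(target, others):
    """Bivariate r between the target variable and each named variable."""
    return {name: pearson_r(values, target) for name, values in others.items()}


# Hypothetical toy data, NOT the alcohol marketing dataset.
units = [2, 4, 4, 6, 9]
others = {"involvement": [1, 2, 3, 4, 5], "channels": [3, 1, 4, 2, 5]}
for name, r in correlations_with(units, others).items():
    print(name, round(r, 2), round(t_for_r(r, len(units)), 2))
```

Because t grows with √(n−2) for a fixed r, a correlation as small as 0.16 can reach significance in a large enough random sample. That is the point being made about total number of channels seen.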
How can the status of any variable as ‘dependent’ or ‘independent’ be established?
For dependence techniques, it needs to be emphasized that it is the researcher who decides whether a variable is dependent or independent. The statistics themselves are blind to such allocations. Researchers will often conclude that one variable 'accounts for' a certain percentage of the variability in another, but the statistics would equally allow the 'accounting for' to run in the other direction, or indeed support only the conclusion that the two variables share some of their variability. The dependent or independent status of variables comes (or should come) from the research context and the researcher's theoretical ideas, not from the statistics.
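A short numerical check illustrates that the statistics are indifferent to which variable is labelled dependent. Regressing y on x and x on y gives different slopes but exactly the same R², and the product of the two slopes equals r². The data below are hypothetical.

```python
def simple_ols(x, y):
    """Least-squares regression of y on x; returns (slope, intercept, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: which one is 'dependent' is the researcher's choice.
x = [1, 2, 3, 4, 5]
y = [2, 4, 4, 6, 9]
b_yx, _, r2_yx = simple_ols(x, y)  # y treated as dependent
b_xy, _, r2_xy = simple_ols(y, x)  # x treated as dependent
print(r2_yx, r2_xy)  # the same R² either way
print(b_yx * b_xy)   # equals R² up to floating-point rounding
```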
The appropriate use of regression-based techniques depends on a number of assumptions being met. Given that these are seldom met in their entirety, and sometimes not at all, to what extent has the use of regression been, in the words of Berk (2004: 203), a 'disaster'?
For multiple regression to be legitimately performed, a number of conditions need to be met. First, there must be an adequate number of cases. Second, regression analysis assumes that the dependent variable is metric. Third, it assumes that the metric variables are normally distributed (strictly, for significance testing, that the residuals are). Fourth, multiple regression assumes linearity: that the data are best summarized by a straight line rather than a curved or oscillating one. Finally, it assumes that the independent variables are not themselves highly inter-correlated. If these assumptions go unexamined, statistical analysis can easily become a misleading ritual, and readers of the results of regression analyses need to be warned of any problems with them.
Quite apart from these statistical assumptions, there are often assumptions or decisions about how the results are interpreted. It is often assumed, for example, that the variable selected as the 'dependent' variable in a regression equation is indeed an 'outcome' of the others; in reality, other interpretations are possible. In Exercise 2 above (the multiple regression of total units last consumed), it is assumed that the total number of alcohol units last consumed is somehow a consequence of total importance of brands, total number of channels seen, total involvement and age at which the first alcoholic drink was taken. On another interpretation, however, the total number of alcohol units last consumed may be one of several independent factors affecting awareness of alcohol advertising, which would then be taken as the dependent variable.
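One common check on the last statistical assumption is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on the other predictors. Equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix. SPSS reports tolerance and VIF when collinearity diagnostics are requested in Linear Regression. The sketch below computes VIFs in plain Python using the inverse-correlation-matrix route. The predictor names and values are hypothetical, and what counts as a 'high' VIF is a matter of convention that the sketch does not decide.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def invert(m):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vifs(predictors):
    """VIF for each predictor: diagonal of the inverse correlation matrix."""
    names = list(predictors)
    corr = [[pearson(predictors[a], predictors[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Hypothetical predictors, NOT the alcohol marketing dataset (here r = 0.8).
predictors = {"brands": [1, 2, 3, 4, 5], "channels": [2, 1, 4, 3, 5]}
print(vifs(predictors))
```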
Apart from the selection of variables as dependent or independent, there is the issue of how meaningful the final multiple R² is. A result of R² = 0.1 or even 0.2 may reasonably be interpreted as unimportant, not worthwhile or even a negative result. Researchers may be tempted, however, to say that the result is nevertheless 'statistically significant' if the p-value is less than 0.05, which it is for the adjusted R² of 0.095 in Exercise 2. To argue that the relationship therefore 'exists' is overstating the case. The correlation is very small, even if it cannot be explained away as an outcome of random sampling variation.
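Two standard formulas show why significance says little about size. The overall F statistic for a regression with k predictors and n cases is F = (R²/k)/((1−R²)/(n−k−1)), with k and n−k−1 degrees of freedom. Adjusted R² is 1 − (1−R²)(n−1)/(n−k−1). For a fixed R², F grows roughly in proportion to n, so a very small R² will eventually be 'significant' in a large enough random sample. The sample sizes in this sketch are hypothetical, not those of the alcohol marketing dataset.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R² for a model with k predictors fitted to n cases."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


def f_for_r2(r2, n, k):
    """Overall F statistic (df = k, n - k - 1) for H0: population R² = 0."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


# The same small R² with four predictors, across increasingly large hypothetical samples.
for n in (50, 200, 1000):
    print(n, round(f_for_r2(0.1, n, 4), 2), round(adjusted_r2(0.1, n, 4), 3))
```

The R² itself never changes in this loop; only the evidence that it is not exactly zero grows. That is why a 'significant' R² of about 0.1 still describes a very weak relationship.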