Answers to Exercises and Questions for Discussion

Is there a danger that the procedures used to analyse a dataset become largely a function of the procedures that happen to be available on a particular computer package like SPSS?

To a degree this must be true so that, for example, the particular forms of bar chart that a researcher may use will be conditioned by the fact that SPSS offers simple, clustered and stacked bar charts. Having said that, most of the statistics offered by SPSS are pretty standard, so the amount of ‘conditioning’ in that sense is probably limited. I would also add that some procedures have been available on SPSS for a long while, but appear to be little used. I am thinking of all the statistics that are available on the SPSS Crosstabs|Statistics procedure like lambda, gamma, Cramer’s V, and so on. These are not commonly used and are often not even mentioned or explained in introductions to SPSS. In short, I think the amount of ‘driving’ done by SPSS is limited.

Do pie charts have any advantages over bar charts?

Pie charts certainly have more visual impact when the proportions of various segments are the key point of interest. Otherwise bar charts have most of the advantages. Bar charts focus more on the actual frequencies than on relative proportions. They also preserve the order of the categories, and stacked and clustered bar charts allow other variables to be introduced.

You can get SPSS to produce any kind of nonsense. The trick is to know what counts as ‘nonsense’. Suggest some of the main ways in which the researcher might produce nonsensical tables and charts.

The most common way in which nonsense gets produced is to treat categorical variables as if they were interval. Try Descriptives on Gender. You obtain an ‘average’ sex of 1.53! Even treating a discrete metric variable as continuous metric can produce results that do not make a lot of sense, for example the average number of channels on which adverts for alcohol have been seen is 5.87. Some researchers might use this figure, but not a single respondent can have watched 0.87 of a channel. The other commonly met way is to take a continuous metric variable and use it in a crosstabulation or in the Frequencies procedure without first putting the data into class intervals, which produces a separate row for almost every distinct value. This should not be too much of a problem for discrete variables provided there are not too many distinct values.
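The same points can be illustrated outside SPSS. What follows is only a minimal sketch in Python, assuming made-up gender codes (1 and 2), made-up ages and hypothetical two-year age bands; none of these values come from the datasets used in the book.

```python
from collections import Counter
from statistics import mean

# Hypothetical codes: 1 = male, 2 = female (illustrative data only)
gender_codes = [1, 2, 2, 1, 2, 2, 1, 2]

# Arithmetically valid but meaningless: an 'average' gender
nonsense_mean = mean(gender_codes)

# The sensible summary for a categorical variable is a frequency table
gender_counts = Counter(gender_codes)

# Hypothetical continuous variable: tabulating the raw values would give
# one row per respondent, so group them into class intervals first
ages = [12.4, 13.1, 13.9, 14.2, 14.8, 15.5, 16.0, 16.7]


def age_band(age):
    """Assign an age to a hypothetical two-year class interval."""
    if age < 14:
        return "12-13"
    if age < 16:
        return "14-15"
    return "16-17"


age_band_counts = Counter(age_band(a) for a in ages)

print(nonsense_mean)
print(dict(gender_counts))
print(dict(age_band_counts))
```

The mean of the codes is computed without complaint, which is exactly the problem: the software cannot know that the numbers are only labels.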

Get SPSS to produce a one-way table for each of the variables either in the alcohol marketing dataset or in the Trinians dataset (see Chapter 2, Exercise 6 for instructions). Look at the frequency distribution of each and think about which ones might require some data transformations.

In the alcohol marketing dataset, the most likely transformation needed is to convert some of the ‘Don’t know/not stated’ answers into missing values so that the remaining measure is a properly ordered categorical variable. In addition, where some of the distributions are very uneven, it may be sensible to combine some of the categories. For example, there are only 12 cases where the social class of the chief income earner in the household is social class A, so these could be merged with social class B to form a new category, social class AB.
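In SPSS these transformations would be done with the missing values and recode facilities. As a rough illustration of the logic only, here is a sketch in Python; the code 9 for ‘Don’t know/not stated’ and the sample responses are assumptions made for the example, not the coding of the actual dataset.

```python
from collections import Counter

# Hypothetical coding: 9 stands for 'Don't know/not stated'
DONT_KNOW = 9
responses = [1, 2, 9, 3, 2, 9, 1]

# Treat 'Don't know/not stated' as missing (None) so that the
# remaining codes form an ordered scale
cleaned = [None if r == DONT_KNOW else r for r in responses]
valid = [r for r in cleaned if r is not None]

# Hypothetical social class values; merge the sparse class A with B
social_class = ["A", "B", "C1", "B", "A", "C2", "D"]
merge_map = {"A": "AB", "B": "AB"}
merged = [merge_map.get(c, c) for c in social_class]
merged_counts = Counter(merged)

print(cleaned)
print(dict(merged_counts))
```

Categories not listed in `merge_map` are left unchanged, which mirrors recoding into a new variable while copying all other values.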

In the Trinians dataset, Q11 asks respondents to pick out the three most important and the three least important things that the school should try to achieve for them from a list of nine items. This is not a rating scale, so do not try to get SPSS to add up the scores. Nor is it a fully ranked measure. If you run a Frequencies procedure on each item, you can pick out which ones have the highest number of ‘Most important’ evaluations and which ones have the highest number of ‘Least important’ evaluations. Similar considerations apply to Q14 and Q16. Q25 can be treated as a multiple response question (indicating the code 1 as the counted value). SPSS will then give you how many ‘Yeses’ each item has received. For Q27 it might be tempting to add up the codes allocated (remembering that ‘Often’ is given the lowest code), but you would be adding up very different forms of protest so that writing to a newspaper will have the same evaluation as assassination. These items really need to be treated separately. Q33 is a set of semantic differential items, so, again, do not try to add them up. The accompanying article from the Folio school magazine has picked out one of the items – left wing and right wing – and looked at the factors that appear to be associated with this perception of themselves.
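The idea of a counted value in a multiple response question can be sketched as follows. This is an illustration in Python, not SPSS output; the item names and the codes (1 = Yes, 2 = No) are hypothetical and are not taken from the Trinians questionnaire.

```python
# Each respondent's answers to a hypothetical multiple response question,
# one entry per item, coded 1 = Yes and 2 = No (assumed coding)
respondents = [
    {"item_a": 1, "item_b": 2, "item_c": 1},
    {"item_a": 1, "item_b": 1, "item_c": 2},
    {"item_a": 2, "item_b": 2, "item_c": 1},
]

COUNTED_VALUE = 1  # the code that is counted as a 'Yes'

# For each item, count how many respondents gave the counted value.
# The items are tallied separately; the codes are never added together.
yes_counts = {
    item: sum(1 for r in respondents if r[item] == COUNTED_VALUE)
    for item in respondents[0]
}

print(yes_counts)
```

Note that nothing here sums codes across items, which is the mistake the answer above warns against for Q11, Q27 and Q33.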

Try out the Explore and Descriptives functions in SPSS on some of the variables.

These are very similar. Descriptives will give you the minimum, maximum, mean and standard deviation for a list of metric variables. Explore is also only for metric variables, but will in addition give you confidence intervals, interquartile range, skewness and kurtosis for the selected variables. These can also be generated separately for each category of a factor, for example by gender. Oddly, the resulting table is headed Descriptives.
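To make the difference concrete, here is a minimal sketch in Python of the kind of summary these procedures produce, both overall and broken down by a factor. The data are invented, and the quartile method used by Python's `statistics.quantiles` is an assumption that may not match the percentile method SPSS uses, so the interquartile range could differ slightly from SPSS output. Confidence intervals, skewness and kurtosis are left out of the sketch.

```python
from statistics import mean, stdev, quantiles

# Hypothetical (gender, number of channels) pairs; illustrative data only
records = [
    ("male", 2), ("male", 4), ("male", 6), ("male", 8),
    ("female", 3), ("female", 5), ("female", 7), ("female", 9), ("female", 11),
]


def describe(values):
    """Return a Descriptives-style summary plus the interquartile range."""
    q1, _median, q3 = quantiles(values, n=4)  # Python's default quartile method
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "sd": stdev(values),  # sample standard deviation (n - 1 divisor)
        "iqr": q3 - q1,
    }


# Group the metric variable by the factor, as Explore does with a factor list
groups = {}
for gender, channels in records:
    groups.setdefault(gender, []).append(channels)

summary = {gender: describe(values) for gender, values in groups.items()}
overall = describe([channels for _gender, channels in records])

print(overall)
print(summary)
```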