Answers to Exercises and Questions for Discussion

Are all data ‘manufactured’ in some way or are there some data that we can accept as ‘given’?

If we accept that all data arise from systematic record-keeping, then they are all constructed by somebody at some point in time within a social, economic, political and moral matrix of possibilities and constraints. The records created are not reality itself; rather they are a result of researchers’ or other individuals’ attempts to observe or measure traces or evidence of phenomena situated within complex systems. The extent to which the records are ‘manufactured’, however, is a matter of degree. Much demographic data – age, sex, educational background, area of residence, and so on – we would probably accept as ‘given’ or factual, since there is a ‘true’ value that exists independently of the researchers’ attempts to measure it. However, some demographics like social class are, by definition, researcher creations. Attempts to measure cognitive properties of individuals as cases are always going to require a manufacturing process.

If a social researcher wanted to measure the extent to which individuals are ‘religious’, suggest how this could be achieved in a way that is (a) direct, (b) indirect, (c) derived or (d) multidimensional.

(a)    A direct measure of religiousness (or religiosity) would involve asking respondents to put themselves on a scale of degrees of religiousness, for example ‘How religious would you say you are?’:

  • Very religious
  • Fairly religious
  • Not religious

(b)   An indirect measure would rely on an indicator of religious behaviour, for example asking people when they last went to church or when they last said a prayer.

(c)    A derived measure would combine several indicators. Religiousness almost certainly has several dimensions, such as church attendance, frequency of praying, belief in life after death and beliefs about God. Each could be a separate question with a five-point response scale, for example Never, Seldom, 1–3 Times a Month, Weekly, Daily. The responses could be scored 1–5 and the scores totalled across the items to give each respondent an overall score.
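
A minimal Python sketch of the derived score in (c), assuming the five response categories are coded 1 (Never) to 5 (Daily). The item names, the respondent's answers and the `religiousness_score` helper are hypothetical illustrations.

```python
# Hypothetical codes for the five-point frequency scale in (c).
FREQUENCY_CODES = {
    "Never": 1,
    "Seldom": 2,
    "1-3 Times a Month": 3,
    "Weekly": 4,
    "Daily": 5,
}

def religiousness_score(answers):
    """Total the 1-5 codes across all items answered by one respondent."""
    return sum(FREQUENCY_CODES[answer] for answer in answers.values())

# One hypothetical respondent answering two frequency items.
respondent = {"church_attendance": "1-3 Times a Month", "praying": "Daily"}
print(religiousness_score(respondent))  # 3 + 5 = 8
```

With k items scored in this way, the total can range from k to 5k.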

(d)   It could be argued that church attendance, praying and various beliefs about God are such disparate dimensions that adding them together is meaningless. An alternative would be to treat each as a separate dimension so that, for example, in explaining the various social factors that affect people’s religiousness, church attendance, praying and beliefs could each be crosstabulated separately against those factors. Another option would be to profile each respondent on each separate aspect.
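
By contrast, the separate-dimension approach in (d) can be sketched by crosstabulating each dimension against a social factor on its own, without summing. The respondents, the age-group factor and the category labels are hypothetical, and `crosstab` is an illustrative helper, not a library function.

```python
from collections import Counter

# Hypothetical respondents: one social factor (age group) and two
# religious dimensions that are kept separate rather than summed.
respondents = [
    {"age_group": "Under 40", "church": "Never", "prays": "Yes"},
    {"age_group": "Under 40", "church": "Weekly", "prays": "Yes"},
    {"age_group": "40 and over", "church": "Weekly", "prays": "No"},
    {"age_group": "40 and over", "church": "Weekly", "prays": "Yes"},
]

def crosstab(rows, factor, dimension):
    """Count each (factor category, dimension category) combination."""
    return Counter((row[factor], row[dimension]) for row in rows)

# Each dimension is crosstabulated against age group on its own.
for dimension in ("church", "prays"):
    print(dimension, dict(crosstab(respondents, "age_group", dimension)))
```

Each respondent's profile is simply their record on every dimension, as in each dictionary above.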

Make a list of variables that (a) are naturally binary, (b) can sensibly be made binary and (c) would be unwise to convert into binary.


We tend to think in binary terms, but often the distinction is not very clear-cut as between a ‘democratic’ and a ‘non-democratic’ state or organization. A true binary variable is a record of the presence or absence of a property, so keeping a hospital appointment and not keeping it would be a natural binary variable. Some binary variables that are clear-cut may in fact be administrative creations, for example living at a distance from school that entitles a free bus pass or does not entitle.


Nominal variables with few categories can often sensibly be made binary by taking one of the categories as the property possessed and the others as not possessing that property. Thus an assortment of different types of housing may be classified into ‘local authority rented’ and ‘not local authority rented’.
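
Collapsing a nominal variable in this way can be sketched as follows; the tenure categories and the `to_binary` helper are hypothetical illustrations.

```python
# The category treated as "possessing the property"; every other
# category becomes "not local authority rented".
TARGET = "Local authority rented"

def to_binary(tenure, target=TARGET):
    """Return 1 if the case has the chosen category, otherwise 0."""
    return 1 if tenure == target else 0

# Hypothetical housing tenure answers for five cases.
tenures = [
    "Owner occupied",
    "Local authority rented",
    "Privately rented",
    "Housing association rented",
    "Local authority rented",
]
binary = [to_binary(t) for t in tenures]
print(binary)  # [0, 1, 0, 0, 1]
```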


To convert any metric variable into a binary one will entail a decision about a value that is a cut-off point between possessing and not possessing a property like ‘high income’. Simply taking the average will often not be sensible. Choosing other values might be quite arbitrary and different values are likely to affect crucially the outcome of many data analysis procedures. It will usually be wiser to keep the variable as metric (or convert into a fuzzy set) and to choose a method of data analysis that takes into account the distribution of different values.
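
The sensitivity to the cut-off can be illustrated with a small set of incomes (in thousands). The figures are invented, chosen so that one very high value makes the distribution skewed, and `share_above` is a hypothetical helper.

```python
from statistics import mean, median

# Hypothetical annual incomes in thousands; the single value of 150
# pulls the mean well above the median.
incomes = [12, 15, 18, 20, 22, 25, 28, 30, 35, 150]

def share_above(values, cut_off):
    """Proportion of cases classed as 'high income' at a given cut-off."""
    return sum(v > cut_off for v in values) / len(values)

for label, cut_off in [("mean", mean(incomes)),
                       ("median", median(incomes)),
                       ("fixed 30", 30)]:
    print(label, cut_off, share_above(incomes, cut_off))
```

With these figures the mean (35.5) classes 10% of cases as ‘high income’, the median (23.5) classes 50%, and a fixed cut-off of 30 classes 20%. Whether a value exactly at the cut-off counts as ‘high’ (here it does not, because the test is `>`) is a further decision.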

What type of measure would you use for each of the following?

(i)     Degree of satisfaction or dissatisfaction with the services offered by the local social services department.

(ii)    Attitudes towards the BBC’s Radio 2.

(iii)   The degree of local support for the creation of a ‘free’ school in an area of urban deprivation.

(i)     This would need to begin by asking respondents which local social services, if any, they had used within a defined period of time, for example in the last year. Satisfaction can be measured directly by asking for an overall evaluation on a five- or seven-point rating scale. The resulting variables would, strictly speaking, be ordered category, although many researchers would treat them as if they were metric and calculate averages and standard deviations or use the results in factor analysis. Satisfaction may be thought of as multidimensional, so several indicators may need to be combined, for example satisfaction with the speed of response, the helpfulness of the staff, the outcome and the follow-up. Provided the highest code is given to the highest level of satisfaction, the codes allocated might be summed to give a summated rating scale. Alternatively, the indicators may be seen as separate dimensions that cannot be summed, in which case some kind of profile will need to be given.
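
A sketch of the summated rating scale and of the ordered-category versus metric treatment, assuming hypothetical five-point codes (1 = very dissatisfied to 5 = very satisfied) for the four dimensions mentioned above. The ratings are invented.

```python
from statistics import mean, median

# Hypothetical ratings, highest code = highest satisfaction.
# Each row is one respondent's codes on the four dimensions.
ratings = [
    {"speed": 2, "helpfulness": 4, "outcome": 3, "follow_up": 2},
    {"speed": 5, "helpfulness": 5, "outcome": 4, "follow_up": 4},
    {"speed": 1, "helpfulness": 3, "outcome": 2, "follow_up": 1},
]

# Summated rating: total of the four codes per respondent (range 4-20).
totals = [sum(r.values()) for r in ratings]

# One item summarized two ways: the median respects the ordered
# category level; the mean treats the codes as if they were metric.
speed = [r["speed"] for r in ratings]
print(totals, median(speed), mean(speed))
```

Here the totals are 11, 18 and 7. For the speed item the median is 2, while the mean, about 2.67, assumes equal distances between the codes.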

(ii)    The problem with ‘attitudes’ is that it is a blanket term, usually used to mean any form of positive or negative evaluation of some phenomenon, situation or person. We need to specify exactly what the attitudes are towards, for example, in this case, the quality of the programming as a whole, particular programmes or time segments, or the quality of the sound. Attitude measurement is usually based on derived techniques, particularly summated rating scales such as Likert scales.

(iii)   This is a complex issue, since it would involve not only attitudes of support or hostility but also what actions respondents might be prepared to take, for example leafleting or canvassing in support of various forms of protest, writing to newspapers, going on demonstrations, and organizing or attending protest meetings. These behavioural properties could be used to construct some kind of index of hostility, which could then be compared across time to measure trends, or across different types of local resident.
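
One simple form such an index might take is a count of the protest actions each respondent says they are prepared to take, averaged within each type of resident. The actions, the 0/1 coding, the resident types and the `hostility_index` helper are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical protest actions; 1 = prepared to take it, 0 = not.
ACTIONS = ("write_to_newspaper", "demonstrate", "attend_meeting")

residents = [
    {"type": "Parent", "write_to_newspaper": 1, "demonstrate": 1, "attend_meeting": 1},
    {"type": "Parent", "write_to_newspaper": 0, "demonstrate": 1, "attend_meeting": 1},
    {"type": "Non-parent", "write_to_newspaper": 0, "demonstrate": 0, "attend_meeting": 1},
    {"type": "Non-parent", "write_to_newspaper": 0, "demonstrate": 0, "attend_meeting": 0},
]

def hostility_index(resident):
    """Count of protest actions the resident is prepared to take (0-3)."""
    return sum(resident[action] for action in ACTIONS)

# Group index scores by type of resident, then average within groups.
by_type = defaultdict(list)
for r in residents:
    by_type[r["type"]].append(hostility_index(r))

averages = {t: mean(scores) for t, scores in by_type.items()}
print(averages)
```

With these invented figures, parents average 2.5 actions and non-parents 0.5; the same index recorded at different dates would allow trends to be compared.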

Examine Table 1.1 and consider which variables are demographic, which ones are behavioural and which ones are cognitive. Also consider which ones have been measured directly, which ones indirectly and which ones are derived.

Demographic properties relate to features that researchers have chosen to characterize the nature or condition of a case. They are neither behavioural nor cognitive. In Table 1.1 the last three variables, namely gender, social class and religion, may be seen as demographics, as might the age at which respondents had their first proper alcoholic drink. Behavioural properties relate to what cases did in the recent past, to what they usually or currently do, or to what they might do in the future. The first five questions, for example ‘Watched television in the last 7 days’, clearly fall within this category, as do the items relating to the channels on which adverts for alcohol were seen, involvement in alcohol marketing (for example ‘Received free samples of alcohol products’), drink status (‘Have you ever had a proper alcoholic drink?’), likelihood of drinking alcohol in the next year, how often they drink alcohol, total units last consumed, whether siblings or parents drink alcohol, and smoking behaviour. Cognitive properties relate to mental processes that go on within individuals and include their attitudes, opinions, beliefs and images. In Table 1.1 these include brand importance and liking or disliking of alcohol adverts and of school.

Most of the variables are measured directly, but total importance of brands, total number of channels seen and total involvement are derived. Social class is measured indirectly, taking the occupation of the household’s chief income earner as an indicator of social class.

Explain the type of measure indicated in Table 1.1 for each of the variables in the alcohol marketing study.

Binary variables consist of a record of the presence or absence of a property. The items that relate to whether or not respondents have seen the promotion of alcohol in a number of channels are indicated in Table 1.1 as binary because either they say ‘Yes’ or some other answer is given. In the dataset, these have been coded as 1 for ‘Yes’ and 0 for ‘No’ and ‘Don’t know’. The same is true for involvement in the marketing of alcohol, whether or not they have ever had a proper alcoholic drink and whether or not siblings or parents drink alcohol. Gender, strictly speaking, is not binary, and in the dataset it has been coded as 1 = male and 2 = female rather than 1 and 0. However, where there really are only two categories it may be treated as if binary, and it is indicated as binary in Table 1.1.
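
The coding described above can be sketched as two small recode functions. The function names are hypothetical, and making male = 1 in the 0/1 version of gender is an arbitrary choice for illustration.

```python
def code_seen(answer):
    """1 for 'Yes'; 0 for 'No' and 'Don't know', as in the dataset."""
    return 1 if answer == "Yes" else 0

def gender_to_binary(code):
    """Recode 1 = male, 2 = female into 1 = male, 0 = female."""
    return {1: 1, 2: 0}[code]

answers = ["Yes", "No", "Don't know", "Yes"]
print([code_seen(a) for a in answers])            # [1, 0, 0, 1]
print([gender_to_binary(g) for g in (1, 2, 2)])   # [1, 0, 0]
```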

Nominal variables consist of contrasting groups. Usually there are three or more categories, as in the first five items in Table 1.1, where ‘Don’t know’ is treated as a separate category. These could, of course, easily be made into binary variables by recoding ‘DK’ into ‘No’. Which makes more sense is for the researcher to decide. If there are many ‘DK’ answers, it may be better to treat them as a separate category, or the researcher may, for theoretical reasons, be particularly interested in analysing ‘DK’ answers. The only other nominal variable in Table 1.1 is religion, which has four categories.
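
The two options can be compared with a simple frequency count; the answers below are invented for illustration.

```python
from collections import Counter

# Hypothetical answers to one 'Yes / No / Don't know' item.
answers = ["Yes", "No", "Don't know", "No", "Yes", "Don't know", "No"]

# Kept as a nominal variable with 'Don't know' as its own category.
print(Counter(answers))

# Collapsed into a binary variable by recoding 'Don't know' into 'No'.
collapsed = ["Yes" if a == "Yes" else "No" for a in answers]
print(Counter(collapsed))

# The share of 'Don't know' answers can inform which option to use.
dk_share = answers.count("Don't know") / len(answers)
```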

Ordered category variables consist of two or more categories that are arranged in relationships of greater than or less than, although there is no metric that indicates by how much. Brand importance in Table 1.1 is a good example, provided ‘DK’, which has been coded as 6 in the dataset, is left out. If the nine items are to be totalled, it is crucial that ‘DK’ is treated as a missing value; otherwise a ‘DK’ answer has a higher score than ‘Very important’! The degrees of liking or disliking alcohol ads and of liking or disliking school are single-item questions, so they will not be totalled. Beware, however, that liking alcohol ads a lot has been given a low code in the dataset, while liking school a lot has been given a high code. There is no rationale for this, but researchers are not always consistent, so data analysts need to be careful. Note that ‘Neither’ is a middle category that is being treated as different from ‘DK’. For intention to drink alcohol, ‘Not sure’ has not, in the dataset, been coded in the middle between ‘Probably not’ and ‘Probably yes’. This is not a problem unless an ordinal statistic like gamma is used, in which case the ordering is important. ‘Not sure’ can either be recoded as the middle category or left out as a missing value. The process of recoding is explained in Chapter 2.
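
A sketch of why ‘DK’ must be treated as missing before totalling, assuming hypothetical codes 1 to 5 for the nine brand-importance items (5 = ‘Very important’) alongside the stated code 6 for ‘DK’. The `reverse_code` helper shows one common way of flipping a scale’s direction, for example to bring two inconsistently coded items into line; its 1 to 5 range is again an assumption.

```python
DK = 6  # code for 'Don't know' in the dataset

def total_brand_importance(codes):
    """Total the nine items, or None if any answer is 'DK'.

    Assumes hypothetical codes 1-5 with 5 = 'Very important'; a
    respondent with any 'DK' is excluded rather than scored.
    """
    if DK in codes:
        return None
    return sum(codes)

def reverse_code(code, low=1, high=5):
    """Flip a scale so the highest code becomes the lowest and vice versa."""
    return low + high - code

answered = [5, 4, 3, 5, 2, 1, 4, 3, 5]
with_dk = [5, 4, 6, 5, 2, 1, 4, 3, 5]
print(total_brand_importance(answered), total_brand_importance(with_dk))
print(reverse_code(1), reverse_code(5))
```

Summing `with_dk` naively gives 35, higher than the 32 for `answered`, purely because the ‘DK’ code of 6 has replaced a 3. Excluding any respondent with a ‘DK’, as here, is only one of several ways of handling the missing values.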

How often respondents have an alcoholic drink and how often they smoke are best considered as ordered category, although if the number of times they drink, or the number of cigarettes they smoke, over a period of time could be calculated, these would be discrete metric. Again, ‘DK’ and ‘Not stated’ answers would need to be excluded for these variables to be considered ordered category. The same applies to social class.

Total importance of brands is discrete metric because it treats the codes as numbers. The totals are whole numbers (integers), so the variable is not continuous. The total number of channels seen and total involvement are more clearly discrete metric, since they are counts. Only the age at which respondents had their first proper alcoholic drink and the total number of units last consumed are continuous metric. Even though age, for example, is usually reported rounded down to age last birthday, age itself is still a continuous variable and could, potentially, take any value. Similarly, fractions of units of alcohol are possible.