Chapter Summary

The world is awash with data, which come in many different forms. Quantitative data are in numerical form; qualitative data arise as words, phrases, narrative, text or images. Data are produced by researchers, who may be academics or may work in governments, businesses or other organisations. Data may be micro data that relate to specific individuals (personal data) or other kinds of entity, or they may be macro data, aggregated into tables in which the identity of the original entities is lost. What characterises the data that are the focus in Kent (2015) is that they are quantitative, systematically recorded micro data such as arise from official forms, surveys, censuses or from the systematic observation or electronic recording of selected properties of many entities. Outputs from social media, for example the 400 million tweets on Twitter or the billion videos on YouTube that are posted or viewed every day, are sometimes described as ‘data’, but these become the basis for creating research data only if researchers make systematic use of them.

This chapter argues that research data are a deliberate, thoughtful and systematic construction by researchers or other individuals, and that they result from a process of systematic record-keeping. Records are created in a social, economic and political context and for purposes specific to individuals, groups, teams or departments within organisations. Quantitative data arise as numbers that result from the systematic capture of classified, ordered, ranked, counted or calibrated characteristics of a specified set of cases. All quantitative data have a structure that consists of cases, properties and values. Cases are the entities under investigation in a particular piece of research. The number of cases used in any particular analysis will be known and will relate either to the total set of cases of interest to the researcher (the ‘population’) or to some subset of them. In some research projects, there may be more than one set of cases, sometimes arranged hierarchically as sets within sets, or as sets at different points in time. Properties are the characteristics of cases that the researcher has chosen to observe or measure and then record. They may be demographic, behavioural or cognitive and they may play one or more roles in a research project as descriptors, causes or effects. Values are what researchers actually record as a result of the process of assessing properties. Such records may relate either to variables or to set memberships. The values of variables assess cases relative to one another; sets define memberships or degrees of membership in absolute terms according to generally agreed external standards or based on a combination of theoretical knowledge and practical experience of cases.
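The structure of cases, properties and values can be pictured as a small case-by-property table. The following is only a minimal sketch in Python: the case identifiers, property names and recorded values are all invented for illustration and are not taken from Kent (2015) or from any real dataset.

```python
# Hypothetical dataset: every identifier and value here is invented.
# Each key is a case; each inner key is a property; each inner value
# is what the researcher recorded for that case on that property.
dataset = {
    "case_001": {"age": 13, "sex": "female", "has_drunk_alcohol": True},
    "case_002": {"age": 12, "sex": "male", "has_drunk_alcohol": False},
    "case_003": {"age": 14, "sex": "female", "has_drunk_alcohol": False},
}


def list_properties(data):
    """Return the sorted names of all properties recorded for any case."""
    return sorted({name for record in data.values() for name in record})


def values_of(property_name, data):
    """Return the recorded value of one property for every case."""
    return {case_id: record[property_name] for case_id, record in data.items()}


number_of_cases = len(dataset)            # the number of cases is known
properties = list_properties(dataset)     # what was chosen for recording
ages = values_of("age", dataset)          # one value per case
```

Reading down one property gives the values recorded across cases; reading across one case gives its values on every property.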

The values of variables may be recorded using different types of measure, which Kent (2015) classifies as binary, nominal, ordered category, ranked, discrete metric and continuous metric. Set memberships may be crisp or fuzzy. The distinction between types of measure is not always clear-cut and may be open to interpretation. The creation of measures, furthermore, is subject to many kinds of error in data construction. Errors can often be reduced, but usually only at extra cost in time and money.
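As a rough illustration of these distinctions, the sketch below lists the six types of measure and contrasts crisp with fuzzy set membership. It is a sketch under stated assumptions, not Kent's procedure: the example properties, the thresholds and the straight-line calibration between two anchor points are all hypothetical choices made for this example.

```python
from enum import Enum


class MeasureType(Enum):
    """The six types of measure named in the text."""
    BINARY = "binary"
    NOMINAL = "nominal"
    ORDERED_CATEGORY = "ordered category"
    RANKED = "ranked"
    DISCRETE_METRIC = "discrete metric"
    CONTINUOUS_METRIC = "continuous metric"


# Hypothetical properties, each assigned one type of measure.
measure_of = {
    "has_drunk_alcohol": MeasureType.BINARY,
    "school": MeasureType.NOMINAL,
    "liking_of_adverts": MeasureType.ORDERED_CATEGORY,
    "channels_noticed": MeasureType.DISCRETE_METRIC,
}


def crisp_membership(value, threshold):
    """Crisp set: a case is either in (1) or out (0)."""
    return 1 if value >= threshold else 0


def fuzzy_membership(value, fully_out, fully_in):
    """Fuzzy set: degree of membership between 0.0 and 1.0.

    Assumes a straight-line calibration between two anchors that the
    researcher would have to justify on external grounds.
    """
    if value <= fully_out:
        return 0.0
    if value >= fully_in:
        return 1.0
    return (value - fully_out) / (fully_in - fully_out)
```

With hypothetical anchors of 2 and 10 channels, a case that noticed 6 channels would have a fuzzy membership of 0.5 in the set of marketing-aware cases, whereas a crisp set with a threshold of 5 would simply count it as a member.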

The procedures used by researchers to structure and analyse a dataset are illustrated throughout Kent (2015) with a study carried out by the Institute of Social Marketing at the University of Stirling, which looks at the impact of alcohol marketing on the drinking behaviour of young people aged between 12 and 14. The findings are based on a survey that involves an interviewer-administered questionnaire measuring awareness of and involvement with alcohol marketing and a self-completion questionnaire measuring alcohol drinking and associated behaviours. The homes of all second-year pupils attending schools in three local authority areas in the west of Scotland were contacted, generating a sample of 920 respondents. The key research hypotheses are that the more aware of and involved in alcohol marketing young people are, the more likely they are to have consumed alcohol, and the more likely they are to think that they will drink alcohol in the next year.

Analysing Quantitative Data

Variable-based and Case-based Approaches to Non-experimental Datasets
