Exercises and Discussion Questions

8.1       Imagine that you are on a team of students that wishes to conduct a survey of student opinions regarding various issues on your campus. The team’s plan is to conduct intercept surveys with at least 50 students. How would you obtain this sample? What location(s) would you use? When would you do the research? How would you select respondents?

There is no set answer to this question; it will depend on your campus. However, you should consider the following issues.

The first issue is where to do the intercepts so as to minimize possible coverage bias from omitting some types of students. Is there a location or set of locations that everyone passes? For example, does everyone pass by the student center on any given day? Are there intercept points between the residential halls and classroom buildings that will cover almost everyone who lives in the dorms? Are there intercept points between parking lots and classroom buildings that will cover almost everyone who commutes to campus? Even if we can identify such intercept points, what about students in fraternities, sororities, or off-campus housing who walk to campus? Are there intercept points that will cover students with various majors such as liberal arts, social sciences, business, and engineering?

The next issue is when to do the intercepts so as to minimize possible coverage or non-response bias. Do we need evening interviews to capture evening students, who may be part-time, older, or commuter students with issues that differ from those of full-time day students? Will we get higher cooperation if we intercept students after their classes rather than when they are hurrying to class?

The next issue is how to select students so as to minimize possible selection bias. Should we apply systematic sampling after a random start at any given location, with one person to count and select passing students and one or more interviewers to approach and interview selected students? Will we do just as well if we dispense with the counter and have each interviewer approach the first passing student, then the next passing student after each interviewing attempt, on the logic that there is no reason for the next passing student to have any particular characteristics? If so, should we have an observer to make sure that interviewers do indeed take the next passing student, and don’t start choosing people they feel more comfortable with? Should we set quotas for particular types of students to ensure that they are represented?
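For the systematic option, the counting rule can be made concrete by fixing a sampling interval and a random start and then flagging every k-th passing student. The sketch below is a minimal illustration only; the interval of 5 and the simulated stream of passers-by are assumptions, not part of the exercise.

```python
import random

# Illustrative only: the interval k, the random start, and the simulated
# stream of 500 passers-by are assumptions.
k = 5                          # approach every 5th passing student
start = random.randint(1, k)   # random start within the first k passers-by

selected_positions = []
for position in range(1, 501):               # students passing the intercept point
    if position >= start and (position - start) % k == 0:
        selected_positions.append(position)  # counter signals an interviewer

print(f"Random start: {start}; first selections: {selected_positions[:10]}")
```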

The specific answers to these questions will depend on the nature of your campus and your interviewing resources (e.g., whether you have more than one person available at any given time). Even if you only have one person available, you may prefer a fixed sampling procedure because you think it is more likely to keep interviewers from starting to choose people they feel more comfortable with, or simply because it will sound more credible when you write the research report.

The general principle, as always, is to anticipate possible coverage bias, selection bias, and non-response bias, and use procedures that will minimize the sample’s exposure to bias.

8.2       A public health researcher wishes to conduct a U.S. national telephone survey of households that: a) are headed by a woman living without an adult partner, b) have at least one child under 14 years of age present, and c) have a household income under $25,000 per annum. How would you design the sample for this survey? Would you expect any of the methods described in Section 8.3 to be useful?

The target population for this research qualifies as a rare population. Methods that one might consider to reduce the cost of reaching a rare population include telephone cluster sampling, disproportionate stratified sampling, network sampling, dual frame sampling, location sampling, and using an online panel. Exhibit 8.1 summarizes the conditions under which each method may be useful.

Telephone cluster sampling works if there are a substantial number of telephone exchanges without any members of the target population. That is not likely to be the case here; there are not likely to be a lot of exchanges that do not have any low-income single mothers.

Disproportionate stratified sampling works if the prevalence of the target population varies across geographic areas, so we can oversample the areas with higher prevalence. Since income varies across geographic areas, this method may be useful for this population.
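As a rough illustration of how this works, the sketch below allocates a fixed number of interviews across geographic strata in proportion to the expected number of eligible households in each stratum rather than in proportion to all households; the strata and prevalence figures are hypothetical.

```python
# Hypothetical strata: (name, households in frame, assumed share of households
# meeting all three screening criteria). The prevalence figures are made up.
strata = [
    ("low-income urban areas", 2_000_000, 0.08),
    ("other urban/suburban",   6_000_000, 0.02),
    ("rural areas",            2_000_000, 0.03),
]
total_interviews = 1_000

total_households = sum(hh for _, hh, _ in strata)
total_eligible = sum(hh * prev for _, hh, prev in strata)

for name, hh, prev in strata:
    # Allocate interviews in proportion to expected eligible households,
    # which oversamples the high-prevalence stratum relative to its size.
    n = round(total_interviews * (hh * prev) / total_eligible)
    # Relative design weight needed later to undo the oversampling.
    weight = (hh / total_households) / (n / total_interviews)
    print(f"{name:24s} interviews={n:4d}  relative weight={weight:.2f}")
```

In practice the allocation would also reflect screening costs and within-stratum variances, but the principle of putting more of the sample where prevalence is higher is the same.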

Network sampling works if the target population can be identified and reached through members of a well-defined network such as immediate family. In this case, people should know gender, household status, children’s ages, and approximate income for members of their immediate family, so network sampling might be useful if people are willing to refer the researchers to their family members. Other social networks such as “next door neighbors” also may work.

Dual frame sampling works if you have a special frame with high prevalence of the target population. In this case, if the researcher has access to records such as lists of government benefit recipients or schoolchildren in low-income areas, those records might be used to increase sampling efficiency. However, access to such records is likely to be limited by confidentiality requirements.

Location sampling works if members of the target population tend to congregate at identifiable locations. It is not obvious that there are such locations for this population.

Using online panel members works if the researcher is willing to treat members of the online panel as representative of some broader population such as all online users. This is a judgment call and will require some form of model-based estimation.

Ultimately, the extent to which any of these procedures will reduce the cost of reaching this target population is an empirical question, and will to some extent depend on what methods of data collection are deemed suitable for the research (i.e., whether it is feasible to administer the planned questionnaire by web, telephone, mail, or face-to-face).

8.3       A health scientist at a university wishes to conduct a panel study of dietary practices, exercise practices, and weight changes among students. The plan is to conduct an initial survey with entering freshmen, with online follow-up questionnaires administered monthly to all participants for the following three years (including summers). What sampling plan would you propose for this study? How would you draw the initial sample? How would you maintain the panel? Would you propose any changes to the intended data collection procedures?

The initial sample is straightforward; we can draw a simple random sample of entering freshmen, subject to any desired stratification if specific groups are of interest.
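A minimal sketch of how the initial draw might look, assuming the researcher can obtain a roster of entering freshmen from the registrar; the roster fields, strata, and sample sizes here are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_srs(roster, strata_key, n_per_stratum, seed=42):
    """Draw a simple random sample within each stratum of the roster.

    roster        -- list of dicts, one per entering freshman (hypothetical)
    strata_key    -- field used for stratification, e.g. "college"
    n_per_stratum -- number of students to draw from each stratum
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for student in roster:
        by_stratum[student[strata_key]].append(student)
    sample = []
    for students in by_stratum.values():
        sample.extend(rng.sample(students, min(n_per_stratum, len(students))))
    return sample

# Toy roster for illustration; in practice this would come from the registrar.
roster = [{"id": i, "college": random.choice(["Arts", "Engineering", "Business"])}
          for i in range(1, 3001)]
panel = stratified_srs(roster, "college", n_per_stratum=100)
print(len(panel), "students invited to the panel")
```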

The more difficult question is how to maintain the panel. This panel is likely to experience heavy mortality (panel members who stop responding). For example, as mentioned in Chapter 3, Couper et al. (2007) report a study pertaining to an online weight management intervention where 85% of the baseline participants did not respond to an online questionnaire at the 12-month measurement period. If the proposed study suffers 85% mortality the first year, almost none of the initial respondents may be left by the end of three years!
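To see how quickly compounding attrition empties a panel, a back-of-the-envelope calculation (only the 85% figure comes from the cited study; the lower rates are included purely for comparison):

```python
# Share of the original panel still responding after three years,
# assuming a constant annual mortality rate.
for annual_mortality in (0.50, 0.70, 0.85):
    retained = (1 - annual_mortality) ** 3
    print(f"{annual_mortality:.0%} annual mortality -> "
          f"{retained:.1%} of the original panel remains after 3 years")
```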

If mortality is random, we might maintain the panel with random replacements from the same cohort; that is, students who leave the panel would be replaced with random students from the same entering class. This would keep the panel representative of the cohort.

If mortality is not random – which seems almost certain – then maintenance becomes more difficult. One question is what to do with students who leave school. These students are likely to differ systematically from the students who stay; for example, they may be more likely to have time management or financial problems, which may in turn affect what they eat and whether they exercise. We probably will drop these students from the panel without replacement, because their departure reflects actual changes in the cohort. However, there also may be systematic patterns in students who leave the panel without leaving school; for example, panel mortality may be higher among students who gain weight and don’t want to report it, or among men versus women, and so on. If systematic patterns in mortality exist, they will bias the panel unless we replace the departing panelists with students who match them. So even if we did not stratify the sample at the beginning of the study, we are likely to stratify for purposes of replacing dropouts with similar students (and if we can anticipate which groups are most likely to drop from the panel, we might over-recruit them at the start to reduce the need for replacements).
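A minimal sketch of that replacement logic, assuming we keep a pool of not-yet-recruited students from the same entering class with the stratifying characteristics recorded; the stratum definition used here (sex by residence type) is only an example.

```python
import random

def replace_dropout(dropout, replacement_pool, rng=random):
    """Replace a panel dropout with a student from the same stratum.

    dropout          -- dict describing the departing panelist
    replacement_pool -- dict mapping stratum -> list of eligible students
                        from the same entering class (hypothetical structure)
    """
    stratum = (dropout["sex"], dropout["residence"])   # example stratum definition
    candidates = replacement_pool.get(stratum, [])
    if not candidates:
        return None        # no matched replacement available in this stratum
    replacement = rng.choice(candidates)
    candidates.remove(replacement)                     # avoid reusing the student
    return replacement
```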

Even if we can match replacements perfectly, mortality will limit the researcher’s ability to track changes in individual students. For example, if mortality is 85% per annum, any comparisons among freshmen, sophomores, and juniors will be comparisons between mostly different people. This is not a problem if the researcher’s focus is on tracking changes in diet, exercise, and weight at a group level as the cohort moves through its college years – but if that is the case, it would be simpler for the researcher to drop the panel design and simply take a series of random samples. If the focus is on patterns of change at the individual level (a key reason to use a panel), then the researcher will have incomplete data series from individual panelists as they come and go, and will have to draw conclusions about overall patterns of change through models that knit the different respondents together over time.
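One common way to knit incomplete individual series together is a mixed-effects growth model, which uses however many waves each panelist actually completed. The sketch below assumes a long-format data file and column names that are purely hypothetical, and uses statsmodels, which can fit such models on unbalanced panels.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format panel file: one row per student per completed wave,
# with columns student_id, month (0-35), and weight_kg.
panel = pd.read_csv("panel_waves.csv")

# Random intercept and slope per student: estimates the average weight
# trajectory over the three years while using every partial series available.
model = smf.mixedlm("weight_kg ~ month", data=panel,
                    groups=panel["student_id"], re_formula="~month")
print(model.fit().summary())
```

Note that this treats the missing waves as ignorable, which is itself an assumption the researcher would need to defend.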

Anticipating such issues may lead us to change the design. For example, if we want to minimize mortality in the panel, we might ask whether we truly need monthly data throughout the three-year period, or whether we could lessen the burden on respondents. This might be done in many ways. If the researcher wants detailed data on students’ initial adaptation to college life, but will accept less detail later, then we could reduce respondent burden by taking monthly measures during the first semester and less frequent measures subsequently (e.g., at the start and end of each semester, once a semester, once a year, or even staggered across panelists to provide some data for every month). If the researcher wants monthly data throughout the three-year period, but we feel that we need to reduce respondent burden to control unplanned mortality, then we might ask respondents to participate for a fixed number of observations (for example, we might ask panelists to commit to one semester, after which they will be replaced, or we might ask for three follow-ups at staggered intervals).
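If the staggered option were chosen, the scheduling can be as simple as rotating panelists across groups so that every calendar month is covered by part of the panel while each panelist answers far fewer questionnaires; the group count and panel size below are arbitrary.

```python
# Three rotation groups: each group is surveyed every third month, so every
# calendar month is covered by about a third of the panel while each panelist
# answers only 12 questionnaires over the 36 months.
n_groups = 3
panelist_ids = range(1, 301)       # arbitrary panel of 300 students

schedule = {}
for i, pid in enumerate(panelist_ids):
    group = i % n_groups
    schedule[pid] = [month for month in range(36) if month % n_groups == group]

print(schedule[1][:6])   # first few survey months for panelist 1
```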

We also might think about incentives and communication programs to keep panelists engaged.

The general point is that just as sampling procedures may be adapted to fit practical data collection issues, so too may data collection procedures be adapted to fit practical sampling issues, in this case the need to control a form of possible non-response bias.