# Chapter 7: Observational Studies: Two Measurement Variables

1. Covariance will always _________

1. Be a positive number.
2. Be greater than the product of the variance of the two variables.
3. Be equal to or less than 1.0.
4. Be less than either of the variances.
5. Reflect the direction of the association.
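
A minimal sketch of the point behind item 1: the sign of the covariance tracks the direction of the association, while its size depends on the measurement units. The scores are invented for illustration, and `covariance` is a hypothetical helper, not a function from the text.

```python
def covariance(xs, ys):
    """Sample covariance using the n - 1 denominator."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 5, 7, 9]    # y tends to rise as x rises
y_falling = [9, 7, 5, 4, 2]   # y tends to fall as x rises

cov_rising = covariance(x, y_rising)
cov_falling = covariance(x, y_falling)
print(cov_rising)   # positive value
print(cov_falling)  # negative value
```

Both values exceed 1.0 in absolute size, which also shows that covariance, unlike r, is not confined to the range from -1.0 to 1.0.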

2. A correlation coefficient estimates ______.

1. The strength of the association between two variables.
2. The y-variable values for a specific x-variable value.
3. The x-variable values for a specific y-variable value.
4. The change in the y-variable values associated with changes in the x-variable values.
5. None of the above.

3. A newspaper headline writer found that the more adjectives she put in the titles of her articles the greater the number of newspapers sold that day. This association between the number of adjectives and the number of newspapers sold is said to be _______.

1. Significantly positive.
2. Significantly negative.
3. Positive.
4. Negative.

4. The null hypothesis in a correlational study is that ___________.

1. B = 0.
2. a = 0.
3. r < 1.0.
4. r > 1.0.
5. r = 0.

5. The relationship between the number of cups of coffee consumed (x) and caffeine levels found in the blood (y) was investigated in 100 University of Toronto students. The data resulted in the following regression equation.
ŷ = 2x + 0.05
This equation indicates _________.

1. Each cup of coffee increases caffeine levels by 5%.
2. It takes 18 cups of coffee to increase caffeine levels by one unit.
3. For each cup of coffee caffeine levels increase by two units.
4. All of the above.
5. None of the above.
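
The regression equation in item 5 can be read off with a short sketch. The function name `predicted_caffeine` is hypothetical; the slope and intercept are the two values given in the equation.

```python
def predicted_caffeine(cups, slope=2.0, intercept=0.05):
    """Predicted caffeine level from the item's equation: y-hat = 2x + 0.05."""
    return slope * cups + intercept

# Each additional cup changes the prediction by the slope;
# the intercept is the predicted level at zero cups.
for cups in range(4):
    print(cups, predicted_caffeine(cups))

change_per_cup = predicted_caffeine(3) - predicted_caffeine(2)
```

The difference between any two successive predictions equals the slope, which is what the coefficient on x expresses.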

6. In Ordinary Least Squares regression, which of the following is (are) assumed _______.

1. The relationship between the x and the y variables is linear.
2. There is equal variance of the y values about the regression line.
3. The distributions of the x and y variables are roughly normal.
4. All of the above are assumed.
5. None of the above are assumed.

7. Covariance can never ___________.

1. Be equal to the product of the two standard deviations.
2. Be greater than the product of the two standard deviations.
3. Be less than the product of the two standard deviations.
4. Be greater than 1.0.
5. There are no limitations on the covariance.
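
The limit that item 7 asks about follows from the definition of r. A short derivation, writing s_x and s_y for the two standard deviations (notation assumed here, not taken from the text):

```latex
r = \frac{\operatorname{cov}(x, y)}{s_x \, s_y},
\qquad -1 \le r \le 1
\quad\Longrightarrow\quad
\lvert \operatorname{cov}(x, y) \rvert \le s_x \, s_y
```

Equality holds only when the association is perfectly linear, that is, when r is exactly 1.0 or -1.0.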

8. In a regression analysis, if the predictor variable is measured in milliliters, the criterion variable _________.

1. Can be in any units.
2. Must also be measured in milliliters.
3. Cannot be measured in milliliters.
4. Must be standardized.
5. Must be measured in some units of volume.

9. The null hypothesis in a regression study is that ___________.

1. B = 0.
2. a = 0.
3. r < 1.0.
4. r > 1.0.
5. r = 0.

10. __________ is the minimum number of subjects required for statistical power of .80 when testing for a correlation of .50 or greater.

1. 783
2. 28
3. 50
4. 80
5. 40
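
A rough check on item 10 can be sketched with the Fisher z approximation to the required sample size. This is an approximation only, and it assumes a two-tailed test at alpha = .05, which the item does not state; published power tables may give a slightly different figure. The function name is hypothetical.

```python
from math import atanh, ceil
from statistics import NormalDist

def approx_n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate N needed to detect a population correlation r.

    Uses the Fisher z approximation: N = ((z_alpha + z_power) / atanh(r))^2 + 3.
    Assumes a two-tailed test (an assumption, not stated in the item).
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    return ceil(((z_alpha + z_power) / atanh(r)) ** 2 + 3)

n_needed = approx_n_for_correlation(0.50)
print(n_needed)
```

Larger target correlations need fewer subjects, so calling the function with r = 0.30 or r = 0.10 shows how quickly the requirement grows for weaker associations.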

1. What are the two primary distortions associated with the problem of heteroskedasticity?

Main Points:

1. The correlation coefficient will underestimate the strength of the association for some segments of the x-axis and overestimate it for others.
2. The amount of error (standard error of the estimate) associated with predictions will be underestimated for some segments of the x-axis and overestimated for others.
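
The second distortion can be illustrated with a small sketch. The data are invented so that the scatter around the line widens as x grows; `fit_line` and `rms` are hypothetical helpers. The single standard error of the estimate is then compared with the actual residual spread in the lower and upper halves of the x-axis.

```python
from math import sqrt

def fit_line(xs, ys):
    """Ordinary least squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rms(values):
    """Root mean square of a list of residuals."""
    return sqrt(sum(v * v for v in values) / len(values))

# Hypothetical data: the scatter around y = 2x widens as x increases.
xs = list(range(1, 21))
ys = [2 * x + (0.2 * x if i % 2 == 0 else -0.2 * x) for i, x in enumerate(xs)]

slope, intercept = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Standard error of the estimate: one pooled figure for the whole line.
see = sqrt(sum(e * e for e in residuals) / (len(xs) - 2))
spread_low = rms(residuals[:10])    # x from 1 to 10
spread_high = rms(residuals[10:])   # x from 11 to 20
print(spread_low, see, spread_high)
```

The pooled figure falls between the two segment values, so it overstates the prediction error where the scatter is narrow and understates it where the scatter is wide.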

2. What is the third-variable problem and how is it relevant to correlation and OLS regression analysis?

Main Points:

1. The third-variable problem highlights that the researcher must not only know the most appropriate statistical test to employ but also be able to identify potential third variables and, eventually, control for their effects.
2. A third variable is a potential source of concern in OLS regression with respect to the reliability and validity of B and r.

3. Why is linearity a necessary assumption of both correlational and regression analysis?

Main Point:

1. Because both the correlation coefficient and OLS regression describe only linear (straight-line) associations.

4. Why is homoskedasticity an assumption of both correlational and regression analysis?

Main Points:

1. Violation of homoskedasticity can cause under- or overestimation in both correlational analyses (strength of association) and regression analyses (prediction errors).
2. Inconsistency in the spread of the y values along the regression line is the source of the over- and underestimates.

5. What is the third-variable problem and how does it pertain to correlational analysis?

Main Points:

1. The third-variable problem highlights that the researcher must not only know the most appropriate statistical test to employ but also be able to identify potential third variables and, eventually, control for their effects.
2. A third variable can present serious problems for interpreting simple correlation studies.
3. It can lead to overestimates or underestimates of the correlation.
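
One common way to control for a third variable is the first-order partial correlation, sketched below. The technique is named here as an illustration and is not necessarily the control method the text has in mind; the three input correlations are invented.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with a third variable z held constant."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical correlations, invented for illustration only.
r_xy = 0.60   # simple correlation between x and y
r_xz = 0.80   # correlation of x with the third variable z
r_yz = 0.70   # correlation of y with the third variable z

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(r_xy_given_z)
```

With these made-up values, most of the simple correlation disappears once z is held constant, which is the overestimation case. If z is unrelated to both x and y, the partial correlation equals the simple correlation.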

6. How are the Φ coefficient and r related? How are they different?

Main Points:

1. Both Φ and r are standardized estimates of the strength or degree of association between two variables.
2. Φ measures association between two categorical (nominal and ordinal) variables; r measures association between two measurement variables.
3. The absolute values of both range from 0.0 to 1.0.
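
The relation between the two coefficients can be checked numerically. The sketch below assumes two dichotomous variables coded 0 and 1 (an assumption made for the example) and uses invented data; it computes Φ from the 2 x 2 cell counts and r from the same 0/1 codes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def phi_coefficient(xs, ys):
    """Phi from the four cell counts of a 2 x 2 table of 0/1 codes."""
    a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
    d = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 0)
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical 0/1 codes for ten cases, invented for illustration only.
x = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

phi = phi_coefficient(x, y)
r = pearson_r(x, y)
print(phi, r)
```

For 0/1 coded data the two formulas give the same number, which is one way to see that both are standardized measures on the same scale.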

#### Data set questions.

1. The manager of the Toronto United soccer club is interested in the number of matches his starting 11 players missed due to injury. It seemed to him that those who missed more matches than the others one year also missed more matches the next year. To test his suspicion, he recorded the number of matches each of the 11 players missed due to injury in each of two years. The data are reported in the table below. Do the data support the manager’s suspicion? Be sure to create a scatterplot and run any necessary test of significance.

2. How would the results change if Player #11’s number of missed matches in Year Two changed from 2 to 9?

3. How would the results change if Player #9 had missed 19 matches in Year One and 28 matches in Year Two? (Return Player #11’s Year Two number of missed matches to 2.)

4. Are there any univariate or bivariate outliers in the data used in Questions #2 and #3?
Main notes: Outliers are defined as scores 4 standard deviations above or below the mean.
a. Univariate outlier check:
Question 2:
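
The 4-standard-deviation rule in the note above can be sketched as a simple screen. The scores are invented and are not the players' data; `univariate_outliers` is a hypothetical helper that uses the sample standard deviation.

```python
from statistics import mean, stdev

def univariate_outliers(scores, cutoff=4.0):
    """Return the scores lying more than `cutoff` sample SDs from the mean."""
    centre = mean(scores)
    spread = stdev(scores)
    return [x for x in scores if abs(x - centre) / spread > cutoff]

# Hypothetical scores (n = 20), invented for illustration only.
scores = [2, 3, 4, 3, 2, 4, 3, 3, 2, 4, 3, 2, 4, 3, 3, 2, 4, 3, 3, 40]

flagged = univariate_outliers(scores)
print(flagged)
```

One caution when applying the rule to small samples: the largest z-score any single value can reach is (n - 1) / √n, which is about 3.02 when n = 11, so a 4-standard-deviation cutoff cannot flag a univariate outlier in a sample of that size.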

5. What do Questions #2 and #3 illustrate?

1. Questions 2 & 3 illustrate the necessity of inspecting scatterplots prior to judging and reporting correlation coefficients.
2. Question 3 illustrates the distortion effect of a bivariate outlier.

6. A cultural anthropologist suspected that those who watched more home renovation programs on television would also be those who watched more cooking programs. To test her hypothesis, she asked ten of her neighbours to keep track of the number of home renovation and cooking programs they watched for a week. The data are reported in the table below.

7. If another neighbour watched 4 hours of home renovation programs during a week, how many hours would you predict this neighbour would watch cooking programs? Include in your answer a scatterplot and any necessary tests of significance.
[Table: SPSS Coefficients output (Model, Unstandardized Coefficients)]
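
The prediction step in Question 7 can be sketched as follows. The ten pairs of hours are invented for illustration and are not the neighbours' data; `fit_ols` is a hypothetical helper. In practice the prediction is only worth reporting once the slope has been tested for significance.

```python
def fit_ols(xs, ys):
    """Ordinary least squares slope (B) and intercept (a)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical weekly hours for ten viewers, invented for illustration only.
renovation_hours = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cooking_hours = [2, 3, 3, 5, 5, 6, 8, 8, 9, 11]

slope, intercept = fit_ols(renovation_hours, cooking_hours)

# Predicted cooking hours for a viewer with 4 hours of renovation programs.
predicted = intercept + slope * 4
print(slope, intercept, predicted)
```

The same two lines of arithmetic apply to the coefficients reported by any statistics package: multiply the unstandardized slope by the new x value and add the constant.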

8. Create two data sets where the correlation coefficients are significant. When the two data sets are amalgamated, however, the resulting correlation coefficient should be non-significant. Use ten subjects in each of the data sets. The x and y variables should be the same in both of the original data sets.
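
One possible construction for Question 8, offered as a sketch rather than the only acceptable answer: give one data set a strong positive association and the other a strong negative association over the same range of x. All values are invented, and the critical t values are the standard two-tailed .05 table entries for 8 and 18 degrees of freedom.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t test of the null hypothesis r = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

# Hypothetical data: ten subjects per set, same x and y variables in both.
x = list(range(1, 11))
noise = [0.5 if i % 2 == 0 else -0.5 for i in range(10)]
y_a = [xi + e for xi, e in zip(x, noise)]        # y rises with x
y_b = [11 - xi + e for xi, e in zip(x, noise)]   # y falls as x rises

t_a = t_statistic(pearson_r(x, y_a), 10)
t_b = t_statistic(pearson_r(x, y_b), 10)
t_all = t_statistic(pearson_r(x + x, y_a + y_b), 20)

T_CRIT_DF8 = 2.306    # two-tailed, alpha = .05, df = 8
T_CRIT_DF18 = 2.101   # two-tailed, alpha = .05, df = 18
print(abs(t_a) > T_CRIT_DF8, abs(t_b) > T_CRIT_DF8, abs(t_all) > T_CRIT_DF18)
```

Because the two trends run in opposite directions, they cancel when the sets are pooled, and a scatterplot of the combined data shows an X-shaped pattern rather than a single line.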