SAGE Journal Articles
Summary/Abstract: It is more common for educational and psychological data to be nonnormal than to be approximately normal. This tendency may lead to bias and error in point estimates of the Pearson correlation coefficient. In a series of Monte Carlo simulations, the Pearson correlation was examined under conditions of normal and nonnormal data, and it was compared with its major alternatives, including the Spearman rank-order correlation, the bootstrap estimate, the Box-Cox transformation family, and a general normalizing transformation (i.e., rankit), as well as to various bias adjustments. Nonnormality caused the correlation coefficient to be inflated by up to +.14, particularly when the nonnormality involved heavy-tailed distributions. Traditional bias adjustments worsened this problem, further inflating the estimate. The Spearman and rankit correlations eliminated this inflation and provided conservative estimates. Rankit also minimized random error for most sample sizes, except for the smallest samples (n = 10), where bootstrapping was more effective. Overall, results justify the use of carefully chosen alternatives to the Pearson correlation when normality is violated.
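To make the estimators named in the abstract concrete, here is a minimal sketch of the Pearson, Spearman, and rankit correlations using only the Python standard library. It does not reproduce the article's simulation design: the sample size, the seed, the target correlation of .5, and the use of cubing to create heavy tails are all illustrative assumptions, and the rankit step assumes the common definition that maps each rank r to the normal quantile at (r - 0.5)/n.

```python
import math
import random
from statistics import NormalDist


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank-order correlation: Pearson computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rankit(values):
    """Rank-based normalizing transform (assumed form: quantile at (r - 0.5)/n)."""
    n = len(values)
    nd = NormalDist()
    return [nd.inv_cdf((r - 0.5) / n) for r in average_ranks(values)]


def rankit_correlation(x, y):
    """Pearson correlation computed on rankit-transformed scores."""
    return pearson(rankit(x), rankit(y))


# Illustrative data (assumption): bivariate normal scores, then cubed to
# produce heavy tails. Cubing is monotone, so the ranks do not change.
random.seed(0)
rho, n = 0.5, 500
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]
x_heavy = [a ** 3 for a in x]
y_heavy = [b ** 3 for b in y]

for label, u, v in [("normal", x, y), ("heavy-tailed", x_heavy, y_heavy)]:
    print(label, round(pearson(u, v), 3), round(spearman(u, v), 3),
          round(rankit_correlation(u, v), 3))
```

Because Spearman and rankit depend only on ranks, they give the same value before and after the cubing step, whereas the Pearson estimate shifts with the shape of the distribution. A single sample like this cannot show bias; that would require repeating the draw many times, as in a Monte Carlo study.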
Questions to Consider
1. Nonnormality could lead to two types of distortion in the point estimate of the correlation. Describe what they are.
2. Which correlational approaches were generally immune to inflating the correlation coefficient when variables were Extremely Skewed or Heavy Tailed and showed a small but consistent negative bias?
- Spearman and RIN
- Pearson and Box-Cox
- Box-Cox and Spearman
- Bootstrap and Pearson
3. Inflated correlations ____ require outliers to be caused by contamination or measurement error; the problem occurs even though outliers ___ a part of the population distribution and the underlying correlation of interest.
- do not; are not
- do; are not
- do not; are
- do; are
Summary/Abstract: For many who deal with correlation (as students, teachers, or applied researchers), the connection between group heterogeneity and the magnitude of Pearson’s r is difficult to pin down. Confusion abounds because factors that increase score variability do not have a similar effect on r. Three such factors are considered in this paper, with the point made that an increase in σx (and/or in σy) can be associated with an increase or a decrease in r or possibly with no change in r whatsoever! A helpful distinction is also drawn between (a) properties of the persons or objects that define one’s population of interest and (b) properties of the numbers assigned to those persons or objects (or a sample of them) as a result of the measurement process.
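The claim that more score variability can go with either a larger or a smaller r can be checked with a small simulation. The sketch below compares a full sample with (a) a subsample selected on X and (b) the same sample with random error added to X. The population correlation of .6, the cutoff at X > 0, the error variance, the sample size, and the seed are illustrative assumptions, and these two mechanisms are not necessarily the exact factors the paper discusses.

```python
import math
import random
from statistics import pstdev


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


random.seed(1)
rho, n = 0.6, 20000  # assumed population correlation and sample size

# Full sample: standard normal X and Y with correlation close to rho.
x = [random.gauss(0, 1) for _ in range(n)]
y = [rho * a + math.sqrt(1 - rho ** 2) * random.gauss(0, 1) for a in x]

# (a) Selection on X: keep only cases with X above 0 (hypothetical cutoff).
kept = [(a, b) for a, b in zip(x, y) if a > 0]
x_restricted = [a for a, _ in kept]
y_restricted = [b for _, b in kept]

# (b) Measurement error: add independent noise to every X score.
x_noisy = [a + random.gauss(0, 1) for a in x]

sd_full, r_full = pstdev(x), pearson(x, y)
sd_restricted, r_restricted = pstdev(x_restricted), pearson(x_restricted, y_restricted)
sd_noisy, r_noisy = pstdev(x_noisy), pearson(x_noisy, y)

print("full sample:   sd(X) =", round(sd_full, 2), " r =", round(r_full, 2))
print("selected on X: sd(X) =", round(sd_restricted, 2), " r =", round(r_restricted, 2))
print("noisy X:       sd(X) =", round(sd_noisy, 2), " r =", round(r_noisy, 2))
```

Moving from the selected subsample to the full sample raises both the standard deviation of X and r, while moving from the full sample to the noisy scores raises the standard deviation of X but lowers r. The same change in variability therefore points in opposite directions depending on where it comes from.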
Questions to Consider
1. What are three factors that affect score variability?
2. If a sample is taken in a “conditional” way relative to the X and/or Y scores, then the expected value of the sample correlation will be smaller than a “non-conditionalized” population counterpart. This concept is called ________.
- influence
- an inflated range
- a biased range
- restriction of range
3. Errors of measurement cause a correlation to decrease because of __________.
- decreases in score variability
- increases in score variability
- outliers
- skewness