SAGE Journal Articles


Article 1: Harden, J. J., & Desmarais, B. A. (2011). Linear models with outliers. State Politics & Policy Quarterly, 11(4), 371–389. doi:10.1177/1532440011408929.

Summary/Abstract: State politics researchers commonly employ ordinary least squares (OLS) regression or one of its variants to test linear hypotheses. However, OLS is easily influenced by outliers and thus can produce misleading results when the error term distribution has heavy tails. Here we demonstrate that median regression (MR), an alternative to OLS that conditions the median of the dependent variable (rather than the mean) on the independent variables, can be a solution to this problem. Then we propose and validate a hypothesis test that applied researchers can use to select between OLS and MR in a given sample of data. Finally, we present two examples from state politics research in which (1) the test selects MR over OLS and (2) differences in results between the two methods could lead to different substantive inferences. We conclude that MR and the test we propose can improve linear models in state politics research.
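The contrast between the two estimators can be sketched numerically. The snippet below is an illustrative example only, not the authors' procedure or proposed hypothesis test: it fits a line by OLS (minimizing the sum of squared residuals) and by median regression (minimizing the sum of absolute deviations) to simulated data with a few injected outliers. The data, variable names, and optimizer choice are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Data from a known line y = 2x + 1, plus a few large positive outliers
# (a heavy-tailed error distribution in miniature).
x = np.linspace(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, size=x.size)
y[::10] += 25  # every 10th observation becomes an outlier

# OLS: choose (slope, intercept) minimizing the sum of squared residuals.
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# Median regression: minimize the sum of absolute deviations instead,
# which conditions the median (not the mean) of y on x.
def sad(beta):
    return np.abs(y - (beta[0] * x + beta[1])).sum()

mr_slope, mr_intercept = minimize(sad, x0=[0.0, 0.0], method="Nelder-Mead").x

print(f"OLS: slope={ols_slope:.2f}, intercept={ols_intercept:.2f}")
print(f"MR:  slope={mr_slope:.2f}, intercept={mr_intercept:.2f}")
```

Because the squared-error criterion weights large residuals heavily, the handful of outliers pulls the OLS fit away from the true line, while the median-regression fit stays close to it.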

Questions to Consider

1. What are the three main points presented as evidence of the utility of median regression in state politics research?

2. How do the ordinary least squares and median regression models solve for β?

  1. Ordinary least squares minimizes the sum of the absolute deviations while median regression minimizes the sum of the squared residuals.
  2. Ordinary least squares minimizes the sum of the squared residuals while median regression minimizes the sum of the absolute deviations.
  3. Ordinary least squares maximizes the sum of the squared residuals while median regression minimizes the sum of the absolute deviations.
  4. Ordinary least squares minimizes the sum of the squared residuals while median regression maximizes the sum of the absolute deviations.

3. The median regression model is a more efficient estimator if large ______ are a result of ________ variation.

  1. infrequent outliers; stochastic
  2. infrequent outliers; dependent
  3. frequent outliers; stochastic
  4. frequent outliers; dependent

Article 2: Parish, R. C. (1989). Comparison of linear regression methods when both variables contain error: Relation to clinical studies. The Annals of Pharmacotherapy, 23. doi:10.1177/106002808902301111.

Summary/Abstract: Five common linear regression methods were evaluated for their ability to determine the correct values of slope and intercept of a known function after random errors were added to x and y. The error variances were controlled to simulate research problems commonly studied by linear regression. The total error of each method was assessed by the absolute value of the bias in the estimate of slope. Whenever differences among methods were observed, the mean of the slope determined by two reciprocal techniques performed as well as or better than orthogonal regression, regression of y upon x, or x upon y. All the methods studied appeared to perform equally well when x and y errors were heteroscedastic or when the data set was small (n = 7). Regression of y upon x was equal or superior to other methods when n = 7 or n = 20 and y and x errors were homoscedastic. When the data set was large (n = 50) and the error in x greater than that in y, the standard method (regression of y upon x) was inferior to all other methods. It is suggested that linear regression by the traditional method of y upon x (a method present in many hand-held calculators) is appropriate in the majority of clinical situations, but when n is large and errors in x are much larger than those in y, orthogonal regression or the averaging method may be preferable.
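The paper's comparison can be illustrated with a small simulation. The sketch below is our own construction, not the study's design: it adds larger random error to x than to y (n = 50), then recovers the slope by the standard method (y upon x), by orthogonal regression via the SVD (total least squares), and by a simple averaging of the y-upon-x slope with the reciprocal of the x-upon-y slope. The paper's exact averaging construction (Method M) may differ from this arithmetic mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# True relationship y = 3x + 2; both variables are observed with error,
# and the error in x is larger than the error in y.
x_true = np.linspace(0.0, 10.0, 50)
x = x_true + rng.normal(0.0, 1.0, x_true.size)              # noisier x
y = 3.0 * x_true + 2.0 + rng.normal(0.0, 0.3, x_true.size)  # less noisy y

# Standard method: least squares regression of y upon x.
slope_yx, _ = np.polyfit(x, y, 1)

# Averaging (illustrative): mean of the y-upon-x slope and the
# reciprocal of the x-upon-y slope.
slope_xy = 1.0 / np.polyfit(y, x, 1)[0]
slope_avg = 0.5 * (slope_yx + slope_xy)

# Orthogonal regression: minimize perpendicular distances via the SVD
# of the centered data (total least squares).
centered = np.column_stack([x - x.mean(), y - y.mean()])
_, _, vt = np.linalg.svd(centered)
normal = vt[-1]                    # normal vector of the best-fit line
slope_orth = -normal[0] / normal[1]

print(f"y-upon-x:   {slope_yx:.3f}")
print(f"averaging:  {slope_avg:.3f}")
print(f"orthogonal: {slope_orth:.3f}")
```

With large error in x, regression of y upon x attenuates the slope toward zero; the orthogonal and averaging estimates correct much of that bias, consistent with the paper's finding for the large-n, large-x-error case.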

Questions to Consider

1. When heteroscedastic errors exist in x and y, which, if any, linear regression methods are better in terms of total error?

2. Regression methods require certain basic assumptions about the nature of the errors in x and y. Which of the following is not an assumption?

  1. The straight-line relationship y = mx + b exists.
  2. The errors in the observations of x and y are independent.
  3. The errors are normally distributed about a mean of zero.
  4. The errors in y and x (if present) are heteroscedastic.

3. What two methods make up Method M?

  1. least squares regression of y upon x and least squares regression of x upon y
  2. orthogonal least squares regression and least squares regression of y upon x
  3. method O2 and method OE
  4. method OE and orthogonal least squares regression