Journal Articles

Freedman, David A. 2006. Statistical Models for Causation: What Inferential Leverage Do They Provide? Evaluation Review 30 (6): 691-713. 

Abstract: Experiments offer more reliable evidence on causation than observational studies, which is not to gainsay the contribution to knowledge from observation. Experiments should be analyzed as experiments, not as observational studies. A simple comparison of rates might be just the right tool, with little value added by “sophisticated” models. This article discusses current models for causation, as applied to experimental and observational data. The intention-to-treat principle and the effect of treatment on the treated will also be discussed. Flaws in per-protocol and treatment-received estimates will be demonstrated.

Discussion Questions:

  1. Please discuss why experiments tend to offer more reliable evidence for causation than observational studies.
  2. When using regression analysis to estimate the average causal effect, what mistakes should the researcher avoid?
  3. How do assumptions play into the research design?
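
As a starting point for question 1, the following minimal sketch contrasts an intention-to-treat comparison of rates with a per-protocol comparison. Everything in it is an assumption made for illustration, not taken from the article: the `Subject` record, the helper names, the simplified per-protocol rule (compliers in the treatment arm versus the whole control arm), and the made-up trial in which the treatment has no effect and only the healthier subjects comply.

```python
from dataclasses import dataclass


@dataclass
class Subject:
    assigned: bool  # randomized to the treatment arm
    received: bool  # actually took the treatment
    died: bool      # outcome of interest


def rate(subjects):
    """Proportion of subjects who died."""
    return sum(s.died for s in subjects) / len(subjects)


def itt_difference(subjects):
    """Intention-to-treat: compare arms as randomized, ignoring compliance."""
    treated = [s for s in subjects if s.assigned]
    control = [s for s in subjects if not s.assigned]
    return rate(treated) - rate(control)


def per_protocol_difference(subjects):
    """Simplified per-protocol: compliers in the treatment arm vs. all controls."""
    compliers = [s for s in subjects if s.assigned and s.received]
    control = [s for s in subjects if not s.assigned]
    return rate(compliers) - rate(control)


def make_group(n, deaths, assigned, received):
    """Hypothetical helper: n subjects, the first `deaths` of whom died."""
    return [Subject(assigned, received, i < deaths) for i in range(n)]


# Made-up data: each arm has 60 healthier subjects (6 deaths) and
# 40 frailer subjects (16 deaths). The treatment does nothing, but the
# frailer subjects assigned to treatment do not take it.
subjects = (
    make_group(60, 6, assigned=True, received=True)
    + make_group(40, 16, assigned=True, received=False)
    + make_group(60, 6, assigned=False, received=False)
    + make_group(40, 16, assigned=False, received=False)
)

itt = itt_difference(subjects)
per_protocol = per_protocol_difference(subjects)
print(f"ITT difference in death rates:          {itt:+.2f}")
print(f"Per-protocol difference in death rates: {per_protocol:+.2f}")
```

In this made-up trial the arms as randomized have identical death rates, so the intention-to-treat difference is zero, while the per-protocol comparison suggests a benefit that comes entirely from who chose to comply.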


Gibbs, Benjamin G., Kevin Shafer, and Mikaela J. Dufur. 2012. Why infer? The use and misuse of population data in sport research. International Review for the Sociology of Sport 0 (0): 1-7.

Abstract: While the use of inferential statistics is a nearly universal practice in the social sciences, there are instances where its application is unnecessary or, worse, misleading. This is true for most research on the Relative Age Effect (RAE) in sports. Given the limited amount of data needed to examine RAE (birth dates) and the availability of complete team rosters, RAE researchers are in a unique position—inference is not needed when interpreting findings because the data is from a population. We reveal, over the course of five years, the misapplication of inferential statistics using census data in 10 of 13 RAE studies across 12 sports journals. Thus, perhaps by inertia, the majority of RAE researchers use inferential statistics with their census data, misusing analytic techniques and, in some cases, undervaluing meaningful patterns and trends.

Discussion Questions:

  1. Please discuss how the use of inferential statistics may lead to misinterpretations and consequently run contrary to important patterns and trends.
  2. Discuss how the researchers utilize specific census data to frame their arguments on the use of inferential statistics.
  3. What consequences do the authors of the study identify in situations where inferential statistics are misused?


Patton, Jeffrey M., Ying Cheng, Ke-Hai Yuan, and Qi Diao. 2013. Bootstrap Standard Errors for Maximum Likelihood Ability Estimates When Item Parameters are Unknown. Educational and Psychological Measurement XX (X): 1-16.

Abstract: When item parameter estimates are used to estimate the ability parameter in item response models, the standard error (SE) of the ability estimate must be corrected to reflect the error carried over from item calibration. For maximum likelihood (ML) ability estimates, a corrected asymptotic SE is available, but it requires a long test and the covariance matrix of item parameter estimates, which may not be available. An alternative SE can be obtained using the bootstrap. The first purpose of this article is to propose a bootstrap procedure for the SE of ML ability estimates when item parameter estimates are used for scoring. The second purpose is to conduct a simulation to compare the performance of the proposed bootstrap SE with the asymptotic SE under different test lengths and different magnitudes of item calibration error. Both SE estimates closely approximated the empirical SE when the test was long (i.e., 40 items) and when the true ability value was close to the mean of the ability distribution. However, neither SE estimate was uniformly superior: the asymptotic SE tended to underpredict the empirical SE, and the bootstrap SE tended to overpredict the empirical SE. The results suggest that the choice of SE depends on the type and purpose of the test. Additional implications of the results are discussed.

Discussion Questions:

  1. Please discuss the concept of the bootstrap procedure for the standard error.
  2. According to the authors, when is the procedure for the bootstrap standard error utilized?
  3. What are the limitations of the study, as outlined by the researchers?
  4. How may this study be extended?
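
For question 1, it may help to see the general bootstrap idea in its simplest form. The sketch below is a generic nonparametric bootstrap for the standard error of a sample mean, not the authors' procedure, which also has to account for item calibration error. The function name `bootstrap_se`, the number of resamples, the seed, and the scores are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_se(data, estimator, n_resamples=2000, seed=0):
    """Estimate the standard error of `estimator` by resampling `data`.

    Each resample draws len(data) observations with replacement; the
    standard deviation of the resampled estimates is the bootstrap SE.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.stdev(estimates)


# Made-up test scores, used only to exercise the function.
scores = [12, 15, 9, 20, 14, 17, 11, 13, 16, 18]

se_boot = bootstrap_se(scores, statistics.mean)
# Textbook standard error of the mean, for comparison.
se_formula = statistics.stdev(scores) / len(scores) ** 0.5

print(f"bootstrap SE of the mean: {se_boot:.3f}")
print(f"formula SE of the mean:   {se_formula:.3f}")
```

For a simple statistic such as the mean, the two values should be close. The bootstrap becomes useful when, as in the article, no convenient formula is available or the formula's requirements are not met.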


Tryon, Warren W., and Charles Lewis. 2009. Evaluating Independent Proportions for Statistical Difference, Equivalence, Indeterminacy, and Trivial Difference Using Inferential Confidence Intervals. Journal of Educational and Behavioral Statistics 34 (2): 171-189.

Abstract: Tryon presented a graphic inferential confidence interval (ICI) approach to analyzing two independent and dependent means for statistical difference, equivalence, replication, indeterminacy, and trivial difference. Tryon and Lewis corrected the reduction factor used to adjust descriptive confidence intervals (DCIs) to create ICIs and introduced trivial statistical difference. They also introduced hybrid confidence intervals containing both ICI and DCI limits as replacements for error bars. This article generalizes the ICI method to include asymmetric as well as symmetric confidence intervals. Application is made to two independent proportions, odds, odds ratios, and log odds.

Discussion Questions:

  1. Please discuss the concept of the inferential confidence interval (ICI) approach to analyzing means for statistical difference. What benefits does it present in statistical analysis?
  2. Confidence intervals are important to statistical analysis. What problems do they also present?
  3. What choices need to be made by the researcher prior to applying the ICI approach?