SAGE Journal Articles


Barrio, I., Arostegui, I., Rodríguez-Álvarez, M., & Quintana, J. (2015). A new approach to categorising continuous variables in prediction models: Proposal and validation. Statistical Methods in Medical Research, pii: 0962280215601873. doi:10.1177/0962280215601873.

When developing prediction models for application in clinical practice, health practitioners usually categorize clinical variables that are continuous in nature. Although categorization is not regarded as advisable from a statistical point of view, due to loss of information and power, it is a common practice in medical research. Consequently, providing researchers with a useful and valid categorization method could be a relevant issue when developing prediction models. Without recommending categorization of continuous predictors, our aim is to propose a valid way to do it whenever it is considered necessary by clinical researchers. This paper focuses on categorizing a continuous predictor within a logistic regression model, in such a way that the best discriminative ability is obtained in terms of the highest area under the receiver operating characteristic curve (AUC). The proposed methodology is validated when the optimal cut points’ location is known in theory or in practice. In addition, the proposed method is applied to a real dataset of patients with an exacerbation of chronic obstructive pulmonary disease, in the context of the IRYSS-COPD study where a clinical prediction rule for severe evolution was being developed. The clinical variable PCO2 was categorized in a univariable and a multivariable setting.
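The idea of choosing a cut point by discriminative ability can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single cut point, a univariable setting, and an exhaustive search over midpoints between observed values. In that setting a logistic regression on the categorized predictor assigns each category a fitted probability that ranks the same as the category's observed event rate, so the sketch scores observations by event rate instead of fitting a model. The function names and the toy data are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison; ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def category_rates(x, y, cuts):
    """Score each observation by the observed event rate of its category."""
    def category(value):
        # Category index = number of cut points the value exceeds.
        return sum(value > c for c in cuts)

    totals, events = {}, {}
    for value, label in zip(x, y):
        k = category(value)
        totals[k] = totals.get(k, 0) + 1
        events[k] = events.get(k, 0) + label
    return [events[category(v)] / totals[category(v)] for v in x]


def best_single_cut(x, y):
    """Exhaustively search midpoints between observed values for the highest AUC."""
    values = sorted(set(x))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_cut, best_auc = None, 0.0
    for c in candidates:
        current = auc(category_rates(x, y, [c]), y)
        if current > best_auc:  # ties keep the lowest cut point
            best_cut, best_auc = c, current
    return best_cut, best_auc


# Hypothetical toy data: a continuous predictor and a binary outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 0, 1, 1]
print(best_single_cut(x, y))
```

An exhaustive search like this becomes expensive with several cut points or with covariates in the model, which is one reason a dedicated search strategy is needed in practice.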

Questions to Consider

1. Describe and discuss the authors' proposed approach to categorizing clinical variables that are continuous in nature.

Cognitive Domain: Comprehension

Difficulty Level: Medium

 

2. Explain what constitutes a continuous predictor.

Cognitive Domain: Knowledge

Difficulty Level: Medium

 

3. In this article, how do the authors categorize a continuous predictor within a logistic regression model?

Cognitive Domain: Analysis, Knowledge

Difficulty Level: Hard

 

Zhang, H., Maity, A., Arshad, H., Holloway, J., & Karmaus, W. (2016). Variable selection in semi-parametric models. Statistical Methods in Medical Research, 25(4), 1736–1752. doi:10.1177/0962280213499679.

We propose Bayesian variable selection methods in semi-parametric models in the framework of partially linear Gaussian and probit regressions. Reproducing kernels are utilized to evaluate the possibly non-linear joint effect of a set of variables. Indicator variables are introduced into the reproducing kernels for the inclusion or exclusion of a variable. Different scenarios based on posterior probabilities of including a variable are proposed to select important variables. Simulations are used to demonstrate and evaluate the methods. It was found that the proposed methods can efficiently select the correct variables regardless of the feature of the effects, linear or non-linear in an unknown form. The proposed methods are applied to two real datasets to identify cytosine phosphate guanine methylation sites associated with maternal smoking and cytosine phosphate guanine sites associated with cotinine levels with creatinine levels adjusted. The selected methylation sites have the potential to advance our understanding of the underlying mechanism for the impact of smoking exposure on health outcomes, and consequently benefit medical research in disease intervention.

Questions to Consider

1. Explain the role of indicator variables in this article.

Cognitive Domain: Analysis

Difficulty Level: Hard

 

2. What do the simulation results indicate?

Cognitive Domain: Knowledge, Analysis

Difficulty Level: Hard

 

3. How are the variable selection approaches built upon the evaluation of an overall set effect?

Cognitive Domain: Comprehension

Difficulty Level: Hard

 

Funder, D. C., Levine, J. M., Mackie, D. M., Morf, C. C., Sansone, C., Vazire, S., . . . West, S. G. (2014). Improving the dependability of research in personality and social psychology: Recommendations for research and educational practice. Personality and Social Psychology Review, 18(1), 3–12. doi:10.1177/1088868313507536.

In this article, the Society for Personality and Social Psychology (SPSP) Task Force on Publication and Research Practices offers a brief statistical primer and recommendations for improving the dependability of research. Recommendations for research practice include (a) describing and addressing the choice of N (sample size) and consequent issues of statistical power, (b) reporting effect sizes and 95% confidence intervals (CIs), (c) avoiding “questionable research practices” that can inflate the probability of Type I error, (d) making available research materials necessary to replicate reported results, (e) adhering to SPSP’s data sharing policy, (f) encouraging publication of high-quality replication studies and (g) maintaining flexibility and openness to alternative standards and methods. Recommendations for educational practice include (a) encouraging a culture of “getting it right,” (b) teaching and encouraging transparency of data reporting, (c) improving methodological instruction and (d) modeling sound science and supporting junior researchers who seek to “get it right.”

Questions to Consider

1. Describe some of the growing concerns about the dependability and replicability of research findings.

Cognitive Domain: Comprehension

Difficulty Level: Medium

 

2. Explain why some are now arguing that incentive structures and research practices produce a high rate of false positive findings.

Cognitive Domain: Knowledge, Analysis

Difficulty Level: Medium–Hard

 

3. How is it that research findings have been reported as “significant” when there actually is no relationship in the population from which the current sample was drawn? What does this do to the validity and reliability of psychological research?

Cognitive Domain: Analysis, Comprehension

Difficulty Level: Hard

 

Al-Hattami, A. (2014). Short- and long-term validity of high school GPA for admission to colleges outside the United States. Journal of College Student Retention, 16(2), 277–291.

High school GPA is the only admission criterion that is currently used by many colleges in Yemen to select their potential students. Its predictive validity was investigated to ensure the accuracy of the admission decisions in these colleges. The relationship between students’ persistence in the 4 years of college and high school GPA was studied as well. The sample in the study consisted of 1,603 cohort students from two public universities in Yemen. The data analysis included simple, multiple, and logistic regression analyses. Results showed that high school GPA was a significant predictor of academic performance as measured by first-year college GPA and 4-year cumulative GPA. However, it explained a very small portion of the total variance in these academic variables. It was also found that it has no relationship with students’ persistence. Therefore, a comprehensive review of the use of high school GPA for admission decisions is strongly recommended.

Questions to Consider

1. Al-Hattami reported an R² of 0.13 for predicting first-year college GPA and an R² of 0.14 for predicting 4-year cumulative GPA. What are the f² values associated with these R² values?

Learning Objective: Effect size

Cognitive Domain: Application

Difficulty Level: Medium
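
As a reminder of the conversion this question relies on, Cohen's f² is obtained from R² as f² = R² / (1 − R²). The short sketch below applies that formula; the function name and the R² value of 0.20 are hypothetical and are not taken from the article.

```python
def cohens_f2(r_squared):
    """Convert a proportion of explained variance (R^2) to Cohen's f^2."""
    if not 0 <= r_squared < 1:
        raise ValueError("R^2 must be in the interval [0, 1)")
    return r_squared / (1 - r_squared)


# Hypothetical value for illustration only.
print(round(cohens_f2(0.20), 2))
```

Cohen's conventional benchmarks for f² are 0.02 (small), 0.15 (medium), and 0.35 (large).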

 

2. Is Al-Hattami’s study a true predictive validity study? (a) No, it does not consider future outcomes. (b) No, it was just looking at short- and long-term validity. (c) Yes, all validity studies are predictive validity studies. (d) Yes, high school GPA came before college GPA.

Learning Objective: Predictive validity

Cognitive Domain: Evaluation

Difficulty Level: Medium

 

3. From Table 4, which of the following statements is true about high school GPA’s relationships with the criteria? (a) GPA was a stronger predictor in Commerce than in Education. (b) GPA was a stronger predictor in Arts than in the Sciences. (c) GPA was a predictor of graduation in all colleges. (d) All GPA correlations were too weak to draw conclusions.

Learning Objective: Effect size

Cognitive Domain: Knowledge

Difficulty Level: Easy

 

Lambert, E. G., Hogan, N. L., & Altheimer, I. (2010). An exploratory examination of the consequences of burnout in terms of life satisfaction, turnover intent, and absenteeism among private correction staff. The Prison Journal, 90(1), 94–114.

Burnout, a syndrome caused by excessive strain and psychological exhaustion, comprises the dimensions of emotional exhaustion, depersonalization, and feelings of being ineffective. Survey results from 160 correctional staff at a maximum security private prison in the Midwest were used to compute ordinary least squares regression equations in order to reveal the effects of burnout on the outcomes of life satisfaction, turnover intent, and absenteeism. Ineffectiveness was linked with none of the three outcomes. Depersonalization was linked with increased turnover intent and more frequent absenteeism, and emotional exhaustion was linked with all three outcomes. The results differed somewhat between female and male staff and between correctional and noncorrectional officers.

Questions to Consider

1. The authors provide the study’s demographic variables and the results of regressing life satisfaction, turnover intent, and absenteeism on the predictor variables. They do not report the correlations between variables. How could those correlations be useful to researchers?

Learning Objective: Correlation and regression

Cognitive Domain: Analysis

Difficulty Level: Medium

 

2. The demographic and burnout variables had an R² of 0.38 with turnover intent. The adjusted R² value would be: (a) lower, (b) the same, (c) larger, (d) indeterminable.

Learning Objective: Regression

Cognitive Domain: Knowledge

Difficulty Level: Hard
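
For reference, the usual adjustment is adjusted R² = 1 − (1 − R²)(n − 1) / (n − p − 1), where n is the sample size and p is the number of predictors. The sketch below implements that formula; the function name and the example inputs (R² = 0.30, n = 100, p = 5) are hypothetical and are not the values from the article.

```python
def adjusted_r2(r_squared, n, p):
    """Adjust R^2 for sample size n and number of predictors p."""
    if n - p - 1 <= 0:
        raise ValueError("n must exceed p + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


# Hypothetical inputs for illustration only.
print(round(adjusted_r2(0.30, n=100, p=5), 3))
```

The size of the adjustment depends on how many predictors the model uses relative to the number of observations.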

 

3. According to Chapter 9, what statistic should the authors be reporting with their R² values? (a) t-Test. (b) F statistic. (c) f². (d) 95% CI.

Learning Objective: Regression

Cognitive Domain: Knowledge

Difficulty Level: Medium