Chapter Summary
Multivariate analysis is the simultaneous analysis of three or more variables on a set of cases. It can overcome some of the limitations of bivariate analysis: the joint effects of several variables operating together can be assessed, the risk of committing Type I errors (rejecting a null hypothesis that is in fact true) is reduced, and conditional, confounding or mediating relationships can be detected. How this is done is a topic taken up in Chapter 10 of Kent (2015).
Where all the variables to be related are categorical, it is possible to control the relationship between two variables by a third, fourth or fifth variable, and so on, through three-way and n-way tabular analysis. The researcher can use these tables to explore the strength of relationships between three or more variables at a time. An alternative is log-linear analysis, which looks at the interaction effects between a number of categorical variables.
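As a minimal sketch of three-way tabular analysis, the Python below cross-tabulates two categorical variables and then repeats the table within each category of a third, control variable. The variable names (gender, age, buys), the cell counts and the helper functions are invented for illustration and are not taken from the chapter's datasets.

```python
from collections import defaultdict


def crosstab(records, row, col):
    """Two-way table of counts: table[row_value][col_value]."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[row]][rec[col]] += 1
    return table


def layered_crosstab(records, row, col, control):
    """One two-way table per category of the control variable."""
    layers = defaultdict(list)
    for rec in records:
        layers[rec[control]].append(rec)
    return {value: crosstab(subset, row, col) for value, subset in layers.items()}


def row_percentages(table):
    """Convert each row of counts to percentages of the row total."""
    return {
        r: {c: 100.0 * n / sum(counts.values()) for c, n in counts.items()}
        for r, counts in table.items()
    }


# Invented cell counts (assumption): (gender, age, buys) -> number of cases.
cells = {
    ("female", "under 35", "yes"): 30, ("female", "under 35", "no"): 10,
    ("male", "under 35", "yes"): 30, ("male", "under 35", "no"): 10,
    ("female", "35 plus", "yes"): 5, ("female", "35 plus", "no"): 15,
    ("male", "35 plus", "yes"): 10, ("male", "35 plus", "no"): 30,
}
records = [
    {"gender": g, "age": a, "buys": b}
    for (g, a, b), n in cells.items()
    for _ in range(n)
]

# Bivariate table: gender by buys, ignoring age.
overall = row_percentages(crosstab(records, "gender", "buys"))

# Three-way table: gender by buys, controlling for age.
by_age = {
    age: row_percentages(table)
    for age, table in layered_crosstab(records, "gender", "buys", "age").items()
}

print("overall:", overall)
for age, table in by_age.items():
    print(age, table)
```

With these invented counts, the bivariate table suggests a difference between the groups (about 58% of females buy against 50% of males), but within each age group the percentages are identical (75% in the under-35 layer, 25% in the 35-plus layer). The apparent relationship disappears once age is controlled, which is the kind of confounding that a three-way table is meant to expose.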
Multivariate techniques for metric variables, or for combinations of metric and categorical variables, may be classified into dependence and interdependence procedures. The former require the researcher to establish which variables are dependent and which are independent; they include multiple regression, logistic regression, discriminant analysis and multivariate analysis of variance. Interdependence techniques include factor analysis, cluster analysis, correspondence analysis and multidimensional scaling. Not all of these are considered in this chapter. Cluster analysis and discriminant analysis are explained in Chapter 8. Correspondence analysis and multidimensional scaling are considered only briefly.
One message that should come across clearly in this chapter is that not all the techniques can be applied, or sensibly applied, to a given dataset. If the dataset is not a random sample, then any techniques that rely on statistical inference must be treated with extreme caution or, better, not used at all. Dependence techniques are appropriate only when there is a sufficiently developed theoretical model that indicates which variables can sensibly be treated as dependent and which as independent. Within that framework, the researcher needs to be clear about which variables are categorical and which are metric.