Article 1: Why Politicians Hate Political Science

2013 and 2014 saw significant debates in the US Congress about funding for the National Science Foundation, and specifically for the part of the Social and Behavioral Sciences section that funds research in political science. Congress even passed an amendment prohibiting the NSF from funding political science research. The irony, however, is that most of the politicians who passed this law almost certainly benefited from NSF-funded research in political science in their quest for national public office.

So if that’s the case, why do politicians hate political science? First, scholars have not always been good at explaining why and how the rather esoteric research we do actually matters in the real world. This is particularly true of research about the world outside the United States; after all, why should US taxpayers fund the study of other countries’ politics? Research in political theory often receives similar critiques, since the study of ancient Greek philosophy (among other things) has little obvious direct relevance to contemporary political life. This miscommunication is largely the fault of the scholars who apply for grants. We have not been good at identifying and explaining the policy relevance of our research: how what we learn from doing it can help us improve policies and policy outcomes.

Second, many of the NSF-funded projects in American politics involve politically sensitive issues. The role of race in policy preferences and voting, the effect of emotion-evoking campaign advertising, and models of policymaker behavior have direct and immediate implications for campaigns and elections. The findings of research on most of these topics are unlikely to benefit the political right, which currently controls the US Congress.

In short, we’ve been remiss in clarifying the policy relevance of our research, while the political relevance of much of it is far too clear. That’s my take, at least, on the situation. Other scholars have provided some great commentary on their blogs, so if you’re curious, I encourage you to read some of the posts linked below and linked in those posts.

Article 2: Bounded Theories and Bounded Investigations

To avoid inappropriate restrictions on your DV, you normally want to define it in conceptual terms and avoid proper nouns. Including a proper noun almost by definition restricts the population because it specifies a particular set of cases – those associated with the European Union, Latin America, the Green Party, or whatever – as the domain for the theory. Inappropriately bounding the theory effectively limits the investigation as well. If you don’t believe that your theory applies to a particular set of cases, those cases obviously won’t appear in your analysis. But if they do belong in your analysis, then the relationships you find in the restricted sample are biased, because the full range of observed values of all variables is not present in your sample.

This type of truncation is problematic for quantitative and qualitative analysis alike. In quantitative analysis, we can correct for or at least accommodate truncation (and the related problem of ‘censored’ observations). In qualitative analysis, it’s trickier, since we have no neat mechanical correction to apply. As Geddes (2003, ch. 3) demonstrates quite clearly, inappropriate bounding that results from limiting the scope of a theory can lead to dramatically incorrect conclusions.

Limiting your theory’s scope is acceptable only under a limited set of conditions. The most common of these is that the phenomenon in question occurs only in a particular context, usually by definition. Special elections to replace deceased or retired members of the US House of Representatives, for example, don’t occur outside the United States, and assuming that the dynamics of such elections differ in other countries seems appropriate. The same logic applies to voting behavior in the United Nations, headscarf wearing in secular Muslim-majority states, and similar phenomena. The behavior of interest is itself confined to a particular institutional or cultural context, which makes pooling observations across contexts inappropriate.[1]

A second reason for limiting scope is to investigate differences within a class of cases, events, or outcomes. Theda Skocpol (1979), for example, explicitly separates social revolutions from all other types of revolutions because she theorizes that the mechanisms producing this particular set of cases differ from those at work in other types of revolutions. Investigating this within-type variation can create important new understandings, as in Juan Linz’s (1975, 2000) studies of variants of authoritarianism. More recently, scholars such as Barbara Geddes (1999) and Jessica Weeks (2014) unpack Linz’s types of authoritarianism even further to explain why prior theories produced conflicting predictions and inconsistent findings. In situations like these, the population of cases is defined by a shared value on some prior variable, and the current study argues that pooling cases on that value is inappropriate because intra-class variation makes the cases causally distinct.


[1] In fact, in quantitative research, inappropriate pooling of cases can have serious consequences for your results. At a minimum, the data usually suffer from heteroskedasticity since the causal processes underlying the various components of the pool differ. Depending on the extent of the problem, omitted variable bias is also likely. (See Chapter 8 for discussion of these problems and their effect on results, and also the Web Extra on Assumptions of OLS.)

Article 3: Limiting the Range of Analysis

Limiting the range of the investigation should also be done with care, and only with real need. Limiting the domain of the study can be appropriate when, for example, changes in external circumstances mean that the causal process is expected to differ or not to apply in the same manner. My dissertation research (Powner 2008), for example, made arguments about the conditions under which European states would choose to cooperate and about how they would choose which institution(s) to cooperate in. Formal European cooperation on foreign policy dates back to at least the late 1940s, with the formation of the Western European Union in 1948. My study, however, examined only cooperation between 1993 and 2003, even though my research occurred between 2007 and 2008. Why? Key changes in the external environment caused states to employ different criteria for foreign policy decision-making. These included the end of the Cold War, the transformation of the European Community into the European Union, and the creation of a new foreign policy coordination body within the European Union. Because the decision to create new institutions was outside the study, I chose to start the period of analysis in November 1993, when the foreign policy component of the new European Union became operational. I chose to end the study in December 2003 because in early 2004 the Union went from 15 (exclusively Western European) members to 25 members spanning both Western Europe and its former adversaries in the Warsaw Pact. The dynamics of decision-making definitely changed with this “big bang” expansion, and less than two years had passed when I had to decide the scope of my study. That period wasn’t long enough for a new set of stable dynamics to emerge, and under conditions of unstable dynamics my theory would be unable to predict much – not because the theory was poor, but because the behavior itself was unpredictable and motivated by a different set of dynamics than in the rest of the period.

If I redid the study now, I could probably extend it through 2013; the last major change to European foreign policy institutions occurred in December 2009, and enough time has passed for a new set of stable dynamics to emerge.

Other reasons to limit your sample exist. Beyond the simple reason that your theory doesn’t apply to certain cases (e.g., non-Cuban Hispanics in Florida in a study of Cuban-American campaign contribution decisions), you may experience one or more of the following:

  • Data do not exist and cannot be easily constructed (e.g., gross domestic product or income before about 1945; quarterly birth or interest rates)
  • The institution or entity may not exist (independent central banks or the United Nations before the middle of the 20th century, women’s voting patterns before 1918 in the United States)
  • The meaning of the action or institution differs (gun ownership or Senate composition in 19th versus 20th century United States)
  • The full domain of the study is infinite (all political actions taken by citizens, states’ adherence to international law)

If you experience one of the first two situations, you should briefly explain it in your discussion of domain (“This study spans the years for which comparable GDP data are available from the World Bank, namely 1948-2012”; “The inception of the World Food Programme in 1963 marks the beginning of this study”). In the case of differing meanings of the action or institution, you should be prepared to support the claim with sourced evidence. This could be as simple as “Prior to the ratification of the 17th Amendment in 1913, US senators were appointed by state legislatures. Their function was thus to represent the state’s interests rather than the mass public’s (Jones 1973), so an examination of public discourse and senatorial voting behavior rightly begins in 1918 with the first Senate composed only of directly elected members.”

In the case of the last problem, that of an infinite domain, things get more complicated. Clearly, studying the full population of cases is impossible; no population census exists, and constructing one would be nearly impossible. The trick is to figure out a way to sample the population without a census, but in such a manner that the sample is still reasonably representative of the population. With a representative sample – or at least one with no clear bias[2] – you can plausibly make inferences to the rest of the population. Begin by figuring out a way to approximate a population census for a slice of the potential observations. You can slice substantively or thematically (as in Simmons 2000), temporally (as in Powner 2008, discussed below), or by defining the population in a more bounded manner.

For example, in studies of the onset of war, scholars had to identify the full population of opportunities for states to go to war so that they could understand why some opportunities turn into wars and others don’t. Hypothetically, states could choose at any given day/hour/minute/second to attack one another, but collecting and analyzing data on any of these bases would add little value, since the overwhelming majority of cases would be non-events (i.e., cases where no war results). After some collective discussion, scholars settled on the convention that any pair of states has an opportunity to go to war each year, so that the unit of analysis is the dyad (pair)-year. This defined the population in a manner that was reasonable both in terms of empirical plausibility, since no two states have ever gone to war with each other more than once in a single year, and in terms of data collection. They also determined, however, that not all dyads have a realistic opportunity to go to war. Bhutan and Belize, for example, may theoretically be able to fight a war each year, but practically speaking such a war is impossible, since both states lack the military capability to reach one another. So they added a condition to the “all dyad-years” decision: the dyad’s members must be geographically contiguous (or separated by some minimal water distance), or the dyad must contain a great power (great powers being defined partly by the ability to project power outside their own territory). As a result, studies of war onset now use these “politically relevant dyads” as their definition of the population.
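
The dyad-year construction and the politically relevant filter can be sketched in a few lines of code. This is a minimal illustration with toy, invented codings – the state list, great-power set, and contiguity pairs are hypothetical placeholders; actual studies draw such codings from sources like the Correlates of War project.

```python
from itertools import combinations

# Toy, hypothetical codings -- invented purely for illustration.
states = ["USA", "France", "India", "Bhutan", "Belize"]
great_powers = {"USA", "France"}                  # assumed great-power coding
contiguous = {frozenset(["India", "Bhutan"])}     # assumed contiguity coding
years = range(1993, 1996)

def politically_relevant(a, b):
    """A dyad is politically relevant if its members are contiguous
    or if at least one member is a great power."""
    return frozenset((a, b)) in contiguous or a in great_powers or b in great_powers

# Enumerate every dyad-year, then keep only the politically relevant ones.
dyad_years = [(a, b, y) for y in years for a, b in combinations(states, 2)]
relevant = [(a, b, y) for a, b, y in dyad_years if politically_relevant(a, b)]

print(len(dyad_years), len(relevant))  # 30 dyad-years, 24 politically relevant
```

Under this toy coding, Bhutan-Belize and India-Belize drop out in every year: neither pair is contiguous and neither contains a great power, so those dyads never have a realistic opportunity to fight.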

Once you’ve defined the population, or at least devised a way to approximate it, you will often need a way to sample it. Determining whether a war occurred in any dyad-year is fairly easy, but where data collection is more difficult, or where the population is still too large to study completely, sampling is a necessary step toward a manageable data set. Here, the goal is to choose cases from the population in such a way that characteristics of the cases themselves do not affect the probability of a case entering the sample. My dissertation research (Powner 2008) focused on understanding when and how European states chose to cooperate in response to foreign policy issues. To do this, I had to create a sample of events to which states could have responded, regardless of whether they actually did so. I began by creating a list of events, behaviors, occurrences, or actions that European states had responded to in the past, using my knowledge of history and contemporary foreign relations to establish criteria that would qualify something as a “foreign policy issue” potentially worth cooperating on. The entire population for my 10-year sample, however, would still be too big to analyze, so I took a sample of the population. I randomly generated page numbers from Keesing’s Record of World Events within the temporal domain of my study. I then read each of those (400+) pages and catalogued every item that qualified as a “foreign policy issue.” That sample, however, still contained over a thousand cases, and the exhaustive research needed for each case made a thousand cases simply infeasible. I thus created a stratified sample by year. If x% of the cases in the foreign policy issue sample came from 1995, for example, then I randomly selected x% of my final sample from the cases for 1995.[3] The result was a final sample of 300 cases where the probability of getting into the issue sample, and the probability of an issue being selected into the final sample, were entirely independent of the cases’ characteristics; it was purely the luck of the draw.
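
The stratified sampling step can be sketched as follows. The numbers (1,000 issues, ten years, a 300-case target) mirror the text, but the cases themselves are simulated placeholders, not the actual dissertation data.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Simulated issue sample: 1,000 cases, each tagged with a year, 1993-2002.
issue_sample = [("case_%03d" % i, 1993 + i % 10) for i in range(1000)]
final_n = 300

# Group cases by year -- the strata.
by_year = {}
for case, year in issue_sample:
    by_year.setdefault(year, []).append(case)

# Draw from each year in proportion to its share of the issue sample,
# so the final sample mirrors the yearly distribution of issues.
final_sample = []
for year, cases in sorted(by_year.items()):
    k = round(final_n * len(cases) / len(issue_sample))
    final_sample.extend((c, year) for c in random.sample(cases, k))

print(len(final_sample))  # 300: 30 cases drawn from each of the 10 years
```

Because `random.sample` gives every case within a stratum an equal chance of selection, no characteristic of a case affects its probability of entering the final sample – it is, as the text puts it, purely the luck of the draw.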

Limiting the investigation unnecessarily is very dangerous. To give a rather prominent example, a substantial body of literature held that Islam was the primary factor impeding democracy in the Middle East. Scholars reached this conclusion by looking at Middle Eastern countries, noticing that they were all autocratic, and then noticing that they all had Islam as a state religion. On this basis, they concluded that Islam impedes democracy. Unfortunately, this research missed two crucial points. First, not all Islamic countries are in the Middle East; in fact, a majority of the world’s Muslims live outside it. By excluding non-Middle-Eastern Muslim countries, researchers found a spurious relationship between Islam and democracy. Second, Islam is not the only characteristic that Middle Eastern states share: they also have substantial natural resource wealth. When researchers included all countries and all religions, and also included other potential influences on democracy such as resource wealth, they found little or no relationship between Islam and democracy. Instead, the primary cause of low democracy scores in the Middle East appears to be natural resource wealth, which Middle Eastern states share with resource-rich states elsewhere (Ross 2001, 2008).


[2] At a minimum, this means no systematic bias on cases reaching the threshold for observation, and no systematic bias in selecting observed cases for the sample (i.e., observed cases have an equal probability of entering the sample). 

[3] Microsoft Excel easily generates random numbers; use the =RAND() or =RANDBETWEEN() functions. Be aware that performing additional actions in the worksheet causes Excel to recalculate, generating a new set of random numbers in the same range. If you need to preserve a set of random values, copy them to the clipboard immediately after generating them. Then go somewhere else in your file (or to a separate file), right-click, and select “Paste Special.” You should get a menu of options, including “Values” or something similar. This pastes only the values rather than the formulas that generate the random numbers.
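
If you prefer a scripting alternative to Excel, the same draw-and-freeze workflow can be done in Python; fixing the random seed plays the role of Paste Special, since rerunning the script reproduces the identical draws. (The page range 1-5000 and the 400-page count here are placeholders, not Keesing’s actual pagination.)

```python
import random

random.seed(2024)  # a fixed seed reproduces the same draws on every run,
                   # serving the same purpose as pasting values over formulas
page_numbers = [random.randint(1, 5000) for _ in range(400)]
print(page_numbers[:5])
```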

Article 4: Drawing Conclusions with Bounded Theories and Analyses

The most important thing to remember with bounded theories is that your conclusions cannot extend beyond the domain of the theory itself. You may hypothesize that similar mechanisms operate in other cases – such as other ethnic groups’ campaign donations in the Cuban-Americans in Florida example – but you cannot conclude that they do because this was not part of your theory or your empirics. In this case, I would probably suggest stepping back and reconsidering whether your theory is actually bounded, or whether it applies in a more general form to other groups as well and your current study is just a specific application of the broader claim.

The situation with bounded investigations is slightly different. If you use a limited sample because the population is infinite, and your observation selection procedures created a representative sample, then you can reasonably draw conclusions about the full population. Likewise, if you limit the domain because of differing meanings or non-existence, and you study the population of the restricted domain (or a representative sample of it), you should again be able to draw conclusions about the population in the restricted domain. Limitations resulting from the absence of data, on the other hand, create significant challenges in determining the breadth of your conclusions. The big question to answer is why the data do not exist. If the concept itself didn’t exist (‘citizenship’ before the 1920s, for example), then you cannot draw conclusions about the period prior to the concept. If the data do not exist because no entity existed to collect them (cross-national comparable estimates of national income, such as GDP), or because the research community has settled on a period of observation that differs from your needs (quarterly birth rates), then you must confine your conclusions to the period for which data exist.

If data do not exist because the actors themselves have chosen not to make them available, or because the actors are unable to collect them, then we are in a different realm entirely. These are situations of non-randomly missing data, and they require particular care. If you are in this situation, plan to consult with your instructor both during the case selection/domain definition process and while writing your conclusions. The reasons for systematically missing data are usually related to the causal mechanisms underlying the phenomena we study, so these situations are much more complicated to analyze; they often require guidance from a methodology expert.

Geddes, Barbara. 2003. Paradigms and Sand Castles: Theory Building and Research Design in Comparative Politics. Ann Arbor: University of Michigan Press.

Geddes, Barbara. 1999. “Authoritarian Breakdown: Empirical Test of a Game-Theoretic Argument.” Paper presented at the Annual Meeting of the American Political Science Association, Atlanta, GA.

Linz, Juan J. 1975. Totalitarian and Authoritarian Regimes. In Macropolitical Theory, Vol. 3 of Handbook of Political Science, eds. Fred I. Greenstein and Nelson W. Polsby, 175-411. Reading, MA: Addison-Wesley.

Linz, Juan J. 2000. Totalitarian and Authoritarian Regimes. Boulder, CO: Lynne Rienner.

Powner, Leanne C. 2008. Consensus, Capacity, and the Choice to Cooperate. Ph.D. dissertation, University of Michigan, Ann Arbor.

Ross, Michael L. 2001. “Does Oil Hinder Democracy?” World Politics 53(3): 325-361.

Ross, Michael L. 2008. “Oil, Islam, and Women.” American Political Science Review 102(1): 107-123.

Simmons, Beth A. 2000. “International Law and State Behavior: Commitment and Compliance in International Monetary Affairs.” American Political Science Review 94(4): 819-835.

Skocpol, Theda. 1979. States and Social Revolutions: A Comparative Analysis of France, Russia, and China. Cambridge: Cambridge University Press.

Weeks, Jessica L.P. 2014. Dictators at War and Peace. Ithaca, NY: Cornell University Press.

Article 5: The “Ideal” Research Design: The Experiment

In science, the ideal form of research design is the experiment. Experiments allow the investigator to control the values of all the variables, to manipulate them independently of one another, and to conduct repeated trials using different combinations of variable values. Because only one independent variable changes in any one trial, the investigator can be certain that any change in the dependent variable (outcome) is due solely to the changed value of that independent variable.

Ideally, social scientists would love to have this level of certainty about their own findings. Unfortunately, we can’t rerun history using different values: Would George W. Bush have invaded Iraq if he had been 20 points less popular after 9/11? What if his popularity had been 15 points lower? Or 10? We just can’t do that – it’s not possible, and even if it were, it wouldn’t be ethical. We are stuck with observational data, and we have to make the best of it.

Our solution to this problem is to control for other variables as best we can. Fortunately, multivariate statistical techniques do this, and most of the principles of qualitative case selection have it as their core idea. Both approaches have limitations, however, imposed largely by our inability to generate additional cases with different combinations of variable values. We are restricted to the set of cases that we have actually observed. We can rarely find two cases whose values on all variables except the variable of interest are perfectly matched. Even two observations of the same country or case at different points in time don’t match perfectly; at a minimum, they differ in timing, and they differ in history, since the later case has memory or knowledge of the outcome of the earlier one. Cross-sectional designs (many cases observed at a single point in time) have even bigger problems, especially in quantitative research, because we rarely have two cases with exactly the same values on a given variable. (Can you imagine two entirely separate wars with identical death tolls? Or two countries whose birth rates are exactly the same?) It’s a highly improbable situation, especially for continuous variables.

Our goal is to get as close as possible to the experimental ideal so that we can have the same level of confidence in our conclusions. For most research questions, this just isn’t possible. For a few, however, we are lucky in that natural experiments exist: the contexts are essentially identical except that some external event artificially separates the population into two groups that then receive different treatments. Examples of such external (exogenous) events include arbitrarily drawn colonial borders in Africa, which often separate members of the same cultural group or tribe into two or more states; the 2004 tsunami in the Indian Ocean, which destroyed many similar Indonesian islands whose responses and recovery efforts took very different tracks; a housing development whose residents vote in two different wards; and similar events that are not generated by the actors whose behavior we are trying to explain.

Social scientists are occasionally able to employ true experiments in their research. We can, for example, manipulate the content of campaign advertising and evaluate audience responses. We can also conduct survey experiments, in which respondents are randomly assigned to receive variations on the same question; the variations correspond to values of the independent variables we are interested in. In both of these examples, random assignment of participants to treatments has the same effect as controlling variable values. Because the treatment (the value of the variable of interest) is determined exogenously – by random assignment rather than by some characteristic of the respondent that might be related to the outcome – we can give the results the same level of credence that we would give a ‘true’ experiment.
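
A quick simulation shows why random assignment does the work of experimental control. This is a hypothetical sketch – the respondent count and question wordings are invented – rather than a description of any particular study.

```python
import random

random.seed(7)  # fixed seed for a reproducible illustration

respondents = list(range(200))
variants = ["neutral wording", "emotive wording"]  # hypothetical treatments

# Assignment ignores every characteristic of the respondent, so the two
# groups are comparable in expectation -- the source of experimental leverage.
assignment = {r: random.choice(variants) for r in respondents}
groups = {v: [r for r in respondents if assignment[r] == v] for v in variants}

print({v: len(g) for v, g in groups.items()})
```

Any difference in average responses between the two groups can then be attributed to the question wording itself, because nothing else systematically distinguishes the groups.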