Pretesting Your Data Collection Procedure and Coding Rules

Article 1: Pretesting Your Data Collection Procedure and Coding Rules

Pretesting is a crucial phase of collecting data, especially in processes such as content analysis that depend on applying a set of coding rules consistently to a large number of cases. The point of pretesting is to ensure that your rules are sufficiently clear, and your codes sufficiently comprehensive, to allow you to code all of your data in a consistent manner. Consistency in coding is absolutely essential, and pretesting is the only way to achieve it.

Once you have a preliminary dictionary (or other set of coding rules), you should pre-test it on some actual data to ensure that you have a workable prototype.[1] To pre-test, take a small random or semi-random sample of your sources – 5-10% will do – and try applying your dictionary to it. Unless your sources are exceptionally long, I recommend that you do this by hand on hard copies of the documents. Use highlighters and colored pens liberally to code content. Depending on the length of the dictionary and/or complexity of the search term list, you may be able to read each document once, looking for all of your terms or ideas at the same time. Most of the time, though, you will probably have to read it multiple times, looking for sub-sets of the terms each time. If your dictionary is word-based – you have single specific terms – you can probably use the “find” command on a computer to do this most efficiently.
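If your dictionary is word-based and your sources are already in electronic form, the sampling and "find" steps can also be scripted. What follows is only a minimal sketch under assumptions of my own, not a required part of the procedure: it assumes each source is a plain-text string, and the function names, the seed, and the example categories are all hypothetical.

```python
import random
import re
from collections import Counter


def draw_pretest_sample(doc_ids, fraction=0.05, seed=0):
    """Return a random sample of document ids (at least one) for pretesting."""
    ids = sorted(doc_ids)
    k = max(1, round(len(ids) * fraction))
    rng = random.Random(seed)  # fixed seed so the same sample can be redrawn
    return rng.sample(ids, k)


def apply_dictionary(text, dictionary):
    """Count whole-word, case-insensitive matches for each category's terms."""
    counts = Counter()
    for category, terms in dictionary.items():
        for term in terms:
            pattern = r"\b" + re.escape(term) + r"\b"
            counts[category] += len(re.findall(pattern, text, flags=re.IGNORECASE))
    return dict(counts)


# Hypothetical two-category dictionary, for illustration only.
example_dictionary = {"economy": ["tax", "taxes"], "security": ["war"]}
print(apply_dictionary("The tax cut and taxes. Tax!", example_dictionary))
# prints {'economy': 3, 'security': 0}
```

Whole-word matching means that "tax" does not also count "taxes"; each variant you care about has to be listed in the dictionary, which is exactly the kind of gap a pretest should surface.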

As you conduct your pre-testing, watch for ideas or terms that may fit your data requirements but are not currently in the dictionary. Go back and review your sample texts to see what is in them that is not currently flagged (coded) as relevant to your research. Should you include some of it? If so, how? Likewise, you should make copious notes as you code during this pretest phase so that when you encounter some of these terms or expressions again in later coding, you can handle them the same way. These notes are crucial to developing a set of rules that you can apply to produce consistent results – that is, to produce a reliable measurement – of the concepts of interest.
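For a word-based dictionary, one rough way to support this review is to list the most frequent words in the sample that the dictionary does not yet cover. The sketch below is an illustration only: the function name, the short stopword list, and the simple tokenization rule are my assumptions, and it handles single words, not multi-word expressions. Its output is a prompt for your own judgment about what to include, not a substitute for reading the texts.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; extend it for real use).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "that", "for", "on"}


def uncoded_terms(texts, dictionary, top_n=20):
    """Return the most frequent words in `texts` that no dictionary category lists."""
    coded = {term.lower() for terms in dictionary.values() for term in terms}
    counts = Counter()
    for text in texts:
        # Lowercase the text and split it into runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in coded and word not in STOPWORDS:
                counts[word] += 1
    return counts.most_common(top_n)
```

Calling `uncoded_terms(sample_texts, dictionary)` on your pretest sample returns (word, count) pairs, most frequent first, which you can scan for candidate terms.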

The pretesting phase is when you adjust your dictionary and coding rules and elaborate them as needed with examples so you can apply them consistently later on. Once you are satisfied with your dictionary or codebook revisions, print off clean copies of your pre-test sample (or some subset of it), pull another 5% or so from the complete data set, and apply the revised codebook again. If you are satisfied with this second pre-test, you should be good to go. I recommend that you set aside these marked-up copies for later reference; return clean copies of all the pre-test sources to the pool and re-code them when you come to them in the pile.
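If you are tracking your sources electronically, assembling the second pre-test can be sketched as follows. This is a hedged illustration, not a prescribed method: the function name, the share of the first sample to re-code (`recode_share`), and the seed are hypothetical choices you should adjust to your own project.

```python
import random


def build_second_pretest(all_ids, first_sample, recode_share=0.5,
                         new_fraction=0.05, seed=1):
    """Pick a subset of the first sample to re-code plus fresh, unseen documents."""
    all_ids = list(all_ids)
    first = sorted(first_sample)
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced

    # Subset of the original pre-test sample to re-code on clean copies.
    n_recode = max(1, round(len(first) * recode_share)) if first else 0
    recode = rng.sample(first, n_recode)

    # Fresh documents drawn only from sources not used in the first pre-test.
    remaining = sorted(set(all_ids) - set(first))
    n_fresh = min(len(remaining), max(1, round(len(all_ids) * new_fraction)))
    fresh = rng.sample(remaining, n_fresh)
    return recode, fresh
```

Drawing the fresh documents only from sources outside the first sample means the revised codebook is tested on material it was not adjusted to fit.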


[1] Failure to pre-test typically results in having to go back and re-code large swaths of material – sometimes more than once – as your dictionary evolves. Pre-testing won’t eliminate the need to do some re-coding, since dictionaries almost always evolve, but it does help to minimize the risk of needing to do massive or multiple re-codes. Smart researchers do not skip this step.

Participant Observation

Article 2: Participant Observation

Participant observation is a fieldwork-based strategy for qualitative data collection. In it, the researcher observes a group, organization, or other institution, not as a disinterested outside observer but as a participant, someone who is inside the organization. The internal vantage point allows the researcher to gain a more personal understanding of how other participants view, understand, and perform their roles. Its use in political science is rather limited, but it sees considerable use in anthropology, sociology, and education research.

For example, let’s consider a hypothesis about whether people’s motivations for volunteering with underserved communities influence the way(s) in which they interact with the members of that community, and the volunteers’ beliefs about the causes and consequences of belonging to that underserved community. I may hypothesize, for example, that those who volunteer at, say, a homeless shelter out of religious motivations – “Jesus commands us to help the poor” – are more likely to be unconsciously condescending to shelter clients, or to interact with them in other ways which emphasize the volunteer’s position of privilege. I could, in theory, simply ask volunteers about their motivations and interactions, but if the behavior is unconscious, then even volunteers who are doing their best to answer truthfully may not actually give us correct data. In a case like this, I might want to approach the shelter director about conducting research there by serving in a volunteer capacity and observing the behavior of other volunteers. I would then observe the ways in which volunteers interact with the shelter’s clients and perhaps use semi-structured or unstructured interviews with the volunteers to learn about their motivations.

Participant observation would be better in this case than simple direct observation (non-participant observation) because unless I interacted with the clients myself, and carefully analyzed the types of (inter)actions that might constitute condescension or expression of privilege, I may have only a poor idea of what behaviors to look for. The presence of a stranger in the room might also change the way that clients and volunteers act or interact, so that the data I observe as a non-participant would be biased.

Because participant observation involves direct interaction with research subjects, often under mild deception,[1] it is particularly fraught with ethical concerns. You will definitely need to consult with your instructor and/or your institution’s Human Subjects Review Board about how to proceed. You will also need to take copious field notes; this is true for all participant observation studies, but it is especially true for inductive or exploratory research.


Basic orientation to participant observation:

Barbara B. Kawulich, 2005, “Participant Observation as a Data Collection Method”, Forum: Qualitative Social Research 6,2 Art 43.

Very thoughtful discussion for geography:

Helpful Hints for Taking Field Notes:


[1] Some mild deception is usually necessary to minimize the influence of your presence on the research context. In our example, if other volunteers knew that you were particularly interested in condescension and religious views, those who are religiously motivated might change their behavior. (After all, you wouldn’t have that hypothesis if you didn’t think there was some support for it.) So you would probably tell the shelter director about your true research objectives along with the rationale for not communicating this to other volunteers; hopefully this explanation would encourage him/her to keep quiet about your research objectives as well. You may need a simple cover story to explain, when others ask, why you are now volunteering at the shelter. As a result, your Human Subjects Board will probably require you to do some form of debriefing where you inform others about what you were doing and their rights as research subjects/participants. Again, talk to your school’s Human Subjects Board for specifics for your situation.

Electronic Qualitative Data Collections

Online Collections of Interest to Political Scientists

Foreign Relations of the United States:

Public Papers of the Presidents: (Reagan through Obama available online)

The Federal Register:

Chronicling America: Historic American Newspapers:

US Department of State Daily Press Briefings:

United Nations Official Documents:

United Nations Treaty Series:

Vanderbilt University Television News Archive:

European Union Documentation:

Social Networks and Archival Context:

The American Presidency Project:

Collections of e-Collections

Qualitative Data Archive:

US Library of Congress:

See especially the ‘Free Resources’ link, which contains items available to anyone.

US LoC Digital Special Collections:

Collections of relevance to political science research include: American Time Capsule, From Slavery to Freedom, African American Perspectives, Evolution of the Conservation Movement, Presidential Inaugurations, Slaves and the Courts, and Votes for Women.

US LoC American Folklife Center:

Resources of interest include: Civil Rights History Project, Experiencing War, Pearl Harbor, September 11 Documentary History Project, Voices from the Days of Slavery (ex-slave oral histories).

Archives Portal Europe:

Searchable guide to archival materials in 711 national, academic, and private archives in 30 European countries

UK National Archives:

Online collections include 20th Century Politics, Migration, and military branches

OCLC Archive Grid: