Youth Offender Program Evaluation

Authors: Dan Kaczynski, Ed Miller and Melissa A. Kelly

Setting up the project

Given the complex scope of this national evaluation, which followed 52 projects, the study design required considerable attention prior to entry into the field. An important aspect of the longitudinal evaluation study was to test various service delivery models intended to help youth offenders and other vulnerable youths prepare for jobs that provided a decent wage and offered career potential. As originally envisioned, the evaluation’s primary goal was to assess the effectiveness of the Youth Offender Demonstration Project (YODP) at providing core reentry services, workforce development, and additional needed services to youth offenders, gang members, and youths at risk of gang involvement. This goal also included an assessment of the outcomes of the developmental and support activities that were implemented for each project.

At the onset of the projects, the U.S. Department of Labor (DOL) suspected that the effort would be difficult due to the varied needs of the youths and the range of programs and services established to address them. Among the range of needs were assistance with homelessness, drug and alcohol addiction, poor educational attainment, and weak family relationships. Services provided to the youths included employment, education, training, alcohol and drug abuse interventions, individual and family counseling, antigang activities, and recreation. Each project was initiated in one of three stages and evaluated accordingly. In Round I (1999 to 2001), DOL awarded $13 million to a cohort of 14 entities that included states, counties, cities, and nonprofit organizations. The awards were for 24 months, of which 6 months were for planning and 18 months were for operations. The second cohort, initiated in Round II (2001 to 2003), consisted of nine entities newly awarded a total of $8.2 million and an extension of 10 projects from Round I for an additional year. The grant period for the new awards was 30 months (6 months for planning and 24 months for operations). In Round III (2002 to 2005), DOL awarded a total of $11.5 million to a cohort of 29 communities. As in the case of the Round II projects, the grant period was 30 months.
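
The cohort figures above can be tallied as a quick consistency check. The following is a minimal sketch: the `Cohort` record and its field names are illustrative assumptions, while the counts, dollar amounts, and grant periods are those reported in the text. Only new awards are counted, so the 10 Round I projects extended during Round II are not counted again, and any funding attached to those extensions (not stated in the text) is excluded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    """One round of new awards, using the figures reported in the text."""
    name: str
    new_awards: int       # entities or communities newly funded in the round
    funding_musd: float   # total of the new awards, in millions of US dollars
    grant_months: int     # planning plus operations, in months


COHORTS = [
    Cohort("Round I", 14, 13.0, 6 + 18),
    Cohort("Round II", 9, 8.2, 6 + 24),
    Cohort("Round III", 29, 11.5, 30),  # the text gives only the 30-month total
]

# Sum the new awards and their funding across the three rounds.
total_projects = sum(c.new_awards for c in COHORTS)
total_funding_musd = round(sum(c.funding_musd for c in COHORTS), 1)

print(f"New awards across rounds: {total_projects}")
print(f"New award funding: ${total_funding_musd} million")
```

The project total agrees with the 52 projects that the evaluation followed.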

Evaluation Purposes

As the demonstration progressed, the evaluation team coped with three distinctly different evaluation purposes. The Round I evaluation consisted of a process evaluation of 11 of the 14 sites operating in large and small communities. Research and Evaluation Associates, Inc. (REA), located in Chapel Hill, NC, received a contract to provide a process evaluation and technical assistance to these 11 projects. Another firm received a contract to conduct an outcomes evaluation of programs run by three detention facilities (this evaluation is not discussed in this report).

During this phase of the evaluation, the intent of REA’s evaluation was to track the implementation progress of the sites for internal use by DOL staff. The Round II evaluation was primarily a formative evaluation of the nine sites in the cohort, which operated in large and small communities and in a confinement facility. The evaluation team sought to empower the projects by helping them implement the demonstration according to a model prescribed by DOL. The same firm, Research and Evaluation Associates, Inc., provided both evaluation and technical services to the projects. In Round III, the evaluation used case study methodology to intensively examine a purposive sample of projects and conduct focused studies of unique program features. In addition, several projects from all three cohorts were selected for an outcome study.

Evaluation Sponsorship

The evaluation team also dealt with changes in perspectives among the sponsor and various DOL stakeholders along with shifts in project responsibility between internal DOL departments. Initially, the evaluation was managed by the Office of Policy Development Evaluation and Research (OPDER), which was the office that handled demonstration duties. For Round III, stewardship of the evaluation changed as the Office of Youth Services (OYS), which was primarily a programmatic office, took responsibility. With the shifts in responsibility, there were dissimilarities in the motives of the program office (OYS) and the demonstration office (OPDER) for conducting the evaluation. DOL Program’s intent was to evaluate program outcomes, whereas DOL Demonstration sought to use the evaluation for implementation and project improvement. OPDER was essentially interested in identifying promising practices, as well as building knowledge. In contrast, OYS was less interested in knowledge building, and focused on getting programs implemented and operating at a point where they could sustain themselves in the future. The shifts among these driving forces resulted in significant changes in the evaluation design during the rounds of the evaluation.

The data

Round I

In Round I the evaluation’s major purpose was to study each project’s implementation process and assess its effectiveness in building upon existing programs and systems to reach its goals. There were three broad objectives guiding the evaluation: (a) provide feedback about the extent that program activities were being carried out as planned, (b) assess the extent to which program participants carried out their roles, and (c) provide a record of the program that was actually implemented and how it may have differed from what was intended. Essentially, the evaluators were tasked with identifying promising practices within the projects. Evidence used to identify and evaluate these practices included both qualitative and quantitative data. The qualitative data provided contextual information that was particularly valuable for identifying where linkages failed between organizations and when program components were poorly implemented or not implemented at all. Among the quantitative data collected were demographic information and outcome data such as the number of job placements. These types of information provided measures of how well programs were achieving their objectives and goals.

During data collection, the evaluation team members made three 2-day visits to each of the 11 projects over an 18-month period. The evaluators collected data about significant changes in project plans, contextual changes, and unexpected consequences resulting from the project. Data were also collected to document barriers, challenges, and successes for each project. An important part of the analysis included identifying lessons learned during the course of implementation, with special attention given to lessons with broader implications for national policy. The findings from the evaluation were primarily used to inform DOL of the progress that the projects were making toward implementation. The evaluation team was not allowed to share information with either the projects or the technical assistance team. DOL also used the evaluation’s results to develop the Public Management Model for State and Local Workforce Agencies (PMM), which became the implementation model required for subsequent projects.

Round II

Framed by the need to assess the implementation of each project, the main objectives of the Round II evaluation were the same as in Round I. Data collection drew upon several methodologies, including quantitative and qualitative approaches proposed by Wholey, Hatry, and Newcomer (1994) and the use of performance monitoring systems in government programs (Swiss, 1991). The evaluation team drew upon an array of data sources, including observations, unstructured interviews, systems analysis, document reviews, data file extraction, and information exchange with the technical assistance team. As in Round I, the evaluators collected data related to significant changes in project plans, contextual changes, and unexpected consequences within the nine project sites. Data were also collected to document barriers, challenges, and successes that had surfaced within the projects.

One of the key factors that shaped data collection in Round II was the DOL’s development and adoption of the PMM. From the model, which was based on the work of Richard Nathan (1988) in the area of systems change, DOL hypothesized that projects that provided an array of workforce and re-entry services tailored to the needs of youth and that demonstrated good management practices (including data collection and analysis) would develop a continuous improvement loop. In addition to providing a good indicator of the progress that the projects were making, the PMM also guided data collection and analysis as it focused attention on the organizational and systems dimensions of each project’s implementation.

Round III

The main intent of the evaluation in Round III was to determine what had been learned about how to best help youth offenders and youth at risk of court involvement break the cycle of crime and juvenile delinquency. Due to constraints in cost and time, the evaluation team was unable to adopt a formative approach. Instead, the team used a case study approach to examine a purposive sample selected from 29 projects. The sample consisted of eight Round III sites selected as intrinsic cases because the projects were of particular interest, and a total of six Round II and Round III sites selected as instrumental cases because of interest in a particular issue (such as how the projects used an employment bonus to retain youth in the project) or in a unique component of each project. Data collection was primarily conducted during site visits lasting 8 to 10 days at each of the selected sites. While preparing the case studies, the evaluation team collected data to describe, explain, and explore the dynamics between youths and their families, the projects, and the community. Specifically, the studies considered how the dynamics affected the likelihood that youths would receive appropriate workforce development, re-entry and supportive services; prepare for employment; and avoid further involvement with the justice system. For each evaluation question and subquestion, the evaluation team identified measures, indicators, outputs, outcomes, and dimensions that represented a range of quantitative and qualitative data. Across the case studies, the data sources and collection strategies included:

1.      Direct observations of project advisory board meetings (especially efforts to revise plans) and of program operations

2.      Unstructured and semi-structured one-on-one and group interviews with program managers and front-line staff, youth, parents, and community representatives during visits to project sites

3.      Systems analysis of systems that supported or affected project development and implementation, such as community-based organizations, schools, courts, and employment and training programs

4.      Document reviews of artifacts such as project proposals, planning session documents, needs/strengths assessments, strategic and implementation plans, case files, self-assessments, and records of court involvement by youths

5.      Collection of data from listservs created for discussions among project participants, including staff, partners and youth participants

6.      Review of management information system records, including abstractions of data from project records and standardized reports about the outcomes for members of the target population

7.      Review of program documentation including individual project progress reports and other relevant area research reports and findings

8.      Exchange of information with the technical assistance team and the DOL staff

9.      Telephone calls, email and other correspondence with program staff at each site

Not all of the data collection strategies applied to every case study. The specific strategies used in each case study varied based upon the nature of the evaluation questions, as well as whether the case studies were either intrinsic or instrumental. For the intrinsic case studies, evaluators attempted to answer all the evaluation questions; for instrumental case studies, only a narrow range of evaluation questions were explored during fieldwork and subsequent follow-up. A fundamental requirement for the data collection strategies in both types of case studies was the need to establish credibility and rapport with the projects’ staff, the administrators, the youth and their families, and the community or neighborhoods within which the projects operated.

Working the data

Round I

In Round I the CIPP model (Stufflebeam & Shinkfield, 1985) played a major role in several aspects of the evaluation, including data analysis. Using this systems-flow approach allowed evaluators to identify the context, inputs, process, and product of each project and track the temporal flow of each project through its developmental stages from design to implementation. By integrating these components into the analysis process, the team was able to strengthen quality and rigor in the evaluation.

Round II

In analyzing the data collected in Round II, the evaluation team took into account the considerable variability among the projects. For example, some grantees were justice agencies, some were workforce agencies, and others were community-based organizations. Grantees were also states, counties, municipalities or nongovernmental organizations. Some of the projects’ target areas were counties; others were cities or neighborhoods within a city. Data analysis was a 3-stage process based on the work of Rossi and Freeman (1993), which entailed (a) providing a full and accurate description of the actual project, (b) comparing the implementation of the demonstration across sites so that the evaluators could better understand the basis of differences they observed, and (c) asking whether the project, as implemented, conformed to its project design. An important part of the analysis included identifying lessons learned during the course of implementation, with special attention given to lessons that had broader implications for national policy. Equally important was the evaluation team’s task to determine how closely the projects adhered to the Public Management Model that was developed as a result of the lessons learned during the Round I evaluation.

Round III

For the instrumental case studies conducted in Round III, each project was assigned a single evaluation specialist; for the intrinsic case studies, two-member field evaluation teams were assigned to each site. The basis of the team member assignments was whether the member’s area of expertise was rooted in organizational/process research techniques or ethnographic research techniques. Each team member’s responsibilities were defined by the research questions, in that one team member focused primarily on organizational and process events that occurred during the project’s planning and implementation phases, and the other team member focused primarily on youths and families involved with the project. The team member who focused on processes and events paid special attention to relationships between the project and other agencies and community institutions, as well as plans to sustain the project after the end of grant funding. This evaluator also focused on finding answers to questions that primarily required description and explanation of organizational processes and the project’s interactions with other agencies, institutions, and youths and their families. Hence, the evaluator sought answers to the evaluation questions involving what, how, and why interactions occurred. The other member of the field evaluation team sought to explain the dynamics and interactions of youth and their families with the projects, agencies, and institutions of the community. A key goal for this team member was to better understand and explain why and how beliefs and behaviors of youths and others within their community affected the operation of the demonstration project. Relying heavily upon the team member’s expertise and experience as well as theoretical propositions, this evaluator used ethnographic tools to make sense out of what he or she observed.


During the entire evaluation, triangulation was especially useful in identifying and reconciling discrepancies and inconsistencies among the data. To strengthen investigator triangulation, the evaluation team used a 2-stage review process to enhance intercoder reliability verification. Team members submitted their coded NVivo project files to the research firm headquarters where a staff member conducted a second coding pass. The NVivo project was then reviewed by research firm project administrators. To further enhance dependability and confirmability, an analysis oversight committee was included in the review process. The committee held quarterly meetings to review team feedback, data analysis procedures, and coding discrepancies, and to approve modifications to the emergent evaluation design.
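
The text does not say how agreement between the first and second coding passes was quantified. As one hedged illustration, percent agreement and Cohen's kappa can be computed for two coders who each assign a single code to the same text segments. The function names, code labels, and data below are hypothetical; this is a sketch of a common approach, not a description of the evaluation team's procedure or of any NVivo feature.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty segment list")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(coder_a)
    observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions per code.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical code throughout
    return (observed - expected) / (1 - expected)


def disagreements(coder_a, coder_b):
    """Indices of segments that need reconciliation."""
    return [i for i, (a, b) in enumerate(zip(coder_a, coder_b)) if a != b]


# Hypothetical codes for six segments: field pass versus office pass.
field_pass = ["barrier", "success", "barrier", "linkage", "success", "barrier"]
office_pass = ["barrier", "success", "linkage", "linkage", "success", "barrier"]

agreement = percent_agreement(field_pass, office_pass)
kappa = cohens_kappa(field_pass, office_pass)
to_review = disagreements(field_pass, office_pass)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}, review segments={to_review}")
```

In a workflow like the one described above, the list of disagreeing segments is the kind of input a review committee could use when discussing coding discrepancies.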


Another aspect of the evaluation design that strengthened quality and rigor was the use of the memorandums written at the end of each day’s observations and interviews during site visits. The reports became a data source that enhanced analysis and interpretation of data. When the researchers left the field, they began final coding of text data and organizing the data into more precise conceptual categories to support their analysis in consultation with the research firm office staff. In addition, the evaluation team was briefed on administrative requirements, as well as site visit protocols and methods to enhance interrater reliability and uniformity of evaluation approaches.

Qualitative Data Analysis Software

Unlike during the first two rounds of the YODP, evaluators during Round III used NVivo qualitative data analysis software to aid them in collection, management and analysis. The evaluation team’s use of NVivo encouraged and supported the evaluators’ immersion in complex sets of data, fine coding, and transparency of interpretation. Specifically, several of the evaluators concluded that NVivo illuminated the analysis process among members of the evaluation team engaged in data analysis and was particularly useful for strengthening intercoder reliability. For example, members of the evaluation teams were able to review each other’s work to determine how each person was defining issues. Use of the software also allowed the evaluators to conduct reviews more effectively and finely than by using manual analysis. The evaluators used free nodes to preserve the integrity of inductive flow and the quality of the qualitative theoretical design while maintaining structured, controlled tree nodes. Rather than directly changing the code tree, evaluators could propose changes to the tree through a controlled process.
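
The distinction between open free nodes and a controlled code tree can be sketched as a small data structure. This is a hypothetical illustration only: the `Codebook` class, its method names, and the example codes are assumptions and do not reflect NVivo's internal design or interface, nor the team's actual approval procedure.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical codebook: a controlled tree plus uncontrolled free nodes."""
    tree: dict = field(default_factory=dict)        # tree node -> parent (None for a root)
    free_nodes: set = field(default_factory=set)    # emergent codes, added freely
    proposals: list = field(default_factory=list)   # pending (name, parent) changes

    def add_free_node(self, name):
        """Any evaluator may record an emergent code without review."""
        self.free_nodes.add(name)

    def propose(self, name, parent):
        """Request that a code join the tree; the tree itself is not changed."""
        if parent not in self.tree:
            raise KeyError(f"unknown parent node: {parent}")
        self.proposals.append((name, parent))

    def approve(self, name):
        """Reviewer step: move an approved proposal into the tree."""
        for i, (proposed, parent) in enumerate(self.proposals):
            if proposed == name:
                self.tree[proposed] = parent
                self.free_nodes.discard(proposed)
                del self.proposals[i]
                return
        raise KeyError(f"no pending proposal for: {name}")


book = Codebook(tree={"services": None})
book.add_free_node("employment bonus")           # inductive code from the field
book.propose("employment bonus", "services")     # tree is still unchanged here
tree_before_approval = dict(book.tree)
book.approve("employment bonus")                 # controlled change to the tree
print(tree_before_approval, "->", book.tree)
```

Keeping proposals separate from the tree means inductive codes are never lost, while the shared structure changes only through an explicit review step.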

Analysis Processes

Round I

In Round I the evaluation team adopted a process evaluation approach to make “use of empirical data to assess the delivery of programs” (Scheirer, 1994, p. 40) and examine the implementation of the projects. Considering environmental factors identified in evaluation literature (Rossi & Freeman, 1989), the evaluation team assessed the extent to which grantees built strong linkages among existing organizations and developed integrated services. With such linkages, the projects were expected to deliver the services to the target population effectively and efficiently. Information about the nature of the linkages established for each project was instrumental in providing DOL with accounts of the progress being made in service delivery.

The evaluation team also sought an expansive evaluator role that entailed the use of evaluation results to help a program improve (Patton, 1986). Unfortunately, the project sponsor did not allow the evaluation team to share its findings with either the projects or the technical assistance team that had the task of helping the projects be effectively implemented. In some ways, it can be argued that this approach violated the tenets of formative evaluation, which in general is a process evaluation whose intent is to help projects improve as they are implemented as well as provide information to those who commissioned the evaluation (Patton, 1986; Carter, 1994; Sonnichsen, 1994). As a result, the evaluation team focused on reporting the roles of key actors, dimensions of the projects, relationships, and activities within the projects and identifying lessons learned from the evaluation.

Round II

In analyzing the data collected in Round II, the evaluation team took into account the considerable variability among the projects. For example, some grantees were justice agencies, some were workforce agencies, and others were community-based organizations. Grantees were also states, counties, municipalities or nongovernmental organizations. Some of the projects’ target areas were counties; others were cities or neighborhoods within a city. Data analysis was a 3-stage process based on the work of Rossi and Freeman (1993), which entailed (a) providing a full and accurate description of the actual project, (b) comparing the implementation of the demonstration across sites so that the evaluators could better understand the basis of differences they observed, and (c) asking whether the project, as implemented, conformed to its project design. An important part of the analysis included identifying lessons learned during the course of implementation, with special attention given to lessons that had broader implications for national policy.

Round III

Round III of the evaluation marked the most complicated stage of the evaluation project, due in part to the transition from the DOL demonstration branch to the DOL program office. Unlike the demonstration office, the program office was less interested in building knowledge and more interested in getting projects going. The subsequent shift in the sponsor’s motives resulted in a shift in the focus of the evaluation. Also, since the research firm won the evaluation contract but not the contract to provide technical assistance, there was immediately a tension between the two competing private firms. These issues were the most significant factors that impacted data collection and analysis of data in Round III.

Due to resource constraints at the start of the Round III evaluation, the initial development of the codebook did not involve everyone on the evaluation team. All of the evaluators did, however, receive training on the use of the codebook. Although the implication was a tremendous commitment of team member time and organizational resources, the involvement of all team members in code development was considered to be a critical factor in the evaluation. The evaluation also underscored the importance of supporting the evolution of the codebook, which continued to change throughout the evaluation. As the researchers returned to the field for the second stage of site visits in Round III, the codebook became deductive due to heavy analysis during the initial stages of the round. This represented an important change because inductive inquiry is used in qualitative evaluation to promote open-ended inquiry and deeper understanding, whereas a deductive perspective more closely aligns with quantitative hypothesis testing. By adopting a predetermined, fixed code structure of meanings, the study design was demonstrating a potential shift in both methodology and field practices. With a multiphase, multisite study, special care was needed to shift back to inductive analysis when returning to the community. This shift had to be supported and coordinated with changes to the code structure.

Validity & Transferability

Due to the nature of the case studies examined in Round III, the evaluators collected tremendous amounts of site-specific data. The sponsor’s expectation, however, was to generalize the findings across the sites. The evaluation team used Yin’s (2003) framework for validity and reliability to help ensure that the data were appropriate, meaningful, and useful for making inferences involving analytical (rather than statistical) generalization. Within Yin’s framework the quality of the research design was represented by four concepts: construct validity, internal validity, external validity, and reliability. To strengthen construct validity, the researcher needed to ensure that he or she established correct operational measures for the concepts being studied. Specific strategies for addressing construct validity in case studies included using multiple data sources, establishing a chain of evidence, and having key informants review a draft of the case study. Internal validity, which applied only to explanatory or causal studies and not to descriptive or exploratory studies, involved establishing a causal relationship, whereby certain conditions were shown to lead to other conditions. To strengthen internal validity, Yin recommended incorporating pattern-matching, explanation-building, rival explanations, and logic models into data analysis. External validity meant establishing the domain to which a study’s findings could be generalized. Strategies for addressing external validity in case studies included the application of theory (for single-case studies) and replication logic (for multiple-case studies). Reliability entailed demonstrating that the operations of a study could be repeated with the same or similar results. Yin suggested using a case study protocol or database for strengthening reliability.

An important adjunct to the quality-control process for case studies was the use of quantitative data, when possible, to confirm qualitative findings and interpretations. Also important was the use of triangulation techniques such as multiple observations by team members and multiple data sources (Patton, 2002).

Reporting the project

Round I

To disseminate the findings of the evaluation in Round I, the evaluation team prepared site visit reports for internal use only by DOL staff.  Although the reports contained information aimed at formative project improvement, the evaluators were not allowed to share their reports or other information with either the project staff or the technical assistance team because the sponsor believed it was necessary to establish a neutral environment.  Although the sponsor’s intent was to strengthen objectivity, there was an adverse impact on the evaluation.  Stakeholders were isolated from data and preliminary findings, which hindered formative improvement, led to a sense of disempowerment, and weakened utilization of findings.  The lack of data exchange also served as an obstacle to validity.  Since reports were not shared with the sites, there was no opportunity for the sites to review and comment on the information in the reports.  Consequently, there were misspelled names and other errors in factual data in the reports.

Round II

The formative evaluation approach used in Round II involved sharing evaluation reports with key demonstration stakeholders to develop a feedback loop for continuous improvement.  Included in this group of stakeholders were DOL, the projects, evaluators, and technical assistance specialists.  Evaluators reviewed their evaluation findings with the project staff during subsequent evaluation visits, and the projects integrated assessment practices into their ongoing operations.  This exchange instigated a continuous improvement approach among projects as they used the evaluations as tools to improve operations.  Technical assistance visits provided evaluation staff an opportunity to review each project’s progress and examine their needs for additional technical assistance.  During the visits, technical assistance staff provided projects with a summary of their observations, including feedback and recommendations to project managers.

Round III

The documentation of data collection procedures was included in the initial evaluation design as a strategy for enhancing rigor.  The volume of work generated in practice, though, hindered the preparation of meaningful memos.  As memos were included in site data, this factor also had an impact on data collection.  During the extended site visits in Round III, the reporting procedures were modified to include submission of a separate field memo report to address what occurred during the visit and to provide a textual snapshot of the visit.  The memos were useful in extending the findings and shaping the context of data collection by including interpretive insight and reflections by the evaluator, as well as clarification on design issues.  This practice supported a more open and transparent disclosure of design methodology (Anfara, Brown & Mangione, 2002; Bogdan & Biklen, 2003).

References


Anfara, V.A., Jr., Brown, K.M. and Mangione, T.L. (2002) ‘Qualitative analysis on stage: Making the research process more public’. Educational Researcher, 31(7): 28–38.

Bogdan, R.C. and Biklen, S.K. (2003) Qualitative Research for Education: An Introduction to Theories and Methods. 4th edn. Boston, MA: Allyn & Bacon.

Carter, R. (1994) ‘Maximizing the use of evaluation results’. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (eds.), Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass Publishing. pp. 576–589.

Fitzpatrick, J. L., Sanders, J. R., and Worthen, B. R. (2004) Program evaluation: Alternative approaches and practical guidelines. 3rd edn. Boston: Pearson Education.

Haas, P. and Springer, J. (1998) Applied Policy Research. New York: Garland Publishing.

Miller, E. and MacGillivray, L. (2002) Youth Offender Demonstration Project process evaluation (Final Report). Washington, D.C.: U.S. Department of Labor, Employment and Training Administration.

Mueller, E. and Swartz, A. (2002) Creating Change: Pushing Workforce Systems to Help Participants Achieve Economic Stability and Mobility. Evaluation of the Annie E. Casey Foundation Jobs Initiative. Cambridge: ABT Associates and New School University.

Nathan, R.P. (1988) Social Science in Government: Uses and Misusers. New York: Basic Books.

Patton, M. (1986) Utilization-Focused Evaluation. 2nd edn. Beverly Hills: Sage Publications.

Patton, M. (1997) Utilization-Focused Evaluation. 3rd edn. Thousand Oaks, CA: Sage Publications.

Patton, M. Q. (2002) Qualitative Research & Evaluation Methods. 3rd edn. Thousand Oaks, CA: Sage Publications.

Rossi, P. and Freeman, H. (1989) Evaluation: A Systematic Approach. 4th edn. Newbury Park, CA: Sage Publications.

Scheirer, M. A. (1994) ‘Designing and using process evaluation’. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (eds.). Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass Publishing. pp. 40–68.

Scriven, M. (1991) ‘Beyond formative and summative evaluation’. In M. W. McLaughlin & D. C. Phillips (eds.), Evaluation and Education: At Quarter Century. Chicago: University of Chicago Press. pp. 16–64.

Scriven, M. (1996) ‘Types of evaluation and types of evaluator’. Evaluation Practice, 17(2): 151–161.

Sonnichsen, R. (1994) ‘Evaluators as change agents’. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (eds.), Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass Publishing. pp. 534–548.

Stake, R. (2000) ‘Case studies’. In N. K. Denzin & Y. S. Lincoln (eds.), Handbook of Qualitative Research. 2nd edn. Thousand Oaks, CA: Sage Publications. pp. 435–454.

Stufflebeam, D. and Shinkfield, A. (1985) Systematic Evaluation. Hingham, MA: Kluwer Academic Publishers.

Swiss, J. (1991) Public Management Systems: Monitoring and Managing Government Performance. Englewood Cliffs, NJ: Prentice Hall.

Wholey, J. (1994) ‘Assessing the feasibility and likely usefulness of evaluation’. In J. S. Wholey, H. P. Hatry, & K. E. Newcomer (eds.), Handbook of Practical Program Evaluation. San Francisco: Jossey-Bass Publishing.

Yin, R. (1993) Applications of Case Study Research. Thousand Oaks, CA: Sage Publications.

Yin, R. (2003) Case Study Research Design and Methods. 3rd edn. Thousand Oaks, CA: Sage Publications.

Looking back

Working in teams

This online chapter about the evaluation of a national youth offender program was first published in the 2nd edition of Handling Qualitative Data.  In this 3rd edition we would like to highlight some strategies we developed that researchers and evaluators will find helpful when engaged in team studies.  Our study involved over 20 evaluators working in 29 communities for more than six years.  This multidisciplinary team consisted of evaluators who had a diverse array of expertise, including ethnographic inquiry, statistical analysis, qualitative analysis, organizational development, and youth development. 

While some members of the team worked in the field conducting site visits and gathering data, other team members worked on data management and project management tasks from the Research and Evaluation Associates (REA) office.  The study consisted of three rounds of data gathering, analysis and reporting.  For example, in Round III of the evaluation, two-person evaluator teams conducted site visits at each of eight project sites scattered around the country.  At the end of each day the team members were on site, both evaluators prepared memorandums describing project operations and activities including the evaluators’ impressions and interpretations of what they had observed during the site visits.  This step included coding with the use of a predetermined codebook.  Coded data and memorandums were then sent to the REA office daily for a second coding pass by other evaluators.  The evaluators at the office also provided feedback to the team in the field.  The master project file (in NVivo) was then reviewed by REA project administrators for a second stage of review and quality control.

Over the course of the evaluation, the evaluation team learned the value of using NVivo software to facilitate the team’s work and the importance of involving all levels of the organization in the ongoing development and maintenance of a predetermined codebook.

Use of NVivo® Software

To aid in the continuous data collection and analysis process, REA purchased NVivo® qualitative analysis software for members of the evaluation team to install on the laptop computers they used in the field. Before entering the field, evaluation team members received a hands-on orientation to the software and a codebook to use as a guide for categorizing and coding the data they collected. The team members had also attended a Round III preparatory training meeting in January 2004 that oriented them to the procedures and research processes for site visits. This session included a half-day of instruction on the use of NVivo to enhance analysis, along with a coding exercise that gave team members applied practice with the coding and search functions. In retrospect, this was an insufficient amount of time for team training. A two-day training session on NVivo before the field site visits would have allowed the team members, especially those who lacked experience with computer-assisted qualitative data analysis software, to better understand how NVivo worked and to be better oriented to the coding scheme. In addition to becoming competent in the technical skills required to use data analysis software, team members needed more time working with a predetermined codebook.

Aside from these issues, the evaluation team’s use of NVivo qualitative data analysis software not only supported the evaluators’ immersion in complex sets of data but also facilitated the team’s collaborative efforts. For example, members of the on-site evaluation teams were able to review each other’s work to determine how each person was defining issues. Rather than directly changing the codebook, evaluators could propose changes to it through a controlled process. This strategy enabled the evaluators to be inductive in their coding while preserving the integrity of the master coding schema.

Involving all team members in code development

Although all of the evaluators received training on the use of the codebook, the initial development of the codebook did not involve everyone on the evaluation team due to resource constraints.  Several team members with experience using SPSS and SAS quantitative software, but not qualitative software, were responsible for preparing a codebook that the evaluators would use during the site visits.  While the coding scheme developed by the design team appeared both logical and promising (and was conceptually clear to the design team), evaluators in the field found it difficult to use.  The evaluators complained, for example, that because the codes had not been fully operationalized, the field teams had to contend with ambiguity when coding.  The evaluators also voiced concern about redundancies in the nodes, which were not discretely defined.  Perhaps most important was the evaluators’ opinion that the coding structure had become too deductive and that they were being forced to pigeonhole their data into the nodes.  Consequently, coding done by evaluators in the field was often cursory and of limited use for report writing and analysis.  Most evaluators instead preferred to prepare their site visit reports by coding by hand, rather than using NVivo.

Throughout the evaluation, the codebook continued to change.  When the design team created a more structured coding system, however, the revised coding structure proved to be too expansive for initial coding.  Analysis team members, like the field evaluators, found that having to deal with so many nodes was cumbersome and that the approach was too deductive, especially considering that they had to code interviews from eight projects, some of which they were only marginally familiar with.  For example, the team members found it too difficult to remember the discrete characteristics of the nodes and to code text uniformly.  The analysis team subsequently reduced the number of codes to make analysis more free flowing and more manageable, which allowed a more inductive coding approach to emerge.  The team also recognized that the evaluators, who conducted field visits to the eight sites, would need to be reintegrated into the coding and analysis process to capture the non-textual information essential to the analysis for the final report.  To accomplish this task, the design team implemented a revised coding scheme that allowed flexibility for evaluators in coding in the field but provided enough structure that the analysis team was confident there would be strong inter-coder reliability. 
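
Inter-coder reliability of the kind mentioned above can be checked numerically. The sketch below is illustrative only: the chapter does not say which statistic, if any, the team computed. It assumes Cohen's kappa as the measure, two coders who each assign exactly one code per text segment, and hypothetical code names and assignments.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who each assigned one code per segment."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("Both coders must label the same non-empty set of segments.")
    n = len(coder_a)

    # Share of segments on which the two coders chose the same code.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Agreement expected by chance, from each coder's own code frequencies.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    all_codes = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in all_codes) / (n * n)

    if expected == 1.0:
        # Both coders used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical example: a field evaluator and an office evaluator
# code the same six interview segments.
field_coder = ["employment", "education", "counseling",
               "employment", "education", "employment"]
office_coder = ["employment", "education", "employment",
                "employment", "education", "counseling"]

kappa = cohens_kappa(field_coder, office_coder)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement two coders would reach by chance, so a low value on a first coding pass can signal that code definitions need to be operationalized more fully before coding continues.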

Rules of conduct for working in teams

Rules of conduct for working in teams are ideally established as the team is formed. Setting them early improves team dynamics. The following suggestions are based on our experiences in this study and offer a starting point for building rules of conduct for your own team.

  1. Establish formal rules early during the design and development stage of a project.  Carefully discuss procedures and meanings of each team rule to promote consistent adoption.
  2. As a study gets underway, ongoing design choices must be made. Establish procedures so that emergent design decisions are adopted uniformly across the team. The need for such procedures is magnified when working in teams and when working with a predetermined codebook.
  3. One person should be assigned as the codebook master contact. An oversight team may assist the codebook master with periodic review and revisions to the codebook. Questions for the codebook master and oversight team to settle include:
    1. How much context should be coded?
    2. Can individual team members create new open codes?
    3. Is there a clear definition for each code?
    4. How is the meaning of a code changed?
    5. What are the procedures for merging or deleting codes?
  4. Professional development and training in research and evaluation methods and data analysis technology should be ongoing throughout a team study.  
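
The codebook rules above can be pictured as a small data structure. The following is a minimal sketch, not a description of how NVivo or the REA team handled codebook changes: the `Codebook` class, its method names, and the sample codes and definitions are all hypothetical. It assumes that any team member may propose a change, but only the designated codebook master may approve it.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """A shared codebook that only the codebook master may change."""
    master: str
    codes: dict = field(default_factory=dict)    # code name -> definition
    pending: list = field(default_factory=list)  # proposals awaiting review
    log: list = field(default_factory=list)      # record of reviewed proposals

    def propose(self, proposer, action, code, definition=None, merge_into=None):
        """Queue a change; the codebook itself is left untouched."""
        if action not in ("add", "redefine", "delete", "merge"):
            raise ValueError(f"Unknown action: {action}")
        proposal = {"proposer": proposer, "action": action, "code": code,
                    "definition": definition, "merge_into": merge_into}
        self.pending.append(proposal)
        return proposal

    def review(self, reviewer, proposal, approve):
        """Apply or reject a queued proposal; only the master may review."""
        if reviewer != self.master:
            raise PermissionError("Only the codebook master may review proposals.")
        action, code = proposal["action"], proposal["code"]
        if approve and action == "merge" and proposal["merge_into"] not in self.codes:
            raise KeyError(f"Merge target not in codebook: {proposal['merge_into']}")
        self.pending.remove(proposal)
        if approve:
            if action in ("add", "redefine"):
                self.codes[code] = proposal["definition"]
            else:
                # "delete" and "merge" both retire the code; for a merge,
                # the surviving target code keeps its existing definition.
                self.codes.pop(code, None)
        self.log.append((proposal, "approved" if approve else "rejected"))


# Hypothetical usage: a field evaluator proposes a new code.
book = Codebook(master="codebook_master",
                codes={"employment": "Job placement or retention services"})
request = book.propose("field_evaluator", "add", "mentoring",
                       definition="One-to-one adult guidance for a youth")
codes_before_review = dict(book.codes)  # "mentoring" is not yet in the codebook
book.review("codebook_master", request, approve=True)
print(sorted(book.codes))
```

Keeping proposals in a queue and recording each decision in a log lets evaluators suggest inductive codes from the field while the master codebook changes only through one accountable contact.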

Telling the Story to Relevant Audiences:  Looking Back at End of Project Challenges

Dan Kaczynski, Professor Emeritus Central Michigan University and Senior Research Fellow with the Institute for Mixed Methods Research

How and when do you know you have reached your audience? What can be done if the delivery of your story is hindered? These are good questions to consider, especially when an evaluation project is intended to inform and shape community and national social policy.

The Youth Offender Demonstration Project (YODP) was a major multiyear (1999-2005), three-phase federal initiative by the U.S. Department of Labor and the U.S. Department of Justice. Federal funding exceeded $32 million over a six-year period. The longitudinal evaluation portion of the YODP was tasked with both formative and summative reporting during all three phases of YODP funding.

Looking back and reflecting upon an evaluation project that ended 15 years ago offers a unique perspective on the lasting impact of the evaluation process. This travel back in time also offers an opportunity to identify meaningful lessons for improving future evaluation practices.

In 2005 the evaluation teams were wrapping up and preparing end of project reports to the federal sponsoring agency. As an external consultant on one of the evaluation teams, I recall noticing considerable frustration among the evaluation teams regarding sponsor input into the reporting process. Evaluators who were writing the final report indicated that the federal agency sponsors sought to suppress portions of the third and final phase of findings and recommendations. The implication was that the federal congressional funding sponsors in power from 2001 through 2009 sought to alter and align multiyear YODP national findings to better support the prevailing political message of the day.

Was the reporting process compromised by the suppression of phase three findings? Fifteen years on, a literature search for publications on “youth offender demonstration project” gave me rather mixed results. Document repositories from online publishers, university online holdings, ERIC, and federal agency publication databases contain an incomplete collection of evaluation reports. Of particular interest in my search is the Youth Offender Demonstration Project Evaluation Final Report – Volume One (June, 2006), which is publicly available online. This volume one report provides an overview of the phase three evaluation and directs the reader to volumes two through four of the final report for a complete analysis of evaluation project results. Regrettably, I was unable to locate volumes two through four in any of the online document repositories. This suggests that questions remain regarding full public access to the final reports of the complete evaluation project.

What can we learn from lingering questions about the reporting process? The credibility of research and evaluation practices depends upon transparent, trustworthy reporting methods. When looking back, I am drawn less to asking whether community and national policy makers were effectively reached than to asking whether the story was credibly shared in its entirety. As social researchers and program evaluators we must continually challenge ourselves regarding which voices emerge from the evaluation reporting process and are used to shape and inform policy. Often the voices of the socially disempowered or the politically disavowed are silent. Then and now, it remains our challenge and responsibility to promote reporting methods which uphold credible standards of practice. Such standards should shape our present work and guide our future professional practices.

A national organization that is specifically relevant to this discussion is the American Evaluation Association (AEA). The AEA guiding principles (2018) include two that are particularly pertinent here: “C5) Accurately and transparently represent evaluation procedures, data, and findings”, and “E4) Promote transparency and active sharing of data and findings with the goal of equitable access to information in forms that respect people and honor promises of confidentiality”. My understanding 15 years ago was that representatives of the evaluation project made strenuous efforts with the federal agency staff sponsors to accurately and transparently represent the final evaluation story in its entirety. We must carefully consider not only how we can hold to these principles in our own professional practices but also how we can bring them into action when our work is externally controlled by dominant stakeholders.

Author profile: Dan Kaczynski

Dan Kaczynski is a Professor at Central Michigan University in the Department of Educational Leadership where he teaches graduate research. In addition, he is actively engaged in designing and conducting state, national, and international evaluations. Dan was recently a visiting professor in Australia working closely with several universities on doctoral research and supervision. His publications and presentations have given particular attention to exploring innovations in qualitative data analysis software and online instructional delivery to promote more rigorous doctoral research.

Contact: Professor Dan Kaczynski
Central Michigan University
Department of Educational Leadership
338 EHS Building
195 East Ojibway Drive
Mt. Pleasant, Michigan 48859 


Author profile: Ed Miller

Ed Miller earned a Ph.D. in policy sciences at the University of Maryland, Baltimore County. Ed was a Senior Research Analyst at Research and Evaluation Associates, Inc. in Chapel Hill, North Carolina, as co-project manager for the nationwide evaluation of the youth offender demonstration project sponsored by the U.S. Department of Labor. In February 2014 he joined Booz Allen Hamilton, a U.S. management consulting firm, as an associate after spending the past seven years on active duty with the Army at the Pentagon and Fort Bragg, NC.   


Author profile: Melissa Kelly

Melissa Kelly, EdS is a doctoral student at the University of Illinois at Chicago and a graduate research assistant in the Educational Psychology department. She also works as an instructional designer developing curriculum in higher education. Melissa's research interests include optimizing cognition and instruction in online contexts, evaluating stress and coping processes and interventions, and investigating the teaching and application of research designs.
