Author: Helen Marshall
The ‘Complaints’ project, run in 2004 by Sara Charlesworth, was a census of complaints to the Equal Opportunity Commission Victoria, Australia. The original report from the project, made to the EOCV, concerned only the demographic characteristics of the complaints investigated and was based on data collected using SPSS. Sara had always intended to use the complaints material as part of her ongoing work in understanding the nature of discrimination in contemporary Australia, so she also set up a project using NVivo and collected qualitative material to help her understand the ‘discourses of discrimination’. She had to put this project on hold for some time, and when she returned to the project her research question had changed. She enlisted my help to re-structure the project. Working together, we began to understand how NVivo enhances the possibilities for iterative interrogation and re-interrogation of one’s data.
Setting up the project
We tend to think of setting up qualitative research projects as a matter of getting access to, and permission from, the people in whom we are interested. Sara’s interest was in the cases recorded in files, and in what the records could tell her about how sex discrimination was understood. Since she made no direct contact with people, she did not need their permission, but this did not mean it was easy to set the project up!
The Victorian Equal Opportunity and Human Rights Commission (in 2004, when the research was carried out, it was the Equal Opportunity Commission Victoria) administers the state’s Equal Opportunity Act 1995, the Racial and Religious Tolerance Act 2001 and the Charter of Human Rights and Responsibilities Act 2006. It is an independent statutory body whose roles are to resolve complaints through conciliation, to provide education on equal opportunity and to monitor and advise on equal opportunity performance. The project analysed all sex discrimination complaints lodged with the Commission in the first three months of 2004. It was governed by four sets of legal and ethical requirements.
- First, material in the Commission’s files is subject to the secrecy provisions of the Equal Opportunity Act 1995 (Victoria). Section 192 generally prohibits disclosure of information, permitting it only where disclosure is necessary for the Commission to perform one of its functions. The Commission has tended to interpret this very strictly. Sara signed a confidentiality agreement with the Commission before undertaking the research.
- Second, the state of Victoria has privacy legislation protecting personal information from which an individual’s identity could potentially be determined. The Information Privacy Act 2000 (Victoria) applies to universities and TAFE colleges as well as to government organisations and private organisations contracted to government.
- Third, this privacy legislation also affects the human ethics requirements of RMIT, where Sara works. At that university (as at many others), researchers handling records collected for purposes other than their research need to complete a special Privacy Module form as part of the process of applying for ethics approval. The general principle is that the research is approved as long as the public interest outweighs privacy considerations and appropriate precautions have been taken for the security of data.
- Fourth, data in Equal Opportunity Commission files is actually the property of the Victorian Department of Justice, which services the Commission. The Department requires that any research involving the use of or access to information it holds be approved by its own Research Ethics Committee.
So setting up the project involved gaining ethics approval from two separate bodies and negotiating an agreement with the Commission. Basically, Sara undertook that she would not publish her research in such a way that any individual could be identified, which (as noted in the section on reporting) had some particular consequences for the reporting of data. While gaining permission to carry out the research depended on this undertaking, none of the organisations with whom she dealt asked to scrutinise the reports before publication.
The data in the NVivo project sometimes offered a detailed picture of a complaint, and sometimes allowed only a glimpse of the matter. It consisted of summaries and partial transcripts made by Sara of all complaints received by the Equal Opportunity Commission Victoria (EOCV) over a three-month period.
Material on the type of complaint, the outcome, the gender of the parties, and very general characteristics of the workplace, was recorded by the EOCV as a matter of course in 2004. Other characteristics of the parties and complaint matter were collected from the files. The amount of detail available for a particular complaint varied. Sara used SPSS to record the 27 demographic variables in the Complaints project.
Some of these demographic variables were put into the NVivo casebook (a table where the researcher can store material on the characteristics of the cases) by Sara when she set up the project. Others were imported as the re-analysis began. They came in from SPSS via Excel: Sara saved a copy of her SPSS database as an Excel workbook and emailed it to me (I did not have a copy of SPSS on my computer). I used the process that Pat Bazeley suggests (Bazeley 2007, pp. 140–2) to import the extra variables. And I discovered that Pat’s advice about making sure that the names of cases are absolutely consistent between your original database and NVivo is crucial. (You can get the same good advice from Lyn Richards’ tutorials on using NVivo, from the NVivo help files, and from any trainer.) I had removed or renamed some sources in the project during the tidying up but forgot to make the same changes to the source names in the Excel workbook, and it took several hours’ work and a lot of swearing to discover and fix the mess! The lesson I learned here is to take it slowly when preparing any material to import. I’d already learned that it is easier to use Word than NVivo for formatting or altering sources (e.g. when substituting pseudonyms for real names in transcripts). From now on, I’ll take the same care with all items that I might want to bring in. The quantitative data were easy to record and comprehend (once I heeded the advice on consistency).
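The name-consistency check that cost me those hours can be done mechanically before importing. Here is a minimal sketch in Python (the ‘Case’ column name and the CSV layout are my assumptions, not part of Sara’s project) of comparing the case names in an exported workbook, saved as CSV, against the source names in the project:

```python
import csv

def check_case_names(table_path, project_sources):
    """Compare case names in an exported table (saved as CSV) with the
    names of sources in the project, before attempting an import.
    Returns (in_table_only, in_project_only); both should be empty."""
    with open(table_path, newline="") as f:
        # 'Case' is a hypothetical column holding the case/source name.
        table_names = {row["Case"].strip() for row in csv.DictReader(f)}
    in_table_only = sorted(table_names - set(project_sources))
    in_project_only = sorted(set(project_sources) - table_names)
    return in_table_only, in_project_only
```

Running a check like this whenever a source is renamed or removed would have saved the swearing: any name that appears in one list but not the other is a mismatch to fix before the import.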
The qualitative material that is the core of the Complaints project is much more complex. First, the sources are not ‘raw’ in the same way as verbatim interview transcripts. As noted in the comments on setting up the project, the material in EOCV files is highly confidential. Sara could not simply take copies of the material; she had to work at the EOCV offices and return the files as she read them. So the sources in the NVivo project are Sara’s short quotes from, and paraphrases of, the documents in each file.
With limited time for the initial project, Sara found it easier to ‘talk’ the data than type it. She used Dragon Dictate voice recognition software that she had already trained to transcribe her vocal notes. This worked well. While there are certainly still some computer-generated mistakes in the sources, they seem to be minor ones and I had no difficulty understanding the sources.
Second, within every complaint that Sara’s literal voice has turned into a source, there are the metaphorical ‘voices’ of at least three actors – the complainant, the respondent and the commission. Sometimes the ‘voice’ of the complainant or respondent was being relayed by an advocate (such as a union official or a lawyer). Sometimes the voice was responding to another voice – as when a complainant made a rejoinder to a respondent’s reaction to the original complaint. This is a project full of voices! In the description of working with data you can read about how the plethora of voices was eventually reduced to three ‘speakers’: the complainant, the respondent and the commission.
Working with data
As we set about re-structuring the NVivo project, Sara and I learned a lot about coding – especially about the two tendencies that Pat Bazeley calls ‘lumping’ and ‘splitting’, and how they can be useful at different stages of a project.
Sara started the analysis using NVivo2. She worked very much from the ground up, deriving categories for coding from the data. Sometimes the categories stood alone as ‘free nodes’ – for example, there is a node called ‘legal issues’ containing eight references to various aspects of law. Sometimes they were linked in ‘tree node’ hierarchies like the one called ‘C[complainant] at Fault’, which has ‘child’ nodes under it like ‘complainant is not a victim’, ‘complainant participated’ and so on. This image illustrates her node structure. Before she ran out of time to finish fully coding all the documents, her project had six free nodes and eight parent tree nodes, most of which had children and grandchildren. All her nodes had little content coded at them, and when Sara came back to the project, she felt a bit lost in the trees! She also had a new direction for her research question. Rather than asking broadly ‘what are the discourses of discrimination?’, she was now interested in the extent to which the parties in any dispute saw it as an individual matter or as something systemic. The new question grew out of the work she had been doing while the ‘Complaints’ project lay untouched. For these reasons Sara brought me in to help restructure the NVivo project.
Software like NVivo is very useful when a research question changes. Re-structuring a filing cabinet feels more daunting than re-structuring a set of nodes in an electronic project. We decided to use NVivo7 since I had been working with that version, though what we did would have been possible with version 2. (Note though that the illustrations to this account use NVivo8 and if you are using NVivo, you probably have a much more recent version).
Lumping and coding down vs splitting and coding up
We agreed that the way to go for the new research question was not to use Sara’s fine-grained, ground-up ‘splitting’ approach to coding, but to ‘lump’ data into broad categories. We would look for material that signalled an understanding of discrimination as in some way built into an organisation’s practices (a ‘systemic’ conception of discrimination). This material could be about many things, such as timetables, industrial agreements or what the company has always done. For the moment, we would not worry about distinguishing types of systemic conception. We would just code for ‘systemic’ conceptions and their direct opposite: conceptions of discrimination as something done by individuals (an ‘individual’ conception of discrimination). We could then ‘code on’ from these two nodes to get a more detailed picture – working within a node rather than a source, but always able to click back to the source document if we needed to.
Lumping and splitting are about the fineness of distinctions between your categories. Imagine you are sorting fabric scraps. Will you lump everything red together, do you need to split the reds into ‘vermillion’, ‘scarlet’ and so on, or do you split along the lines of weight or texture? It’s clearly a matter of purpose and preference; is the pile of fabric intended for some artwork or for the cleaning rag bag? There is nothing that makes one approach inherently superior, nor does NVivo make one approach easier than another. (See Bazeley 2007, p. 67.)
Coding ‘up’ or ‘down’ (or ‘inductive’ vs. ‘deductive’, or ‘data driven’ vs. ‘theory driven’) is about your more general approach to coding. Are you looking at the data and asking ‘what concepts are in this?’, or are you asking ‘does this data contain instances of this particular concept?’ Again, one is not superior to the other, but of course one may work better for some purpose than another. For the Complaints project, Sara’s initial thinking meant that data-driven coding ‘up’ was sensible, but by the time of re-analysis, it made sense to look at the data for instances of ‘understanding discrimination as caused by an individual’ and ‘understanding discrimination as systemic’.
The ‘lumping’ approach was not how Sara usually worked - she is a meticulous coder who likes to work in detail - but it suited me as I usually begin with broad categories and gradually create detailed trees from them. So I got the job of checking all the sources and I coded material reflecting the individual and the systemic conceptions of discrimination. It soon became clear that I also needed to code who was speaking at various times in the project - was the ‘voice’ that of a complainant (direct or through an agent), a respondent, or was it an employee of the commission? The next section discusses how I did this.
Reworking the existing project
An advantage of working with NVivo compared to working on paper was that Sara’s existing work could be kept for later reference alongside the new work. A paper filing system would need to be dismantled and restructured. In NVivo I simply created two tree nodes to distinguish the old project from the new. (The free nodes stayed as they were.) In ‘Sara’s original project’ sit all Sara’s tree nodes. In ‘modified project’ I created a node ‘conceptions of discrimination’ with its two children, ‘individual’ and ‘systemic’. In about three days I very lumpishly coded all the documents, so that there were over 200 references coded at each coding category (‘node’ in NVivo). Because I was reading with a very clear purpose I found the coding fairly easy, but of course there were times when a sentence or paragraph seemed ambiguous. In such cases I made annotations, and when Sara read through the text coded at the node we would talk about the ambiguities. In this way, the coding was refined.
Sara’s careful delineation in her notes of exactly who was speaking at any time meant that, when we realised we needed to take into account the ‘voice’ uttering the conception, the ‘who’s speaking’ nodes could be created by auto coding and merging nodes. Sara had set up her notes with headings showing who was speaking, and used them to autocode some of the sources. She called the parent node thus created ‘sections’. ‘Sections’ had many children, bearing odd but easily recognisable names like ‘Com1’ (first complainant), ‘Res 2’ (second respondent) or ‘RresCres’ (respondent’s response to complainant’s response). I checked the documents, then auto coded all sources except the diary to create a node called ‘who’s speaking’ in the ‘modified project’ tree. This node had all the same children as Sara’s ‘sections’ node, but more coding, because it covered more sources. Then I created three nodes below ‘who’s speaking’: ‘complainant’, ‘respondent’ and ‘EOC’. I then merged each of the child nodes into whichever of the three matched its voice. This took about half an hour. All this meant that three days’ work was plenty of time to restructure the original project.
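Outside NVivo, the merge step amounts to mapping each section-node name onto one of the three voices. A toy sketch, assuming (as the node names above suggest) that complainant nodes start with ‘Com’, respondent nodes with ‘Res’ or ‘Rres’, and treating everything else as the commission’s voice – an assumption of mine, not a rule from Sara’s project:

```python
def speaker_for(node_name):
    """Map a section-node name to one of the three 'voices'.
    The prefixes are assumptions based on the examples in the text."""
    name = node_name.strip()
    if name.startswith("Com"):
        return "complainant"
    if name.startswith("Res") or name.startswith("Rres"):
        return "respondent"
    return "EOC"  # assume anything else is the commission's voice

def merge_by_speaker(section_nodes):
    """Group section nodes under the voice they belong to, as the
    manual merging of nodes in NVivo did."""
    merged = {"complainant": [], "respondent": [], "EOC": []}
    for node in section_nodes:
        merged[speaker_for(node)].append(node)
    return merged
```

The half-hour of manual merging did exactly this grouping, node by node, through the NVivo interface.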
Once the new project was established, it was fairly easy to move ahead. We knew that we wanted to ask whether one ‘voice’ was more prone than the others to talk about discrimination in individual terms. Asking that question using matrix queries opened up a fascinating world of more questions. Sometimes these questions could be answered using the software’s tools for setting up and conducting a query. But sometimes they meant that we needed to go on and ‘code on’ from the original two lumpy nodes (the individual and the systemic conceptions of discrimination) to give varieties of each. At times I was able to draw on Sara’s original coding to do this.
Working with a tidy-looking structure derived from the lumpy coding gave us a sense of security. But keeping the complexity of the original coding meant that, as we realised the complexity of the material, we were able to dive back into the fine-grained original. These are images of the two projects. (Again, you could use current versions of NVivo to do the same task, but the on-screen look would differ.)
As outlined in ‘Working with Data’, an important distinction in the early analysis was between conceptions of discrimination as caused by an individual and conceptions of it as ‘systemic’. The coding reports showed that we had 23 documents where there was coding for the ‘individual’ conception but not for the ‘systemic’, 8 with coding only for ‘systemic’, 15 with coding for neither and 40 with coding for both. Generating the reports kept crashing the project until we re-set the project size in the software to large.
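The coding report boils down to a cross-classification of documents by which of the two lumpy nodes they carry. A sketch of that tally, with invented document names standing in for the real sources:

```python
from collections import Counter

def coding_pattern_counts(doc_codes):
    """Tally documents by coding pattern. doc_codes maps each document
    name to the set of node names coded somewhere in that document."""
    counts = Counter()
    for nodes in doc_codes.values():
        has_ind = "individual" in nodes
        has_sys = "systemic" in nodes
        if has_ind and has_sys:
            counts["both"] += 1
        elif has_ind:
            counts["individual only"] += 1
        elif has_sys:
            counts["systemic only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

NVivo’s coding reports produce the same four figures directly; the sketch just makes explicit what is being counted (documents, not references).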
We decided to run a matrix looking at the conceptions of discrimination by who was speaking. This is when I created the new tree using the auto coding and merging described in ‘Working with Data’. The fun began when we looked at the numbers, and the words behind them, in the matrix.
My note in the research diary (reproduced partially below, complete with spelling errors) shows how answering one question raises another. It also reflects one of the difficulties we encountered.
9/07/2007 1:40 PM
OK here's what the results are saying to me (conceptions and who's speaking matrix results):
Complainants made more comments suggesting and individual perception than respondents. (60 comments from complainants vs. 19 from respondents). The commission seemed as likely o see systemic as individual (38 systemic comments 37 individual). Respondents seemed more likely to make comments indicating a systemic view (28 systemic, 19 individual). I was surprised that the split between perceptions is so equal. There are 224 comments in the perceptions node. 116 are coded in the individual child node and 108 in the systemic child node. So it’s not the case that people can’t see systemic discrimination.
But maybe people in particular positions or with particular complaints see things differently? So a matrix for perceptions by type of discrimination first (because size will need to be recoded and I've left the grey bible* at work!).
Here, (conceptions by discrimtype) sexual harassment is obviously seen as individual (33 out of 57 comments are coded individual) work and family as systemic (13 out of 25 comments coded as individual) and sex discrimination in the middle (17 out of 37 comments coded as individual).
Does complainant gender make a difference? Not enormously but a bit? A slightly higher proportion of women make comments indicating an individual conception (about 55% compared to about 45% on a rough calculation) but the majority of complainants are women. This makes sense if we remember that women are much more likely to complain of sexual harassment (5/6 s.h complaints have a female complainant). While there are cases in the files of harassment by multiple respondents, sexual harassment fits an ‘individual’ conception. (See discrim type by case for exact figures) On the other hand, work/family discrimination seems to me to fit a 'systemic' conception - it's about the policies and practices not about a nasty individual action. About 3/4 w/f complainants are female. So maybe we'd expect a slightly higher proportion of systemic comments to come from female complainants? SARA IT MIGHT BE WORTH EXPLORING THIS A BIT MORE?
* The ‘grey bible’ is Bazeley 2007. It replaced the ‘yellow bible’ (Bazeley and Richards 2000).
What did our numbers mean?
The unit of analysis in the Complaints project is the individual complaint. So the numbers in the matrices, and the words in the cells behind the numbers, do not represent the number of people with a particular conception. They are the numbers of comments coded both as being from a particular voice and as reflecting a particular conception. We had to avoid believing that ‘the respondent in case 35 has an individual conception’. As this particular penny dropped, we began to ask more questions of our data – were there varieties of understanding within ‘structural’ and ‘individual’, and did they relate to characteristics like whether outside lawyers became involved in the complaint? We also became interested in how the conceptions of the complainants and respondents seemed to be very similar. Our matrix for ‘conceptions by who’s speaking’ showed almost identical numbers of comments showing a ‘structural’ conception from complainants (19) and respondents (47).
My lumpy coding could now be supplemented by coding on from the two nodes under ‘conceptions’ and by merging in some of Sara’s original fine-grained nodes. You can see an image of the tree structure that emerged in the ‘modified project’ below, showing how the simple lumpy coding that led to the queries then became a much more complex coding structure. You can see the details more clearly in the node report. We also began to see the importance of ‘human dignity’ to complainants, and did some more coding for this. Eventually we had 128 references from 59 sources to variations of this concept.
Reporting the project
The Complaints project generated a confidential report to the Commission and five publications, all written by Sara. The original report on the purely quantitative data is published as ‘Claiming Discrimination: Complaints of Sex and Gender Discrimination in Employment under the Victorian Equal Opportunity Act 1995’. The first round of qualitative analyses generated a book chapter, a conference paper and a journal article, all noted in the reference page. The re-analysis described earlier provided material for Sara’s 2007 Clare Burton lecture. These lectures commemorate Dr Clare Burton, a pioneering Australian researcher, bureaucrat, activist and practitioner in the field of gender equity. Sara’s lecture was entitled ‘Understandings of Sex Discrimination in the Workplace: Limits and Possibilities’. The data that she reported as part of her argument concerned the ‘framings of what constitutes and does not constitute sex discrimination’ revealed in the Complaints project and in a study of pregnancy discrimination (Charlesworth and Macdonald 2007).
How much detail could be reported?
Communicating with a broad audience rather than a specialised academic group meant that it was not appropriate to give much of the reasoning that led to the conclusions, so none of the queries that had so excited us during analysis were mentioned in the lecture. Instead, the sections on understanding discrimination in the workplaces, contested and shared meanings in complaints and human dignity make use of stories and quotations.
In describing how the project was set up, I noted the need to de-identify data to protect privacy. Reflecting on the reporting process, Sara describes her use of the complaints data in the Clare Burton lecture as a ‘balancing act’. She wanted the immediacy and vividness that telling the story of a complaint could bring to illustrating her arguments, so she included some verbatim quotations. She also gave complainants pseudonyms, rather than using the file numbers that are in the NVivo project. But to ensure that the identity of complainants was totally protected, she was very careful not to use any detailed information about the setting of the complaint. Thus (to give a fictitious example) a female complainant working in retail in the automotive industry in a Melbourne suburb might become ‘Helen, in retail’. Most of Sara’s research is carried out in workplaces, and the legislative context described in setting up the project, along with the concerns of employers, means that there are times when she cannot be as specific as she might wish about the type of organisation in the study. But this does not mean that she must suppress the qualitative data, which often seem on the surface to enable identification much more readily than demographic attributes do.
Acknowledgement: Many thanks to Sara Charlesworth for help in reconstructing the story of her project.
Reports from the ‘Complaints’ Project
Charlesworth, S (2008) ‘Claiming Discrimination, Complaints of Sex and Gender Discrimination in Employment under the Victorian Equal Opportunity Act 1995’ School of Global Studies, Social Science & Planning Working Paper Series, No 1. RMIT University, Melbourne.
Charlesworth, S. (2006) ‘A Snapshot of Sex Discrimination in Employment: Disputes and Understandings’, in S. Charlesworth, K. Douglas, M. Fastenau and S. Cartwright (Eds), Women and Work: Current RMIT University Research 2005, RMIT Publishing, Informit E-library.
Charlesworth, S. (2005) ‘Managing Work and Family in the “Shadow” of Anti-Discrimination Law’, Law in Context, 23(1): 88–126.
Bazeley, P. (2007) Qualitative Data Analysis with NVivo London: Sage.
Bazeley, P. and Richards, L. (2000) The NVivo Qualitative Project Book. London: Sage.
Richards, L. (2005) Handling Qualitative Data. London: Sage.
Helen Marshall’s publications on NVivo
‘Horses for Courses: Facilitating Postgraduate Research Students’ Choice of CAQDAS’, Contemporary Nurse, 13(1), August 2002, pp. 29–37.
Many years on: Research developments
Sara’s Clare Burton lecture is now available in print (Charlesworth, 2009). While she has not returned directly to the data we worked on, it continues to influence her work on discrimination.
If Sara were working today, I would advise a slightly different approach to data gathering, especially if she were still using NVivo. Since her project, working with databases has become easier, as I discuss below.
If we were doing the same work today, using the current version of NVivo (NVivo10), some aspects of Sara’s work would be easier. For one thing, NVivo10 no longer distinguishes ‘small’ from ‘large’ projects. We could handle up to 10 GB in total. So we would not have the problem of reports crashing the project that I noted earlier.
This doesn’t mean, however, that the project would never crash. If I get excited and treble or quadruple right click the mouse by mistake I can still hang a project or crash it. But recent versions of NVivo have made it a little less likely that a crash will lose much work. The layout shown in the screen shots is NVivo8. NVivo9 introduced ribbons and tabs like those in current versions of MS Word. They mean a few less clicks for some actions. There are still automatic save reminders set for every 15 minutes by default, and people like me need them. In addition, there is now a quick access toolbar with a save button. Now, if I do something to a project and know I want to keep it, I don’t have to go to the file menu, locate save and click it. I just skitter my excitable mouse up to the top left of the screen and save my work.
Another recent change would have made Sara’s initial work easier and eliminated the trouble I had in bringing together the qualitative and the quantitative data for this project. When Sara and I worked with NVivo7, demographic variables were brought into a project by creating a ‘casebook’ that one could either make within the project or import by a somewhat lengthy process; as I described, if the names of cases in the book did not match absolutely with the names in the project, one got into difficulties. NVivo now handles databases (which it calls datasets). So Sara could have used SPSS or Excel and typed into her dataset the notes that make up the qualitative data. Today, she could have used voice recognition software to create a worksheet that contained all the demographic data and the notes from files. This could be imported into NVivo as a ‘dataset’, with columns recognised as ‘classifying’ (demographic attributes) or as ‘codable’ data. Once in an NVivo project, the dataset could be autocoded to create cases, and the cases then ‘classified’. This would create a ‘classification sheet’. Codable fields in the dataset could be coded up or down, finely or lumpishly, in any way the project required. Once some coding was done, queries could be run in any way we wanted.
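Preparing such a worksheet is mostly a matter of joining the demographic attributes and the notes on a consistent case name, one row per complaint. A minimal sketch (the column names and case-name format are illustrative, not from Sara’s files):

```python
import csv

def build_dataset_rows(demographics, notes):
    """Join demographic attributes and file notes on the case name.
    demographics maps case name -> dict of attributes; notes maps
    case name -> the codable text. Raises if the two tables disagree,
    catching name mismatches before the import rather than after."""
    mismatched = sorted(set(demographics) ^ set(notes))
    if mismatched:
        raise ValueError(f"cases not present in both tables: {mismatched}")
    return [
        {"Case": case, **attrs, "Notes": notes[case]}
        for case, attrs in sorted(demographics.items())
    ]

def write_dataset(path, rows):
    """Write one row per complaint to a CSV ready for dataset import."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

On import, the attribute columns would be marked as classifying fields and the notes column as codable; the joining step above simply guarantees the one-row-per-case structure that autocoding relies on.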
One thing that has not changed is that, as I learned with the project, it is important when working with a database that you have it set up appropriately before you bring it into NVivo. You can get details of this from NVivo help.
Another thing that remains the same is that NVivo and all the other CAQDAS packages offer ways to manage and work with your data, but they do not ‘analyse’ it. You do the analysis using the tools!
References for this update
Charlesworth, S. (2009) ‘Understandings of Sex Discrimination in the Workplace: Limits and Possibilities’ in The Promise and the Price. Ten Years of the Clare Burton Memorial Lectures University of Technology Sydney for the Australian Technology Network Women’s Executive Development, Sydney, pp 205-234. (Sara’s project)
Author profile: Helen Marshall
I learned qualitative research through my 1986 PhD study of voluntarily childless couples. I thought that using index cards in boxes to store the coded data was much smarter than putting it in hanging files, but I wished I could move from the quotation on the card to the rest of the interview without having to get up and find the whole transcript. At about the time I was using cards, my supervisor Lyn Richards had the famous disaster (when the baby ate the quotation) that, according to legend, led eventually to NVivo. Through seeing how the computer aided Lyn's research, I became interested in the potential of qualitative computing.
I made the move from index cards to NVivo2 to research what people actually do when they are coding. My interest in this topic was sparked by an email from a researcher complaining that coding made one feel 'like a zombie in front of a confuser'. I thought 'yeah, me too'. From the email discussion about coding, I learned some very useful tips. The most valuable for me is this: researchers should take time out from coding to think, and time out from thinking to let their unconscious mind solve problems. (You can find the paper about this listed in References.)
I have also researched how postgraduate students make choices of qualitative software, and what examiners are looking for as they read theses that contain qualitative data.
I taught sociology and research methods in the School of Global Studies, Social Science and Planning at RMIT University until 2006, when I became an independent researcher and NVivo trainer. I am currently an associate at the Centre for Applied Social Research at RMIT with the unofficial title of 'coding nerd'.
Dr Helen Marshall,
Senior Associate, Centre for Applied Social Research, RMIT University,
GPO Box 2476V, Melbourne Victoria 3001, Australia