* Encoding: UTF-8.
*Start with the cleaned SPSS file for ESS6 that you produced in Chapter 7. 
*You can proceed with an 'uncleaned' ESS6 file downloaded from the ESS site, but you 
will find that there will be some very minor differences in the results you obtain with it compared 
to those in the book or companion website.

*Remember that you need to insert the 'filepath' for your computer between the inverted commas after OUTFILE=.
SAVE OUTFILE='/Users/johnmacinnes/Documents/ESS6Ch8check.sav'
 /keep = name  idno cntry  hhmmb  gndr to rshipa24 icpart1 rshpsts eisced eiscedp pdwrkp to wkhtotp eiscedf 
  eiscedm country dweight pspwght pweight infanthhn to noage parents partners offspring sibs 
  pgndr malehhn  femhhn  gndrdkr totalhhn. 

*We use the three sets of variables that describe the gender, year of birth and relationship to the respondent
 of other household members to produce an alternative count to that recorded by the original hhmmb variable.
COUNT yearn = yrbrn2 to yrbrn24 (6666, SYSMIS).
COUNT gndrn = gndr2 to gndr24 (6, SYSMIS).
COUNT rshipan = rshipa2 to rshipa24 (66, SYSMIS).
FREQ yearn to rshipan.
CORR yearn to rshipan.
COMPUTE hhmmb1 = (24-yearn).
FREQ hhmmb1.
MISSING VALUES hhmmb ().
CROSS hhmmb by hhmmb1.
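*Optional check, offered as a sketch rather than part of the original procedure: the difference between the two
 counts shows the size of each discrepancy directly. The name hhdiff is hypothetical. Because the missing value
 definitions for hhmmb were removed above, its missing value codes enter the calculation as ordinary numbers.
COMPUTE hhdiff = hhmmb1 - hhmmb.
FREQ hhdiff.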

*Produce a crosstab of the original hhmmb variable against the new hhmmb1 variable to check for discrepancies.
* Then select discrepant cases to examine the values of the relevant variables.
USE ALL.
COMPUTE filter_$=((hhmmb1 eq 12) and (hhmmb ne 12)).
FILTER BY filter_$.
SUMMARIZE
/TABLES=idno cntry hhmmb hhmmb1 gndr to gndr24,
yrbrn to rshipa24
/FORMAT=LIST CASENUM TOTAL LIMIT=40
/CELLS=NONE.

*Set the value for variables where there is no information on other household members (almost always because 
such members do not exist) to system missing values.
*Turn off the filter set above so that the procedures that follow use all cases.
FILTER OFF.
USE ALL.
RECODE gndr2 to gndr24 (6 thru 9 = SYSMIS).
RECODE yrbrn2 to yrbrn24 (6666 thru 9999 = SYSMIS).
RECODE rshipa2 to rshipa24 (66 thru 99 = SYSMIS).

* Identify Duplicate Cases as defined by idno. This produces a variable PrimaryLast, describing duplicates.
*After running a frequency table on this variable we delete it, so that we can use the same procedure again later.
SORT CASES BY idno(A).
MATCH FILES
  /FILE=*
  /BY idno
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMATS  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
DELETE VARIABLES PrimaryLast.

*Produce a new identifier variable.
COMPUTE newid = (100000000*country) + idno.
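*Optional, as a sketch: new numeric variables usually take the default display format F8.2, which is too narrow to
 show newid in full. The width of 12 digits assumed here is not taken from the original syntax and may need adjusting.
FORMATS newid (F12.0).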

*Check that we now have a variable that takes a unique value for each case in our dataset.
SORT CASES BY newid(A).
MATCH FILES
  /FILE=*
  /BY newid
  /FIRST=PrimaryFirst
  /LAST=PrimaryLast.
DO IF (PrimaryFirst).
COMPUTE  MatchSequence=1-PrimaryLast.
ELSE.
COMPUTE  MatchSequence=MatchSequence+1.
END IF.
LEAVE  MatchSequence.
FORMATS  MatchSequence (f7).
COMPUTE  InDupGrp=MatchSequence>0.
SORT CASES InDupGrp(D).
MATCH FILES
  /FILE=*
  /DROP=PrimaryFirst InDupGrp MatchSequence.
VARIABLE LABELS  PrimaryLast 'Indicator of each last matching case as Primary'.
VALUE LABELS  PrimaryLast 0 'Duplicate Case' 1 'Primary Case'.
VARIABLE LEVEL  PrimaryLast (ORDINAL).
FREQUENCIES VARIABLES=PrimaryLast.
DELETE VARIABLES PrimaryLast.

*Create a variable rshipa1 that describes the relationship of the respondent to themselves.
COMPUTE rshipa1 = 0.

*restructure the dataset to create new cases out of variables gndr to rshipa24. This dataset will have one case
 for each household member (including the respondent) who is reported in these variables.
VARSTOCASES
  /ID=id
  /MAKE gender FROM gndr gndr2 gndr3 gndr4 gndr5 gndr6 gndr7 gndr8 gndr9 gndr10 gndr11 gndr12 
    gndr13 gndr14 gndr15 gndr16 gndr17 gndr18 gndr19 gndr20 gndr21 gndr22 gndr23 gndr24
  /MAKE dob FROM yrbrn yrbrn2 yrbrn3 yrbrn4 yrbrn5 yrbrn6 yrbrn7 yrbrn8 yrbrn9 yrbrn10 yrbrn11 
    yrbrn12 yrbrn13 yrbrn14 yrbrn15 yrbrn16 yrbrn17 yrbrn18 yrbrn19 yrbrn20 yrbrn21 yrbrn22 yrbrn23 
    yrbrn24
  /MAKE rshipa FROM rshipa1 rshipa2 rshipa3 rshipa4 rshipa5 rshipa6 rshipa7 rshipa8 rshipa9 
    rshipa10 rshipa11 rshipa12 rshipa13 rshipa14 rshipa15 rshipa16 rshipa17 rshipa18 rshipa19 rshipa20 
    rshipa21 rshipa22 rshipa23 rshipa24
  /INDEX=Index1(24) 
  /KEEP=name idno cntry hhmmb icpart1 rshpsts eisced eiscedp pdwrkp edctnp uemplap uemplip dsbldp 
    rtrdp cmsrvp hswrkp dngothp dngdkp dngnapp dngrefp dngnap icomdnp mnactp icppdwk crpdwkp isco08p 
    emprelp wkhtotp eiscedf eiscedm dweight pspwght pweight  malehhn infanthhn childhhn 
    wagehhn oldhhn noage parents partners offspring sibs pgndr country femhhn gndrdkr totalhhn yearn 
    gndrn rshipan hhmmb1  newid
  /NULL=DROP
  /COUNT=hhmmb2.

*Run this syntax on the new dataset you have just created (NOT the original ESS6 dataset!) to create a variable describing 
all those household members who were children.
COMPUTE age = 2013-dob.
RECODE age (lo thru -1=999).
FREQ age.
COMPUTE child = 0.
IF (age le 14) and (rshipa eq 2) child = 1.
FREQ child.

*Create a new data set that aggregates data from all the cases for each household and creates one case 
for each household in the new dataset.
DATASET DECLARE ESS6hhdata1.
SORT CASES BY newid.
AGGREGATE
/OUTFILE='ESS6hhdata1'
/PRESORTED
/BREAK=newid
/oldesthhm 'oldest member of the household'=MAX(age)
/younghhm 'youngest hhm'=MIN(age)
/child_sum=SUM(child)
/hhmmb4=N.

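*Merge this new dataset with the existing ESS6 dataset. Both files must be in the same case order, so sort the
 ESS6 dataset by newid after activating it. Renaming newid in the household file keeps both identifiers for checking.
DATASET ACTIVATE DataSet1.
SORT CASES BY newid (A).
MATCH FILES /FILE=*
  /FILE='ESS6hhdata1'
  /RENAME newid=newidcheck.
EXECUTE.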
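*Optional check, as a sketch: if the two files were matched in the same case order, newid and newidcheck agree
 for every case. The name idmatch is hypothetical. Any value other than 1 in the table points to a mismatch.
COMPUTE idmatch = (newid EQ newidcheck).
FREQ idmatch.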


*Download the ESS1 and ESS7 datasets from the ESS website and open the dataset for ESS1.
*Save a version of the ESS1 dataset with only the variables we wish to keep. Remember that the filepath in 
your computer will be different to the one shown below.

SAVE OUTFILE='.../ESS1e06_4immvars.sav'
/KEEP =essround cntry idno ctzcntr brncntr imbleco gndr agea eisced chldhm dweight
pweight
/COMPRESSED.

* repeat this for the ESS7 dataset.
SAVE OUTFILE='/Users/ekja16/Downloads/ESS7e02.spss/ESS7e02immvars.sav'
/KEEP =essround cntry idno ctzcntr brncntr imbleco gndr agea eisced chldhm dweight
pweight
/COMPRESSED.




*Open the two new datasets that you have created. You will now have four datasets open in SPSS - the 
two original datasets and the two new ones you have created with a smaller selection of variables. Close
the two original datasets so that you have only your two new datasets open. Make your new version of the ESS1
dataset your active dataset by clicking on its Data Editor window.
*Either use the GUI to create the appropriate syntax to merge the cases from both datasets for these variables,
or use the syntax below, replacing 'xxx' with the name of your new version of the ESS7 dataset.

ADD FILES /FILE=*
  /FILE='xxx'.
EXECUTE.
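*Optional check, as a sketch: a table of country by round confirms that cases from both ESS1 and ESS7 are present
 in the merged file. Both variables are among those kept above.
CROSSTABS cntry BY essround.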

*create the numeric version of cntry for your new merged dataset.
AUTORECODE VARIABLES=cntry 
  /INTO country
  /PRINT.