4.2. Data Documentation

Exercise 4.2

 

Did you find the information, and where?

Key information needed for reuse of data (examples)

Malawi HH survey

Manufacturing Bangladesh

AITHS

Example: Number of respondents

340; found in the dataset description

1395 households; found in the ReadMe file (part of the data zip bundle)

8492; found in the AITHS Technical Report 1

Geographical area where the data were collected

 

 

 

Is there sampling bias or is the sample random?

 

 

 

Is there a control group?

 

 

 

Were data collected directly in digital format, or on paper and then transcribed into a database? If transcribed, was double entry or peer checking done to avoid errors?

 

 

 

Which questions exactly were asked in the survey or interview (or which protocols were used for measurements)?

 

 

 

Can you find the hypothesis or aims of the research that generated this dataset?

 

 

 

How was consent gathered?

 

 

 

Can the data be used for commercial purposes?

 

 

 

What access conditions apply to the data?

 

 

 

Can you find a publication that describes the findings of this dataset?

 

 

 

Is it clear which respondents or interviewees are female?

 

 

 

If there are missing data in the data file, are they missing because the respondent did not respond or because the question was not put to this respondent (or because a measurement was not done or was not relevant)?

 

 

 

Does the file format and structure of the data facilitate easy reuse?

 

 

 

Are related datasets that use the same research protocol comparable, so as to facilitate cross-analysis (e.g. same variable names, same coding structure)?

 

 

 

Answer to Exercise 4.2

 

Did you find the information, and where?

Key information needed for reuse of data (examples)

Malawi HH survey

Manufacturing Bangladesh

AITHS

Example: Number of respondents

340; found in the dataset description

1395 households; found in the ReadMe file (part of the data zip bundle)

8492; found in the AITHS Technical Report 1

Geographical area where the data were collected

Malawi, Ntcheu District, areas Manjawira, Nsipe, Sharpevale and Tsangano; found in dataset descriptor

Bangladesh

Eire and Northern Ireland

Is there sampling bias or is the sample random?

 Random sample

Random sample, then stratified

Sampling bias; found in AITHS Technical Report 1

Is there a control group?

 No

Yes, 16 control villages

 No

Were data collected directly in digital format, or on paper and then transcribed into a database? If transcribed, was double entry or peer checking done to avoid errors?

Unknown

Unknown

Data collected via paper questionnaire, checked and transcribed to a spreadsheet, and if required followed up with the study coordinators for clarification; found in AITHS SUMMARY

Which questions exactly were asked in the survey or interview (or which protocols were used for measurements)?

Questionnaire is available as documentation

Unknown

Questionnaire is available as documentation

Can you find the hypothesis or aims of the research that generated this dataset?

 No

Yes, in dataset abstract and in published paper

Yes, in dataset descriptor

How was consent gathered?

Unknown

Unknown

As part of the questionnaire form

Can the data be used for commercial purposes?

Not clear; seemingly a CC0 licence, so yes

No, CC-BY-NC licence; found in data descriptor

No, research and learning purposes only; found in dataset descriptor

What access conditions apply to the data?

Open access

Open access

Data available upon request

Can you find a publication that describes the findings of this dataset?

No

Yes, direct link from the dataset record in Mendeley

Various reports included in documentation files

Is it clear which respondents or interviewees are female?

Yes, this can be seen in the data files

Yes, this can be seen in the data files

Yes; found in AITHS SUMMARY

If there are missing data in the data file, are they missing because the respondent did not respond or because the question was not put to this respondent (or because a measurement was not done or was not relevant)?

Unknown; missing data are blank

Unknown; proprietary data, so cannot check

 Yes, missing data information is available in the data dictionaries

Does the file format and structure of the data facilitate easy reuse?

Excel format

Stata format, not normalized

SPSS format, not normalized

Are related datasets that use the same research protocol comparable, so as to facilitate cross-analysis (e.g. same variable names, same coding structure)?

Yes

N/A

Yes
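
The row on missing data matters in practice because a blank cell alone cannot tell a reuser why a value is absent. As a minimal sketch, assuming a data dictionary that defines separate codes for non-response and for questions that were not asked, the Python below tallies each kind of missing value in one variable. The codes -9 and -8, the function name summarise_missing and the example values are hypothetical and are not taken from any of the three datasets.

```python
from collections import Counter

# Hypothetical missing-value codes, of the kind a data dictionary might define.
# These are assumptions for illustration, not codes from the datasets above.
MISSING_CODES = {
    "-9": "no response",
    "-8": "not asked",
}


def summarise_missing(values, missing_codes=MISSING_CODES):
    """Count observed values and each kind of missing value in one variable."""
    counts = Counter()
    for value in values:
        text = str(value).strip()
        if text == "":
            # A blank cell carries no information about why the value is absent.
            counts["blank (reason unknown)"] += 1
        elif text in missing_codes:
            counts[missing_codes[text]] += 1
        else:
            counts["observed"] += 1
    return dict(counts)


# Made-up example column for one survey question.
example_column = ["3", "-9", "", "5", "-8", "-8"]
summary = summarise_missing(example_column)
print(summary)
```

A dataset whose missing values are all blank, as in the first answer column above, would land entirely in the "blank (reason unknown)" group, which is exactly the situation that documented codes are meant to avoid.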