Managing and Sharing Research Data: A Guide to Good Practice
4.2. Data Documentation
Exercise 4.2
| | Did you find the information, and where? | | |
| Key information needed for reuse of data (examples) | Malawi HH survey | Manufacturing Bangladesh | AITHS |
| Example: Number of respondents | 340; found in the dataset description | 1395 households; found in the ReadMe file (part of the data zip bundle) | 8492; found in the AITHS Technical Report 1 |
| Geographical area where the data were collected | | | |
| Is there sampling bias or is the sample random? | | | |
| Is there a control group? | | | |
| Were data collected directly in digital format or on paper and then submitted/transcribed into a database; if so, was double entry or peer checking done to avoid errors? | | | |
| Which questions exactly were asked in the survey or interview (or which protocols were used for measurements)? | | | |
| Can you find the hypothesis or aims of the research that generated this dataset? | | | |
| How was consent gathered? | | | |
| Can the data be used for commercial purposes? | | | |
| What access conditions apply to the data? | | | |
| Can you find a publication that describes the findings of this dataset? | | | |
| Is it clear which respondents or interviewees are female? | | | |
| If there are missing data in the datafile, are they missing because the respondent did not respond or because the question was not asked of this respondent (or because a measurement was not done or not relevant)? | | | |
| Does the file format and structure of the data facilitate easy reuse? | | | |
| Are related datasets that use the same research protocol comparable to facilitate cross-analysis, e.g. same variable names, same coding structure, etc.? | | | |
Answer to Exercise 4.2
| | Did you find the information, and where? | | |
| Key information needed for reuse of data (examples) | Malawi HH survey | Manufacturing Bangladesh | AITHS |
| Example: Number of respondents | 340; found in the dataset description | 1395 households; found in the ReadMe file (part of the data zip bundle) | 8492; found in the AITHS Technical Report 1 |
| Geographical area where the data were collected | Malawi, Ntcheu District, areas Manjawira, Nsipe, Sharpevale and Tsangano; found in dataset descriptor | Bangladesh | Eire and Northern Ireland |
| Is there sampling bias or is the sample random? | Random sample | Random sample, then stratified | Sampling bias; found in AITHS Technical Report 1 |
| Is there a control group? | No | Yes, 16 control villages | No |
| Were data collected directly in digital format or on paper and then submitted/transcribed into a database; if so, was double entry or peer checking done to avoid errors? | Unknown | Unknown | Data collected via paper questionnaire, checked and transcribed to a spreadsheet, and if |
| Which questions exactly were asked in the survey or interview (or which protocols were used for measurements)? | Questionnaire is available as documentation | Unknown | Questionnaire is available as documentation |
| Can you find the hypothesis or aims of the research that generated this dataset? | No | Yes, in dataset abstract and in published paper | Yes, in dataset descriptor |
| How was consent gathered? | Unknown | Unknown | As part of the questionnaire form |
| Can the data be used for commercial purposes? | Not clear, seemingly CC0 licence, so yes | No, CC-BY-NC licence; found in data descriptor | No, research and learning purposes only; found in dataset descriptor |
| What access conditions apply to the data? | Open access | Open access | Data available upon request |
| Can you find a publication that describes the findings of this dataset? | No | Yes, direct link from the dataset record in Mendeley | Various reports included in documentation files |
| Is it clear which respondents or interviewees are female? | Yes, this can be seen in the data files | Yes, this can be seen in the data files | Yes; found in AITHS SUMMARY |
| If there are missing data in the datafile, are they missing because the respondent did not respond or because the question was not asked of this respondent (or because a measurement was not done or not relevant)? | Unknown; missing data are blank | Unknown; proprietary data, so cannot check | Yes, missing data information is available in the data dictionaries |
| Does the file format and structure of the data facilitate easy reuse? | Excel format | Stata format, not normalized | SPSS format, not normalized |
| Are related datasets that use the same research protocol comparable to facilitate cross-analysis, e.g. same variable names, same coding structure, etc.? | Yes | N/A | Yes |
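
The missing-data question above is easier to answer when a dataset uses distinct, documented codes for each reason a value is absent, rather than blank cells. The following is a minimal Python sketch of that idea. The codes (-9 and -8), the `MISSING_CODES` mapping, the `summarise_missing` function and the household-size values are all hypothetical and are not taken from any of the three datasets in the exercise; a real dataset would define its own codes in its data dictionary.

```python
from collections import Counter

# Hypothetical missing-value codes (an assumption for illustration only).
# A real dataset would document its own codes in a data dictionary.
MISSING_CODES = {
    -9: "no response",  # respondent was asked but did not answer
    -8: "not asked",    # question was skipped or not applicable
}


def summarise_missing(values, codes=MISSING_CODES):
    """Count valid values, documented missing reasons, and unexplained blanks."""
    counts = Counter()
    for value in values:
        if value is None or value == "":
            # A blank cell carries no information about why it is missing.
            counts["blank (reason unknown)"] += 1
        elif value in codes:
            counts[codes[value]] += 1
        else:
            counts["valid"] += 1
    return dict(counts)


# Made-up example column: household size as recorded in a survey file.
household_size = [4, -9, 6, None, -8, 3, ""]
summary = summarise_missing(household_size)
print(summary)
```

In this sketch, only the coded values can be attributed to a reason; the blank entries end up in a "reason unknown" category, which corresponds to the "Unknown; missing data are blank" answer in the table.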