Study Number 6716 - General Lifestyle Survey, 2000-2010: Secure Access
LEGAL AGREEMENT ON CONDITION OF USE
Users should note that these datasets are subject to restrictive Secure Access conditions (see the catalogue record for full details). The standard End User Licence and Special Licence Access versions of the General Lifestyle/General Household Survey datasets are held under UK Data Archive Group Study Numbers 33090 and 33403, respectively.
DATA PROCESSING NOTES
Data Archive Processing Standards
The data were processed to the UK Data Archive's A standard. A rigorous and comprehensive series of checks was carried out to ensure the quality of the data and documentation. Firstly, checks were made that the number of cases and variables matched the depositor's records. Secondly, checks were made that all variables had variable labels and all nominal (categorical) variables had value labels. Where possible, absent labels were created with reference to the documentation and/or in communication with the depositor. Thirdly, logical checks were performed to ensure that nominal (categorical) variables had values within the range defined (either by value labels or in the depositor's documentation). Lastly, any data or documentation that breached confidentiality rules were altered or suppressed to preserve anonymity.
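The first three checks described above can be expressed as a small validation routine. This is a minimal sketch, not the Archive's own processing software: it assumes the data have already been loaded as a list of row dictionaries with the labels held in plain Python mappings. The function name and the use of `None` for system-missing values are illustrative choices.

```python
from typing import List, Mapping, Optional, Set


def archive_style_checks(
    rows: List[Mapping[str, Optional[float]]],
    variable_labels: Mapping[str, str],
    value_labels: Mapping[str, Mapping[float, str]],
    nominal_vars: Set[str],
    expected_n_cases: int,
    expected_n_vars: int,
) -> List[str]:
    """Hypothetical sketch of the count, label and range checks; returns problem messages."""
    problems: List[str] = []
    variables = list(rows[0].keys()) if rows else []

    # 1. Case and variable counts against the depositor's records.
    if len(rows) != expected_n_cases:
        problems.append(f"case count {len(rows)} != expected {expected_n_cases}")
    if len(variables) != expected_n_vars:
        problems.append(f"variable count {len(variables)} != expected {expected_n_vars}")

    # 2. Every variable needs a variable label; nominal variables also need value labels.
    for var in variables:
        if not variable_labels.get(var):
            problems.append(f"{var}: no variable label")
        if var in nominal_vars and not value_labels.get(var):
            problems.append(f"{var}: nominal variable without value labels")

    # 3. Nominal values must fall within the labelled codes (None = system missing).
    for var in nominal_vars:
        allowed = value_labels.get(var)
        if not allowed:
            continue  # missing value labels were already reported above
        for i, row in enumerate(rows):
            value = row.get(var)
            if value is not None and value not in allowed:
                problems.append(f"{var}: out-of-range value {value!r} in case {i}")
    return problems
```

Here the range is taken from the value labels; where the valid range is defined only in the depositor's documentation, the `value_labels` mapping would have to be built from that documentation instead.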
All notable and/or outstanding problems discovered are detailed under the 'Data and documentation problems' heading below.
Data and documentation problems
First edition:
The 'source of income' variables (SrcInc01-14 and SrcIncT1-T5) in the individual file for 2008 ('ghs08person_restricted.sav') have erroneous values: the sources of income for person 1 were inadvertently repeated for every member of the household. This has been corrected in the file 'ghs08person_corrected_restricted.sav'.
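The error pattern described above (person 1's values copied to every household member) can be screened for with a simple comparison. This is a hypothetical sketch for users who want to check an extract: it assumes the file has already been read into a list of row dictionaries, that the household identifier is 'HholdId' (the 2008 name) and that person number is held in a variable called 'PersNo', which is an assumed name, not one given in this note. Users should simply use the corrected file; the sketch only illustrates the problem.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Mapping

# Variable names from the note above: SrcInc01-SrcInc14 and SrcIncT1-SrcIncT5.
SRC_INC_VARS = [f"SrcInc{i:02d}" for i in range(1, 15)] + [f"SrcIncT{i}" for i in range(1, 6)]


def households_with_repeated_income(
    rows: List[Mapping[str, object]],
    hh_var: str = "HholdId",
    person_var: str = "PersNo",  # hypothetical person-number variable name
) -> List[Hashable]:
    """Return household IDs in which every member's source-of-income values equal person 1's.

    Heuristic only: households whose members genuinely share the same income
    sources are flagged too, so the result is a list of candidates.
    """
    by_household: Dict[Hashable, List[Mapping[str, object]]] = defaultdict(list)
    for row in rows:
        by_household[row[hh_var]].append(row)

    flagged = []
    for hh_id, members in by_household.items():
        if len(members) < 2:
            continue  # repetition cannot be detected in single-person households
        person1 = [m for m in members if m[person_var] == 1]
        if not person1:
            continue
        reference = tuple(person1[0][v] for v in SRC_INC_VARS)
        if all(tuple(m[v] for v in SRC_INC_VARS) == reference for m in members):
            flagged.append(hh_id)
    return flagged
```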
Useful Notes
Change to GHS methodology:
Users should note that during 2005 the GHS was changing from financial year collection (April to the following March) with a cross-sectional design to calendar year collection (January to December) with a longitudinal design. The GHS 2005 file is therefore drawn from two sources. The new design, with a larger sample and the calendar year cycle, was introduced in April 2005, at the start of the second quarter; data collected under it are therefore available for quarters 2-4 only. Data for the first quarter (January to March 2005) are drawn from the GHS 2004-2005 so that the file relates to the full calendar year.
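The two-source composition of the GHS 2005 file can be summarised as a mapping from interview month to design. This is a minimal sketch: the function name and return strings are illustrative, and it assumes each case's calendar interview month is available (the name of any such variable in the file is not given here).

```python
def ghs2005_source(interview_month: int) -> str:
    """Return which design a GHS 2005 case comes from, given its calendar month (1-12)."""
    if not 1 <= interview_month <= 12:
        raise ValueError("month must be between 1 and 12")
    quarter = (interview_month - 1) // 3 + 1
    if quarter == 1:
        # January-March 2005: taken from the GHS 2004-2005 (financial year, cross-sectional)
        return "GHS 2004-2005 (old design)"
    # April-December 2005: new design with the larger sample, part of the longitudinal panel
    return "GHS 2005 (new design)"
```

Filtering a 2005 extract to quarters 2-4 in this way would keep only the cases that belong to the longitudinal design.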
As a consequence of the change in design, a full year's worth of interviews was carried out between April and December 2005; these cases were then re-allocated across January to December 2006 for re-interview. From 2006, the GHS therefore ran on a calendar year basis, and 2006 was the first year in which respondents were re-interviewed. That is, only the April to December 2005 cases are part of the longitudinal design. Further information about these changes may be found in the documentation (the GHS 2005 Overview Report).
The GHS 2006 was also affected by the 2005 change to longitudinal data collection. The 2006 dataset is the first wave in which a proportion (68 per cent) of the sample are people who were also interviewed the year before. It should be noted, however, that the dataset is still cross-sectional, as it contains only data from 2006.
The methodology for deriving the number of alcohol units consumed by respondents from their descriptions of their drinking also changed in 2006. The 2006 dataset therefore contains two sets of variables for the number of units of alcohol consumed, one using the previous method of calculation and one using the revised method. This is documented in the file 2006_guidelines.pdf under "Changes to GHS questionnaire for 2006".
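Why two unit variables can differ for the same reported drink follows from how units are derived. The sketch below uses the standard UK definition (one unit is 10 ml of pure alcohol); the glass sizes and strengths in it are hypothetical figures chosen for illustration, not the GHS conversion factors, which are given in 2006_guidelines.pdf.

```python
def alcohol_units(volume_ml: float, abv_percent: float) -> float:
    """UK alcohol units: one unit is 10 ml of pure alcohol."""
    # volume (ml) * ABV (%) / 100 gives ml of pure alcohol; dividing by 10 gives units.
    return volume_ml * abv_percent / 1000


# Hypothetical assumption sets for the same reported drink ("one glass of wine").
# Illustrative figures only, NOT the GHS old/new conversion factors.
OLD_ASSUMPTION = {"volume_ml": 125, "abv_percent": 8.0}
NEW_ASSUMPTION = {"volume_ml": 175, "abv_percent": 12.0}

old_units = alcohol_units(**OLD_ASSUMPTION)  # 1.0
new_units = alcohol_units(**NEW_ASSUMPTION)  # 2.1
```

A change in the assumed glass size or strength alone can double the derived units for identical answers, which is why trends should be compared within one method of calculation.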
Further documentation and reports:
Further GLF documents and reports can be found at
General Lifestyle Survey, 2008 Report.
Change in household serial number variable:
The household serial number variable 'Hserial' has been replaced by the variable 'HholdId' in the 2008 individual and household files.
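When pooling years, the renamed identifier has to be aligned before files can be stacked. This is a minimal sketch: the common target name `household_id` is hypothetical, and the note above only states the change for the 2008 files, so treating later years as using 'HholdId' is an assumption to verify against each year's file. Renaming aligns the variable names only; it does not imply that identifier values are comparable across years.

```python
from typing import Dict, Mapping


def harmonise_household_id(row: Mapping[str, object], year: int,
                           target: str = "household_id") -> Dict[str, object]:
    """Copy a record, renaming the year-specific household identifier to a common name."""
    # Assumption: 'HholdId' from 2008 onwards, 'Hserial' in earlier years.
    source = "HholdId" if year >= 2008 else "Hserial"
    out = {k: v for k, v in row.items() if k != source}
    out[target] = row[source]
    return out
```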
Data conversion information
From January 2003 onwards, almost all data conversions have been performed using software developed by the UK Data Archive. This enables standardisation of the conversion methods and ensures optimal data quality. In addition to its own data processing/conversion code, this software uses the SPSS and StatTransfer command processors to perform certain format translations. Although data conversion is automated, all data files are also subject to visual inspection by a member of the Archive's Data Services team.
With some format conversions, data, and more especially internal metadata (i.e. variable labels, value labels, missing value definitions, data type information), will inevitably be lost or truncated owing to the differing limits of the proprietary formats. A UK Data Archive Data Dictionary file (generally in Rich Text Format (RTF)) is usually provided for each data file, enabling viewing and searching of the internal metadata as it existed in the originating format. These files are called:
[data file name]_UKDA_Data_Dictionary.rtf
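The naming pattern above can be used to locate the dictionary for a given data file. This sketch assumes that "[data file name]" means the file name without its extension; that reading is an assumption, so check it against the files actually supplied.

```python
from pathlib import Path


def data_dictionary_name(data_file: str) -> str:
    """Build the expected UKDA Data Dictionary file name for a data file."""
    # Assumption: "[data file name]" is the file name with its extension removed.
    return f"{Path(data_file).stem}_UKDA_Data_Dictionary.rtf"
```

For example, `data_dictionary_name("ghs08person_restricted.sav")` gives `ghs08person_restricted_UKDA_Data_Dictionary.rtf`.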
Important information about the data format supplied
The sections below provide important information about the Archive's data supply formats. Some of this information is specific to the ingest format of the data, i.e. the format in which the Archive received the data from the depositor. The ingest format for this study was SPSS.
Please see the appropriate section below for information on your chosen supply (download) format.
SPSS (*.sav)
If SPSS was not the ingest format, SPSS files will generally have been created via the SPSS command processor (e.g. if the ingest format was STATA, SAS, Excel or dBase). If the ingest format was non-delimited or fixed-width text, SPSS files will have been created using SPSS command syntax.
Issues: There is very seldom any loss of data or internal metadata when importing data files into SPSS. Any problems will have been listed above in the Data and Documentation Problems section of this file.
STATA (*.dta)
If STATA was not the ingest format, STATA files will generally have been created from SPSS via the StatTransfer command processor. Importantly, StatTransfer's optimisation routine is run so that variables with SPSS write formats narrower than the data (e.g. numeric variables with 10 decimal places of data formatted to FX.2) are converted to doubles rather than floats, and so are not rounded upon conversion to STATA. Discrete user-missing values are copied across into STATA (as opposed to being collapsed into a single system-missing code).
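The rounding that the optimisation routine avoids comes from storage precision: a Stata float is a 4-byte IEEE value (about 7 significant digits) and a double is an 8-byte one (about 16). The sketch below uses Python's `struct` module as a stand-in to show the effect on a value with ten decimal places; it does not reproduce StatTransfer's own routine.

```python
import struct


def round_trip_float32(x: float) -> float:
    """Store x as a 4-byte IEEE float (like a Stata float) and read it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def round_trip_float64(x: float) -> float:
    """Store x as an 8-byte IEEE double (like a Stata double) and read it back."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


value = 0.1234567891  # ten decimal places, as in the example above
print(round_trip_float32(value))  # only about 7 significant digits survive
print(round_trip_float64(value))  # value preserved exactly
```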
Issues: There are a number of data and metadata handling mismatches between SPSS and STATA. Where any data or internal metadata has been lost or truncated, it will be logged in the study's SPSS_to_STATA_conversion RTF file.
Note that the complete internal metadata has been supplied in the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
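One common source of truncation is the difference in length limits between the two packages. The sketch below flags variable names and labels that exceed Stata's documented limits of 32 characters for a variable name and 80 characters for a variable label; it is an illustrative pre-check, not the Archive's logging procedure, and the limits should be confirmed with `help limits` for the Stata version in use.

```python
from typing import Dict, List

STATA_MAX_NAME = 32      # characters in a Stata variable name
STATA_MAX_VARLABEL = 80  # characters in a Stata variable label


def stata_truncation_risks(variable_labels: Dict[str, str]) -> List[str]:
    """List variables whose name or label exceeds Stata's length limits."""
    risks = []
    for name, label in variable_labels.items():
        if len(name) > STATA_MAX_NAME:
            risks.append(f"{name}: name longer than {STATA_MAX_NAME} characters")
        if len(label) > STATA_MAX_VARLABEL:
            risks.append(f"{name}: label longer than {STATA_MAX_VARLABEL} characters")
    return risks
```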
Tab-delimited text (*.tab)
If tab-delimited text was not the ingest format, tab-delimited files will have been created via the SPSS command processor, or exported from Excel and MS Access files. When exporting from Access data tables to tab-delimited text, the Archive may have removed potentially problematic special characters (tabs, carriage returns, line feeds, etc.) allowed in Access memo and text fields.
Issues: Date formats in SPSS are always exported to mm/dd/yyyy in tab-delimited text format, so there may be a mismatch with the documentation on such variables. Variables that include both date and time, such as dd-mmm-yyyy hh:mm:ss (e.g. 18-JUN-2011 13:28:00), will lose the time information and become mm/dd/yyyy. All users of the data in tab-delimited format should consult the UK Data Archive Data Dictionary RTF file(s).
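Because the exported dates are month-first, reading them with a UK day-first assumption silently swaps day and month whenever the day is 12 or less. A minimal sketch of month-first parsing with the standard library (the function name is illustrative); note that any time-of-day component lost on export cannot be recovered this way.

```python
from datetime import date, datetime


def parse_tab_date(text: str) -> date:
    """Parse a date exported to tab-delimited text as mm/dd/yyyy."""
    # Month first: "06/07/2011" is 7 June 2011, not 6 July 2011.
    return datetime.strptime(text.strip(), "%m/%d/%Y").date()
```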
If the data was exported from MS Access, more limited 'data documenter' information is generally available in the RTF variable information files. These files may also contain SQL setup information.
MS Excel (*.xls/*.xlsx)
If MS Excel was not the ingest format, Excel files may have been created via StatTransfer. The date and time issues noted under tab-delimited format may also apply here.
SAS (*.sas7bdat and *.sas)
If SAS was not the ingest format, SAS files will usually have been created via StatTransfer or SPSS. SAS is not one of the Archive's standard supply formats, and the files are likely to have been created in response to a user request. The usual format is *.sas7bdat files plus a .sas proc formats file. Note that the complete internal metadata has been supplied in the accompanying UK Data Archive Data Dictionary file(s).
Issues: The main loss of information when converting from SPSS to SAS is user-missing value definitions. By editing the .sas file, the user can choose whether to collapse all user-missing values into system missing or to preserve the values and lose the user-missing definitions. To achieve the latter, the following section of the .sas file should be removed before running it:
/* User Missing Value Specifications */
Note that the complete internal metadata has been supplied in the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
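The choice described above for the SAS setup file can be illustrated with a Python analogue. This is a sketch of the two outcomes, not of the .sas file itself: the codes -9 and -8 in the usage below are hypothetical user-missing values, and `None` stands in for system missing.

```python
from typing import List, Optional, Sequence, Set


def apply_user_missing(values: Sequence[float], user_missing: Set[float],
                       collapse: bool) -> List[Optional[float]]:
    """Either collapse user-missing codes to system missing (None) or keep the raw codes."""
    if collapse:
        # Option 1: the distinction between missing codes is lost.
        return [None if v in user_missing else v for v in values]
    # Option 2: codes survive as ordinary values and must be excluded by the analyst.
    return list(values)
```

With `apply_user_missing([1, -9, 3, -8], {-9, -8}, collapse=True)` the result is `[1, None, 3, None]`; with `collapse=False` the codes are kept, so any summary statistic would include -9 and -8 unless they are filtered out by hand.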
MS Access (*.mdb/*.mdbx)
Due to substantial incompatibilities between versions of MS Access, the Archive will only make data available in MS Access format if this is the ingest format and/or the database contains important information in addition to the data tables (coding information, forms, queries, etc.).
Conversion of documentation formats
The documentation supplied with Archive studies is usually converted to Adobe Portable Document Format (PDF), with documents bookmarked to aid navigation. The vast majority of PDF files are generated from MS Word, RTF, Excel or plain text (.txt) source files, though PDF documentation for older studies in the collection may have been created from scanned paper documents. Occasionally, some documentation cannot be usefully converted to PDF (e.g. MS Excel files with wide worksheets) and this is usually supplied in the original or a more appropriate format.