The sixth edition of this dataset contains the new grossing regime, GROSS 3.
The data were processed to the UK Data Archive's A standard. A rigorous and comprehensive series of checks was carried out to ensure the quality of the data and documentation. Firstly, checks were made that the number of cases and variables matched the depositor's records. Secondly, checks were made that all variables had variable labels and all nominal (categorical) variables had value labels. Where possible, with reference to the documentation and/or in communication with the depositor, absent labels were created. Thirdly, logical checks were performed to ensure that nominal (categorical) variables had values within the range defined (either by value labels or in the depositor's documentation). Lastly, any data or documentation that breached confidentiality rules were altered or suppressed to preserve anonymity.
All notable and/or outstanding problems discovered are detailed under the
'Data and documentation problems' heading below.
Data and documentation problems
Date Variables:
In SAS format (in which the UKDA is supplied with the
data) dates are held as a number representing the number of days elapsed since
1st January 1960. The value -1 is used to denote missing data. Upon conversion
to SPSS, -1 is treated as the day before 1st January 1960, i.e. 31st December
1959. As a result, dates of 31.12.59 should be treated as missing
data.
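The date handling described above can be sketched in Python. This is a minimal illustration, not the UKDA's conversion code: the function and constant names are hypothetical, and it assumes the date is available either as the raw SAS day count or as an already converted calendar date.

```python
from datetime import date, timedelta
from typing import Optional

SAS_EPOCH = date(1960, 1, 1)   # SAS day 0
MISSING_DATE_CODE = -1         # value used for missing dates in this study


def sas_days_to_date(days: Optional[int]) -> Optional[date]:
    """Convert a SAS day count to a date; -1 (or None) means missing."""
    if days is None or days == MISSING_DATE_CODE:
        return None
    return SAS_EPOCH + timedelta(days=days)


def clean_converted_date(d: Optional[date]) -> Optional[date]:
    """Treat 31 December 1959, the converted form of -1, as missing."""
    if d == SAS_EPOCH + timedelta(days=MISSING_DATE_CODE):
        return None
    return d
```

For example, sas_days_to_date(0) gives 1st January 1960, while clean_converted_date(date(1959, 12, 31)) gives None.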
Missing Values:
Missing values within the SAS files are
defined as A's, B's and C's and relate to 'Skipped', 'Refused' and 'Don't know'.
In SPSS these can only be represented as negative numbers and are, therefore,
assigned as -1, -8 and -9.
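A minimal Python sketch of excluding these codes before analysis is given here. It assumes the three codes correspond to the three labels in the order listed (-1 'Skipped', -8 'Refused', -9 'Don't know'), which should be confirmed against the UKDA Data Dictionary file; the names used are hypothetical.

```python
from typing import Iterable, Optional

# Assumed mapping, in the order the codes and labels are listed in the text.
SPSS_MISSING_LABELS = {-1: "Skipped", -8: "Refused", -9: "Don't know"}


def to_valid_value(value: float) -> Optional[float]:
    """Return None for a missing-value code, otherwise the value itself."""
    return None if value in SPSS_MISSING_LABELS else value


def mean_of_valid(values: Iterable[float]) -> Optional[float]:
    """Mean of the values that are not missing-value codes."""
    valid = [v for v in values if v not in SPSS_MISSING_LABELS]
    return sum(valid) / len(valid) if valid else None
```

A blanket filter of this kind is only appropriate for variables where -1, -8 and -9 cannot be genuine values. Variables that can legitimately be negative, such as derived income variables, need to be checked individually.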
Useful Notes
The variables
INDINC, BUINC and HHINC are derived variables and are composed of income from
different sources. One of the components is income from self-employment
(SEINCAM2 and INCSEO2), which can have negative values. The variables
derived using these components can therefore also be negative.
File: transact.por
This file contains information on the imputations and edits
carried out by DWP. The information contained within it is meant to be used as
an extension of the documentation and not for analysis.
File: pension.por
Users should be aware that the variable HDPEN has some rogue
values within it. The depositor has stated that this variable was created for
DWP use rather than for external users.
Data conversion information
From January 2003 onwards, almost all data conversions have been performed using software developed by the UKDA. This enables standardisation of the conversion methods and ensures optimal data quality. In addition to its own data processing/conversion code, this software uses the SPSS and Stat/Transfer command processors to perform certain format translations. Although data conversion is automated, all data files are also subject to visual inspection by a UKDA data processing officer.
With some format conversions, data and, more especially, internal metadata
(i.e. variable labels, value labels, missing value definitions, data type
information) will inevitably be lost or truncated owing to the differential
limits of the proprietary formats. A UKDA Data Dictionary file (in rich text
format), corresponding to each data file, is usually provided for viewing and
searching the internal metadata as it existed in the originating format. These
files are called: [data file name]_UKDA_Data_Dictionary.rtf
Important information about the data format supplied
The links below provide important information about the format in which you
have been supplied the data. Some of this information is specific to the
ingest format of the data, that is, the format in which the UKDA was
supplied the data. The ingest format for this study was SAS.
SPSS portable (*.por files)
If SPSS portable was not the ingest
format, this format will generally have been created via the SPSS command
processor (e.g. if the ingest format was SPSS .sav, SAS, Excel, or dBase) or, if
the ingest format was STATA, via the Stat/Transfer command processor. If the
ingest format was undelimited text, the
data will have been read into SPSS using an SPSS command file.
Issues: There is very seldom any loss of data or internal metadata when importing data files into SPSS. Any problems will have been listed above in the Data and Documentation Problems section of this file.
STATA (*.dta files)
If STATA was not the ingest format, all STATA
files will have been created from SPSS .sav format via the Stat/Transfer command
processor. Importantly, Stat/Transfer's optimisation routine is run so that
variables with SPSS write formats narrower than the data (e.g. numeric variables
with 10 decimal places of data formatted to FX.2) are not rounded upon
conversion to STATA because they are converted to 'doubles' rather than 'floats'.
User missing values are copied across into STATA (as opposed to being collapsed
into a single system missing code).
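The reason for preferring doubles can be illustrated in Python. The sketch below is not part of the UKDA or Stat/Transfer tooling; it simply round-trips an arbitrary example value with ten decimal places through a 4-byte float and an 8-byte double using the standard struct module.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip x through a 4-byte IEEE 754 float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def as_float64(x: float) -> float:
    """Round-trip x through an 8-byte IEEE 754 double."""
    return struct.unpack("<d", struct.pack("<d", x))[0]


value = 0.1234567891                  # arbitrary value with ten decimal places
stored_as_float = as_float32(value)   # precision is lost beyond about 7 digits
stored_as_double = as_float64(value)  # the value is preserved unchanged
```

Stored as a 4-byte float, the value no longer matches the original beyond roughly seven significant digits, whereas the double reproduces it exactly.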
Issues: There are a number of data and metadata handling mismatches between SPSS and STATA. Where any data or internal metadata have been lost or truncated, this will have been automatically logged in this file: 4068_SPSS_to_STATA_conversion.rtf. Note that the complete internal metadata has been supplied in the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
Issues: Date formats in SPSS are always exported to mm/dd/yyyy in tab-delimited text format, so there may be a mismatch with the documentation on such variables. Variables that include both date and time, such as dd-mm-yyyy hh:mm:ss (e.g. 18-JUN-2001 13:28:00), will lose the time information and become mm/dd/yyyy. If the time information is critical, a new variable will have been created in the tab-delimited data file by the UKDA. All users of the data in tab-delimited format should consult the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
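When reading such date fields from the tab-delimited file, the mm/dd/yyyy layout needs to be stated explicitly so that month and day are not swapped. A minimal Python sketch follows, assuming the field arrives as a text string and that an empty field means no date; the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def parse_exported_date(text: str) -> Optional[date]:
    """Parse a date exported from SPSS as mm/dd/yyyy; empty text gives None."""
    text = text.strip()
    if not text:
        return None
    # %m/%d/%Y is month first, matching the export format described above.
    return datetime.strptime(text, "%m/%d/%Y").date()
```

For example, the documented value 18-JUN-2001 13:28:00 would appear in the tab-delimited file as 06/18/2001, and parse_exported_date("06/18/2001") returns the date 18th June 2001.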
If the data were exported from MS Access, more limited 'data documenter' information is supplied in the file(s): [data table name]_variableinformation.rtf. These files may also contain SQL setup information.
If SAS was not the ingest format, all SAS files will have been created from SPSS .sav format via the Stat/Transfer command processor. The data files are provided as a fixed-width text file (*.dat) and a SAS command file (*.sas), which when run will create a SAS dataset. This enables the user to recreate the SAS dataset and formats library in almost all versions of SAS and all operating systems.
Issues: The main loss of information when converting from SPSS to SAS is user-missing value definitions. By editing the .sas file, the user can choose whether to collapse all user-missing values into system missing or to preserve the value and lose the user-missing definition. To achieve the latter, the following section of the .sas file should be removed before running it:
/* User Missing Value Specifications */
Note that the complete internal metadata has been supplied in the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
Due to the substantial incompatibilities between versions of MS Access, the UKDA makes data available in MS Access format only if this was the ingest format and the database contains important information in addition to the data tables (coding information, forms, queries, etc.).
Electronic and paper documentation supplied with this study is usually incorporated into the UKDA User Guide (in PDF format). The conversion programmes used are the latest versions of Adobe PDF Writer for electronic documentation and Adobe Paper Capture (Acrobat 'plugin' version) for paper documentation. Occasionally, some of the electronic documentation cannot be usefully converted to PDF (e.g. MS Excel files with wide worksheets) and this is supplied in a more appropriate format. All User Guides are fully bookmarked.