Study Number 4642 - Scottish Household Survey, 2001-2002
NEW EDITION INFORMATION
For the fourth edition, the depositor supplied five new variables, which have been added to the main data file. The variables relate to the Scottish Executive's Urban Rural Classification and the Scottish Index of Multiple Deprivation (SIMD). Versions of the urban/rural variables were already on each data file, using the settlement definitions that were current when the files were created. The new urban/rural variables (RURFROZ6 and RURFROZ8) are a version of the classification based on the 2003 settlement definitions, giving a consistent basis for analysing change over time. These frozen variables will also be added to future data files. The three SIMD variables (MD05PC15, MD05DEC and MD05QUIN) are named MD05 even though they relate to the 2004 index. They reflect revisions made to the index in 2005, and because the original SIMD variables on the data file were already named MD04, the revised variables have been given a different name. Users can find more information on the Scottish Executive Urban Rural Classification and Scottish Index of Multiple Deprivation 2004 web pages.
DATA PROCESSING NOTES
Data Archive Processing Standards
The data were processed to the UK Data Archive's A* standard. This is the
Archive's highest standard, and means that an extremely rigorous and
comprehensive series of checks was carried out to ensure the quality of the data
and documentation. Firstly, checks were made that the number of cases and
variables matched the depositor's records. Secondly, checks were made that all
variables had comprehensible variable labels and all nominal (categorical)
variables had comprehensible value labels. Where possible, labels were edited or
created with reference to the documentation and/or in consultation with the
depositor. Thirdly, logical checks were performed to ensure
that nominal (categorical) variables had values within the range defined (either
by value labels or in the depositor's documentation). Lastly, any data or
documentation that breached confidentiality rules were altered or suppressed to
preserve anonymity.
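The third check (nominal values falling within the range defined by the value labels) can be sketched in a few lines. This is a minimal Python illustration, not the Archive's own tooling: it assumes the data have already been read into a list of dicts and that each variable's value labels are available as a dict; the variable `tenure` and its codes are hypothetical.

```python
from typing import Dict, List, Mapping


def out_of_range(records: List[Mapping[str, object]],
                 value_labels: Dict[str, Dict[int, str]]) -> Dict[str, List[object]]:
    """Return, per labelled variable, the values found in the data that have no label.

    Missing cells (None, or a variable absent from a record) are ignored.
    """
    problems: Dict[str, List[object]] = {}
    for var, labels in value_labels.items():
        bad = sorted({r[var] for r in records
                      if r.get(var) is not None and r[var] not in labels})
        if bad:
            problems[var] = bad
    return problems


# Hypothetical categorical variable with three labelled codes.
labels = {"tenure": {1: "Owner", 2: "Social rented", 3: "Private rented"}}
data = [{"tenure": 1}, {"tenure": 3}, {"tenure": 9}, {"tenure": None}]
print(out_of_range(data, labels))  # {'tenure': [9]}
```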
All notable and/or outstanding problems discovered are detailed under the 'Data
and documentation problems' heading below.
Data and documentation problems
Travel Diary Data:
a. Updated 2001 Travel Diary data (added Jan 2004):
Files dajn01 and dast01:
i. ENDHR (End Time (hr) of journey): these variables include isolated out-of-range values of '25'.
ii. OCODE/DCODE (Type of origin/destination postcode): these variables contain significant numbers of missing values.
b. 2002 Travel Diary Data (added Jan 2004):
Files dajn02 and dast02:
i. ENDHR (End Time (hr) of journey): these variables include isolated out-of-range values of '25' and '26'.
ii. OCODE/DCODE (Type of origin/destination postcode): these variables contain significant numbers of missing values.
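Users who want to locate the out-of-range ENDHR values before analysis could tally them with a short script. The Python sketch below assumes the travel diary file has been exported as tab-delimited text with a header row containing `ENDHR`; the default valid range of 0 to 24 is an assumption (the notes above only identify 25 and 26 as out of range), so adjust `low` and `high` to match the documentation.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def flag_endhr(handle: TextIO, column: str = "ENDHR",
               low: int = 0, high: int = 24) -> Counter:
    """Count values of `column` outside [low, high] in a tab-delimited file.

    Blank cells are skipped. The default bounds are an assumption.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(handle, delimiter="\t"):
        cell = (row.get(column) or "").strip()
        if not cell:
            continue
        hour = int(float(cell))  # tolerate values exported as e.g. "25.0"
        if not low <= hour <= high:
            counts[hour] += 1
    return counts


# In-memory stand-in for a travel diary export; JOURNEY is a hypothetical column.
sample = io.StringIO("JOURNEY\tENDHR\n1\t9\n2\t25\n3\t\n4\t26\n5\t25\n")
print(flag_endhr(sample))  # Counter({25: 2, 26: 1})
```

Whether such values should be recoded to missing or treated as times after midnight is an analytic decision the notes above do not settle.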
Main data file:
Variables RCLASS and HCLASS both contain one out-of-range value of '8532'.
The documentation states that the data file contains 2075 variables and
30,638 cases. It actually contains 2081 variables and 30,639 cases.
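The case and variable counts can be confirmed directly from a tab-delimited copy of the main file. This Python sketch assumes the first row holds the variable names and that each later non-blank row is one case; quote characters are treated literally, on the assumption that the export is unquoted.

```python
import csv
import io
from typing import TextIO, Tuple


def count_cases_and_variables(handle: TextIO) -> Tuple[int, int]:
    """Return (cases, variables) for a tab-delimited file with a header row."""
    # QUOTE_NONE: treat quote characters as ordinary text (assumes no quoting).
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    cases = sum(1 for row in reader if row)  # csv yields [] for blank lines
    return cases, len(header)


sample = io.StringIO("id\tage\tsex\n1\t34\t2\n2\t51\t1\n")
print(count_cases_and_variables(sample))  # (2, 3)
```

Run against the main data file, the figures above imply a result of (30639, 2081).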
Useful Notes
Amended variable labels for RE10a to RE10v were supplied by the depositor
in 2004. These variables should therefore be labelled as follows:
re10a "RE10 Blind / poor eyesight / eye problems"
re10b "RE10 Can't afford driving lessons"
re10c "RE10 Can't afford a car"
re10d "RE10 Can't afford to run a car"
re10e "RE10 Disability"
re10f "RE10 Don't have a car"
re10g "RE10 Don't have a licence"
re10h "RE10 Failed test"
re10i "RE10 Health problems"
re10j "RE10 Never learned to drive"
re10k "RE10 Never wanted to / not interested"
re10l "RE10 No road sense"
re10n "RE10 Prefer to walk"
re10o "RE10 Too much traffic"
re10p "RE10 Too nervous / lack confidence"
re10q "RE10 Too old"
re10r "RE10 Too young"
re10s "RE10 Other"
re10t "RE10 Don't need to drive"
re10u "RE10 Never got round to it / no time to learn"
re10v "RE10 Banned / lost licence"
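For users of the Stata files, the corrected labels can be applied with Stata's `label variable` command. The Python sketch below only generates those commands from the list above, for pasting into a do-file; the helper name `stata_label_commands` is hypothetical. Note that the depositor's list contains no entry for re10m, so the sketch leaves that variable's label unchanged.

```python
from typing import Dict, List

# Corrected RE10 labels supplied by the depositor in 2004 (no re10m entry).
RE10_LABELS: Dict[str, str] = {
    "re10a": "RE10 Blind / poor eyesight / eye problems",
    "re10b": "RE10 Can't afford driving lessons",
    "re10c": "RE10 Can't afford a car",
    "re10d": "RE10 Can't afford to run a car",
    "re10e": "RE10 Disability",
    "re10f": "RE10 Don't have a car",
    "re10g": "RE10 Don't have a licence",
    "re10h": "RE10 Failed test",
    "re10i": "RE10 Health problems",
    "re10j": "RE10 Never learned to drive",
    "re10k": "RE10 Never wanted to / not interested",
    "re10l": "RE10 No road sense",
    "re10n": "RE10 Prefer to walk",
    "re10o": "RE10 Too much traffic",
    "re10p": "RE10 Too nervous / lack confidence",
    "re10q": "RE10 Too old",
    "re10r": "RE10 Too young",
    "re10s": "RE10 Other",
    "re10t": "RE10 Don't need to drive",
    "re10u": "RE10 Never got round to it / no time to learn",
    "re10v": "RE10 Banned / lost licence",
}


def stata_label_commands(labels: Dict[str, str]) -> List[str]:
    """Build one Stata 'label variable' command per variable, sorted by name."""
    return [f'label variable {var} "{text}"' for var, text in sorted(labels.items())]


for line in stata_label_commands(RE10_LABELS):
    print(line)
```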
Download Service users - Stata:
As the main data file for this study contains more than 2047 variables, the Stata version of this file is only available in Stata 8 SE format. The travel diary files are much smaller and can be read in Stata 6 or Stata 7. However, they have been included in the Stata8_SE zip file along with the main file, for ease of download.
For further information on the Scottish Household Survey, users should consult the Scottish Household Survey web site.
Data conversion information
From January 2003 onwards, almost all data conversions have been performed
using software developed by the UKDA. This enables standardisation of the
conversion methods and ensures optimal data quality. In addition to its own data
processing/conversion code, this software uses the SPSS and Stat/Transfer
command processors to perform certain format translations. Although data
conversion is automated, all data files are also subject to visual inspection by
a UKDA data processing officer.
With some format conversions, data, and more especially internal metadata (i.e.
variable labels, value labels, missing value definitions, data type
information), will inevitably be lost or truncated owing to the differing
limits of the proprietary formats. A UKDA Data Dictionary file (in rich text
format), corresponding to each data file, is usually provided for viewing and
searching the internal metadata as it existed in the originating format. These
files are called:
[data file name]_UKDA_Data_Dictionary.rtf
Important information about the data format supplied
The following sections provide important information about the format in which
you have been supplied the data. Some of this information is specific to the
ingest format of the data, that is, the format in which the data were supplied
to the UKDA. The ingest format for this study was SPSS.
SPSS portable (*.por files)
If SPSS portable was not the ingest format, the portable file will generally have
been created via the SPSS command processor (e.g. if the ingest format was SPSS
.sav, SAS, Excel or dBase) or, if the ingest format was STATA, via the
Stat/Transfer command processor. If the ingest format was undelimited text, the
data will have been read into SPSS using an SPSS command file.
Issues: There is very seldom any loss of data or internal metadata when
importing data files into SPSS. Any problems will have been listed above in the
Data and Documentation Problems section of this file.
STATA (*.dta files)
If STATA was not the ingest format, all STATA files will have been created from
SPSS .sav format via the Stat/Transfer command processor. Importantly,
Stat/Transfer's optimisation routine is run so that variables with SPSS write
formats narrower than the data (e.g. numeric variables with 10 decimal places of
data formatted to FX.2) are not rounded upon conversion to STATA, because they
are converted to doubles rather than floats. User missing values are copied
across into STATA (as opposed to being collapsed into a single system missing
code).
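Why storing such variables as doubles avoids rounding can be seen with a quick Python check. Python's built-in float is a 64-bit double, and the `struct` module's `'f'` format packs a value into the 4-byte IEEE 754 single-precision layout used by Stata's `float` storage type; round-tripping through it shows what a float would keep. This illustrates the precision argument only and is not how Stat/Transfer works internally.

```python
import struct


def as_float32(x: float) -> float:
    """Round-trip a value through a 4-byte single-precision float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


value = 1234.5678901234  # ten decimal places, held exactly enough as a double
print(as_float32(value))           # only about 7 significant digits survive
print(value == as_float32(value))  # False
```

A single-precision float carries roughly 7 significant digits, so with four digits before the decimal point only about three decimal places survive; a double keeps all ten.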
Issues: There are a number of data and metadata handling mismatches between SPSS
and STATA. Where any data or internal metadata has been lost or truncated, this
will have been automatically logged in this file:
4642_SPSS_to_STATA_conversion.rtf
Note that the complete internal metadata has been supplied in the UKDA Data
Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
Tab-delimited text (*.tab)
If tab-delimited text was not the ingest format, tab-delimited files will have
been created from SPSS portable files via the SPSS command processor, or from
Excel and MS Access files. When exporting from Access data tables to
tab-delimited text, the UKDA strips out the potentially problematic special
characters (tabs, carriage returns, line feeds, etc.) that Access memo and text
fields allow.
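The reason for stripping these characters is that an embedded tab would start a new column and an embedded line break would start a new row, misaligning every case that follows. A minimal Python sketch of the same idea, assuming fields are written one row at a time; the helper names are hypothetical, and vertical tab and form feed are added to the character set as an assumption since the list above ends in "etc.".

```python
import io
import re
from typing import Iterable, TextIO

# Characters that would break a tab-delimited layout (\v and \f are assumptions).
_PROBLEM_CHARS = re.compile(r"[\t\r\n\v\f]")


def clean_field(value: object) -> str:
    """Convert a field to text and strip characters that would split rows or columns."""
    return _PROBLEM_CHARS.sub("", "" if value is None else str(value))


def write_tab_row(out: TextIO, fields: Iterable[object]) -> None:
    """Write one cleaned, tab-separated row terminated by a newline."""
    out.write("\t".join(clean_field(f) for f in fields) + "\n")


buf = io.StringIO()
write_tab_row(buf, [17, "Line one\r\nLine two", "a\tb", None])
print(repr(buf.getvalue()))  # '17\tLine oneLine two\tab\t\n'
```

Stripping can run adjacent words together, as the example shows; substituting a space in `clean_field` would keep them apart if that matters for a given memo field.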
Issues: Date formats in SPSS are always exported as mm/dd/yyyy in tab-delimited
text format, so there may be a mismatch with the documentation for such
variables. Variables that include both date and time, such as dd-mmm-yyyy hh:mm:ss
(e.g. 18-JUN-2001 13:28:00), will lose the time information and become
mm/dd/yyyy. If the time information is critical, the UKDA will have created a
new variable in the tab-delimited data file. All users of the data in
tab-delimited format should consult the UKDA Data Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
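When reading exported dates back in, the mm/dd/yyyy order must be parsed explicitly, since a date such as 06/07/2001 would otherwise be easy to misread as 6 July. A minimal Python sketch using the standard `datetime.strptime`; the function name is hypothetical, and only the calendar date can be recovered because the time of day is dropped on export.

```python
from datetime import date, datetime
from typing import Optional


def parse_spss_tab_date(text: str) -> Optional[date]:
    """Parse a date exported from SPSS to tab-delimited text as mm/dd/yyyy.

    Returns None for blank cells.
    """
    text = text.strip()
    if not text:
        return None
    return datetime.strptime(text, "%m/%d/%Y").date()


# 18-JUN-2001 13:28:00 in SPSS is exported as 06/18/2001.
print(parse_spss_tab_date("06/18/2001"))  # 2001-06-18
```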
If the data were exported from MS Access, more limited 'data documenter'
information is supplied in the file(s): [data table name]_variableinformation.rtf
These files may also contain SQL setup information.
MS Excel (*.xls files)
If MS Excel was not the ingest format, Excel files will have been created via
the SPSS command processor. The date and time issues noted under tab-delimited
format also apply to SPSS to Excel conversion via the SPSS command processor.
SAS (supplied as *.dat and *.sas)
If SAS was not the ingest format, all SAS files will have been created from SPSS
.sav format via the Stat/Transfer command processor. The data files are provided
as a fixed-width text file (*.dat) and a SAS command file (*.sas), which when
run will create a SAS dataset. This enables the user to recreate the SAS dataset
and formats library in almost all versions of SAS and on all operating systems.
Issues: The main loss of information when converting from SPSS to SAS is
user-missing value definitions. By editing the .sas file, the user can choose
whether to collapse all user-missing values into system missing or preserve
the values and lose the user-missing definitions. To achieve the latter, the
following section of the .sas file should be removed before running it:
/* User Missing Value Specifications */
Note that the complete internal metadata has been supplied in the UKDA Data
Dictionary file(s): [data file name]_UKDA_Data_Dictionary.rtf
MS Access (*.mdb files)
Due to the substantial incompatibilities between versions of MS Access, the UKDA
only makes data available in MS Access format if this is the ingest format and
the database contains important information in addition to the data tables
(coding information, forms, queries, etc.).
Conversion of documentation formats
Electronic and paper documentation supplied with this study is usually
incorporated into the UKDA User Guide (in PDF format). The conversion programs
used are the latest versions of Adobe PDF Writer for electronic documentation
and Adobe Paper Capture (Acrobat 'plugin' version) for paper documentation.
Occasionally, some of the electronic documentation cannot be usefully converted
to PDF (e.g. MS Excel files with wide worksheets) and this is supplied in a more
appropriate format. All User Guides are fully bookmarked.