1. Computerized Self-Administered Questionnaires, U.S. Bureau of the Census
2. Electronic Forms, U.K. Tariff & Statistical Office
3. E-mail surveyer, ISTAT, Italy
4. Touchtone Data Entry and World Wide Web, U.S. Bureau of Labor Statistics
5. USDA-NASS Data Warehousing, U.S. Department of Agriculture
6. Common Collection and Processing System, U.S. Energy Information Administration
1:30 - 6:00 p.m.
SESSION II: Editing, Imputation, and Analysis
7. SLICE: A Framework for Editing and Imputation, Central Bureau of Statistics, Netherlands
8. AGGIES: Edit & Imputation, U.S. Department of Agriculture
9. SOLAS for missing data analysis, Statistical Solutions, United States / Ireland
10. QCDAS (Quality Control & Data Analysis System), Statistics Canada
11. Dead Graphs, RIP - New Interactive EDA Graphic Techniques, U.S. Bureau of the Census
12. Graphical Editing Analysis Query System, U.S. Energy Information Administration
13. INSIGHT: A Visualization Framework for Survey Data, BT Research Labs, U.K.
SESSION III: Sampling and Estimation
14. Random Digit Dialing, Marketing Systems Group, United States
15. The Sample Planning Tool, Research Triangle Institute, United States
16. GSAM (Generalized Sampling System), Statistics Canada
17. GES (Generalized Estimation System), Statistics Canada
18. BASCULA, Central Bureau of Statistics, The Netherlands
19. Ag@ccess: A Geographical Estimation System, ABARE, Australia
20. WesVar, Westat, United States
1:30 - 6:00 p.m.
SESSION IV: Integrated Systems and Utility Systems
21. StEPS (Standard Economic Processing System), U.S. Bureau of the Census
22. Sprocet (Survey Processing System), Statistics New Zealand
23. NASS's Record Linkage System, U.S. Department of Agriculture
24. ACS (Automated Cell Suppression), Sande & Associates Inc., United States
25. X-12-ARIMA: Time Series Modeling, U.S. Bureau of the Census
26. TRAMO - SEATS: Time Series Modeling, Gomez-Maravall, Spain
27. SEASABS: Seasonal Analysis, Australian Bureau of Statistics
For the 2002 Economic Census, our goal is to offer electronic reporting
via the Internet or CSAQ to all respondents. We will achieve this goal
by designing all of the collection instruments from one source using the
Generalized Instrument Design System (GIDS). The Economic Directorate
has contracted to build GIDS. GIDS is a development tool used for the creation,
administration, and maintenance of surveys and other types of form-based
data collection activities. GIDS will allow analysts to create
sophisticated electronic and paper surveys via a user-friendly
graphical user interface.
The form has been presented and demonstrated, and well received, at several domestic and European meetings, most recently ETK '99 (October, Prague). The service of which it is part demonstrates, in practice, techniques and benefits in the following areas:
The Current Employment Statistics (CES) survey, conducted by the US Bureau of Labor Statistics, is a monthly survey of about 380,000 business establishments. CES collects, analyzes, and publishes data on employment, hours, and earnings at the national, state, and area levels. CES data, widely viewed as a major economic indicator, are published monthly after only two and a half weeks of collection.
Traditionally, CES data were collected by mail. However, in the mid-1980s, we began offering automated collection methods such as Touchtone Data Entry (TDE), and more recently, World Wide Web (WWW). These methods now constitute the bulk of CES data collection.
Touchtone Data Entry is applicable in many types of surveys, and is useful for any information collection effort requesting numeric and yes/no answers. The CES implementation of TDE allows respondents to dial a toll-free 800 number and report data using the numbers on their telephone keypad. TDE uses broadcast FAX technology to send advance notice and nonresponse prompt messages to respondents.
World Wide Web data collection offers significant potential for collecting high-quality data at low cost for all types of surveys. The CES application offers data entry and basic on-line integrity edits. Data security is maintained through the use of a digital ID and the Secure Sockets Layer (SSL) protocol. Web data collection uses e-mail for advance notices and nonresponse prompts.
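The "basic on-line integrity edits" mentioned above might look something like the following minimal sketch. The field names (`all_employees`, `production_workers`, `weekly_hours`) and the specific checks are assumptions made for illustration; the abstract does not describe the actual CES edit rules.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """One monthly establishment report (hypothetical field names)."""
    all_employees: int
    production_workers: int
    weekly_hours: float  # average weekly hours per production worker


def integrity_edits(r: Report) -> list[str]:
    """Return edit failure messages; an empty list means the report passes."""
    failures = []
    if r.all_employees < 0 or r.production_workers < 0:
        failures.append("counts must be non-negative")
    if r.production_workers > r.all_employees:
        failures.append("production workers exceed all employees")
    # A week has 168 hours, so this upper bound is a hard logical limit.
    if not 0 <= r.weekly_hours <= 168:
        failures.append("weekly hours outside 0-168")
    return failures


if __name__ == "__main__":
    print(integrity_edits(Report(120, 95, 38.5)))
    print(integrity_edits(Report(50, 60, 40.0)))
```

Checks of this kind can run as the respondent enters data, so that obvious inconsistencies are caught before the report is submitted.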
Both TDE and WWW have led to product and customer service improvements,
such as more accurate microdata, more timely responses, simplified reporting,
and improved customer access to our survey products.
NASS believes we can reduce respondent burden, improve the quality of data collected, and maintain high response rates by maximizing the use of information that we already know from previous survey responses. Therefore, NASS has developed an integrated, easy-to-use, high-performance Data Warehouse System. NASS utilizes Redbrick ODBC as our data store software. This system already contains over one-half billion records of survey and census data from farm operators and is providing improvements to our survey process. NASS queries the Redbrick ODBC using Brio Technologies Explorer software. Improvements in data quality and survey management are expected as the Data Warehouse is fully integrated with NASS's sampling, survey management, data collection, data analysis, and estimation systems.
Three lessons learned from our implementation are: 1) select the dimensional or star schema design for the Data Warehouse, 2) choose a database that is optimized for very fast data loading and data access using the dimensional design, and 3) select a primary data access interface tool that lets users point and click or drag and drop.
Our live demonstrations will illustrate these three critical success
factors: 1) Our easy to understand dimensional data warehouse design, 2)
the rapid ad-hoc query response times against our data warehouse, and 3)
the ease of using the access system.
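To make the dimensional (star schema) design concrete, here is a minimal sketch using Python's built-in `sqlite3` module rather than the warehouse software named above. The table names, column names, and data are invented for illustration and are not taken from the NASS system.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE dim_state  (state_id INTEGER PRIMARY KEY, state_name TEXT);
        CREATE TABLE dim_survey (survey_id INTEGER PRIMARY KEY,
                                 survey_name TEXT, year INTEGER);
        -- The fact table holds only dimension keys plus the measure.
        CREATE TABLE fact_response (
            state_id  INTEGER REFERENCES dim_state(state_id),
            survey_id INTEGER REFERENCES dim_survey(survey_id),
            acres     REAL
        );
    """)
    con.executemany("INSERT INTO dim_state VALUES (?, ?)",
                    [(1, "State A"), (2, "State B")])
    con.executemany("INSERT INTO dim_survey VALUES (?, ?, ?)",
                    [(10, "Survey X", 1998), (11, "Survey X", 1999)])
    con.executemany("INSERT INTO fact_response VALUES (?, ?, ?)",
                    [(1, 10, 200.0), (1, 11, 250.0), (1, 11, 50.0),
                     (2, 10, 400.0), (2, 11, 300.0)])
    return con


def acres_by_state_and_year(con: sqlite3.Connection) -> list:
    """Ad hoc query: join the fact table to its dimensions and aggregate."""
    return con.execute("""
        SELECT s.state_name, v.year, SUM(f.acres)
        FROM fact_response f
        JOIN dim_state  s ON s.state_id  = f.state_id
        JOIN dim_survey v ON v.survey_id = f.survey_id
        GROUP BY s.state_name, v.year
        ORDER BY s.state_name, v.year
    """).fetchall()


if __name__ == "__main__":
    for row in acres_by_state_and_year(build_demo_warehouse()):
        print(row)
```

Every query follows the same pattern of joining the central fact table to descriptive dimension tables, which is what makes the design easy for point-and-click tools to present.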
In 1996, the first Joint Application Development (JAD) session was held, which brought together users and system developers in a series of structured workshops to determine how best to centralize, standardize, and streamline EIA data collection and processing. Many meetings and much coding later, the Common Collection and Processing System (CCAPS) was developed and released in late 1998.
The Common Collection and Processing System (CCAPS) is a repository of electronic working data maintained by the Energy Information Administration. Using CCAPS, EIA personnel collect data from EIA survey respondents and energy industry resources, integrate and analyze this data, and disseminate the resulting information in electronic and printed formats.
CCAPS was developed using Visual Basic and a SQL Server database engine. Currently 10 surveys are in production, and another 20 surveys will be incorporated within the next year.
A Master Universe Database (MUD) was also developed to track all potential EIA respondents and their affiliates. This system provides users with one place to analyze all EIA contacts and works in conjunction with CCAPS.
Data for statistical agencies are becoming more and more digitized.
Also, administrative data files are increasingly being used, either directly
for tabulation or for matching and mass imputation. In addition, budget
constraints call for more efficient data processing methods. The SLICE project
was started to automate part of this data processing.
SLICE is a general framework for modules (based on COM) that process data
for statistical purposes. Its main activities are editing / checking and
imputation. Other activities, such as weighting and variance estimation
are done in Bascula (see the paper of N.J. Nieuwenbroek et al.); in the
future, though, Bascula will also become one or more modules in SLICE.
The SLICE modules can be connected to define the route of processing. There
is also a graphical interface, where the user can define these connections
and configure the modules.
Currently, a few modules have been implemented and are being tested:
an editing prototype based on an extended version of the Fellegi-Holt paradigm
to automatically detect errors record-wise, a module to aggregate data
in order to detect/edit outliers graphically (see also the paper on MacroView
by T. de Waal et al.), and a module based on the Kosinski algorithm to automatically
detect outliers. Data input/output can be any OLEDB source or a Blaise
database.
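The Fellegi-Holt paradigm mentioned above says, roughly, that a record failing the edits should be made consistent by changing as few fields as possible. The following is a brute-force sketch of that idea for small categorical records. It is not the extended algorithm used in the SLICE prototype, and the fields, domains, and edit rules are invented for illustration.

```python
from itertools import combinations, product


def localize_errors(record, domains, edits):
    """Return a smallest set of fields that can be changed so all edits pass.

    record:  dict mapping field name to its reported value
    domains: dict mapping field name to its list of allowed values
    edits:   list of functions taking a record and returning True if it passes
    Returns None when no combination of allowed values satisfies the edits.
    """
    fields = list(record)
    for k in range(len(fields) + 1):            # try the smallest sets first
        for subset in combinations(fields, k):
            for values in product(*(domains[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if all(edit(candidate) for edit in edits):
                    return set(subset)
    return None


# Hypothetical domains and edit rules.
DOMAINS = {
    "age_group": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["yes", "no"],
}
EDITS = [
    lambda r: not (r["age_group"] == "child" and r["marital"] == "married"),
    lambda r: not (r["age_group"] == "child" and r["employed"] == "yes"),
]

if __name__ == "__main__":
    bad = {"age_group": "child", "marital": "married", "employed": "yes"}
    print(localize_errors(bad, DOMAINS, EDITS))
```

In the example record both edits fail, yet changing the single field `age_group` resolves both, so that field is flagged rather than the two fields that would otherwise need changing. Exhaustive search like this is only feasible for a handful of fields, and when several minimal sets exist it returns the first one found.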
In the future the SLICE program will be embedded in the Blaise suite.
SOLAS 2.0 for Missing Data Analysis is a Windows-based software tool for data imputation and missing data exploratory analysis that provides a choice of both Multiple Imputation and Single Imputation methods.
The Single Imputation methods available in SOLAS include Hot Decking, Regression Imputation, Group Means, and Last Value Carried Forward.
Multiple Imputation was originally proposed by Rubin in the 1970s as a possible solution to the problem of survey nonresponse, to address the failings of standard analyses of incomplete datasets. The idea behind Multiple Imputation is that, for each missing value in a dataset, we impute several values (M) instead of just one, to represent the uncertainty about which value to impute.
In SOLAS, users have two Multiple Imputation approaches to choose from, namely: a predictive model-based approach, where the predictive information contained in a user-specified set of covariates is used to predict the missing values, or a propensity score-based approach, in which cases are grouped according to their probability of being missing (i.e. propensity score) and then an approximate Bayesian bootstrap is applied to sample observed values to impute the missing values.
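As a rough illustration of the propensity score-based approach, the sketch below applies an approximate Bayesian bootstrap within a single propensity class and then combines the M completed-data estimates with Rubin's combining rules. It is not SOLAS code; the function names, the choice of the mean as the estimand, and the default of M = 5 are assumptions made for this example.

```python
import random
from statistics import mean, variance


def abb_impute(observed, n_missing, rng):
    """One approximate Bayesian bootstrap draw within one propensity class."""
    # Step 1: resample the observed values with replacement to form a donor pool.
    donors = [rng.choice(observed) for _ in observed]
    # Step 2: draw the imputations with replacement from that donor pool.
    return [rng.choice(donors) for _ in range(n_missing)]


def rubin_combine(estimates, variances):
    """Combine M completed-data estimates and their variances (M >= 2)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor M - 1)
    return q_bar, within + (1 + 1 / m) * between


def mi_mean(observed, n_missing, m=5, seed=0):
    """Multiply impute the missing values and estimate the mean."""
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = list(observed) + abb_impute(observed, n_missing, rng)
        estimates.append(mean(completed))
        variances.append(variance(completed) / len(completed))
    return rubin_combine(estimates, variances)


if __name__ == "__main__":
    print(mi_mean([4.0, 5.5, 6.1, 7.3, 5.0, 6.6], n_missing=3))
```

The between-imputation term is what a single imputation cannot supply: it carries the extra uncertainty that comes from not knowing the missing values.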
This demonstration will include several examples of how SOLAS can be
used to perform multiple imputation on survey-type datasets containing
both continuous and categorical data. SOLAS is currently licensed by many
survey organisations including the National Opinion Research Center (US),
AC Nielsen, Statistics Denmark, and Statistics Finland.
The QCDAS is a generalized micro-computer based data analysis system that was developed for the specific data analysis needs of the Quality Assurance Methods Section in Statistics Canada. The system’s generalized capabilities can easily be adapted and made suitable for managing and analyzing continually updated application data sets on an on-going basis. The system was developed using Microsoft Access 97 and fully utilizes its architecture and functionality along with its Visual Basic programming language.
This generalized system is suitable for any organization that manages and analyzes data sets on a continuous time series basis. The framework of the system provides tools that can be used to customize edit specifications for any type of input data, define file record layouts, design and develop algorithms and formulae used to tabulate, estimate and analyze data over time, as well as design and produce reports, graphs and charts as required. This system can be used for applications such as longitudinal data analysis, cyclical tabulations, research analysis and other analytical studies.
The system was designed with flexibility, simplicity and user friendliness
in mind. The system utilizes generic templates and interactive screens
to help develop the customized data analysis algorithms and specify the
desired outputs.
Four key factors contribute to this "revolution" in data analysis -- and make the introduction of these EDA methods at this time a momentous opportunity. First, these graphics software packages provide analysts with the ability to generate hundreds of graphs in a matter of mere minutes -- a feat that would have taken weeks or months just a few years ago. These graphs can yield a large number of different insights into the data. Second, these software packages often allow sophisticated graphical methods of looking at the data and reviewing subcomponents of it. If analysts believe the data are in error, they can easily correct them interactively using these point-and-click tools. Third, this software is available at very low cost -- a very comprehensive student version of the SAS JMP-In PC software package is available for less than $60 (with a 500+ page statistical data analysis manual). Fourth, and most important, individuals using the software are not locked into fixed ways of looking at the data. By using the above hardware and software tools, we (see particularly DesJardins 1998) have developed new graphical forms and special techniques that greatly enhance the speed and efficiency of data editing and analysis tasks. Analysts no longer need to waste their time (and valuable subject matter expertise) trying to edit their data with fixed methods and cumbersome, boring, tabular printouts.
Further, graphs also have the extraordinary ability to communicate across a wide range of expertise. They can thus make some sophisticated statistical concepts clear to laymen. Because of this, statisticians can now more quickly and effectively explain to subject matter specialists the fundamental concepts behind these new graphical data analysis techniques -- and can then focus the majority of their efforts on improving our methodology.
Accordingly, the U.S. Census Bureau is entering a whole new world of
data analysis capability. Again, this is made possible by new, very
fast hardware (i.e., Pentiums and Unix workstations) and powerful, interactive
point-and-click software that is easy to learn and use (JMP and INSIGHT from
SAS Institute). Formerly, analysts had to learn the intricacies of
programming or wait for systems development efforts to produce the custom software
that they needed for their data analysis tasks. Now, in conjunction
with a quick 40-hour EDA course taught by Mr. DesJardins, analysts are
being taught a variety of powerful EDA techniques using easy-to-learn
(basically point-and-click) software. The design of this courseware is
revolutionary in two other ways as well. First, it stresses multivariate
analysis -- that is, analysis of all of the variables on the survey form -- allowing
comparisons between variables in these data sets that have never before been
compared, with the aim of gaining a real understanding of these data. Second, it
is designed for subject matter specialists who have only a moderate statistical
background -- to give them these key insights into their data.
The Sample Planning Tool uses a non-linear optimization algorithm for
computing the sample size and allocation. This algorithm computes
an allocation that minimizes the specified cost model while simultaneously
meeting or exceeding the required precision constraints. This is
accomplished by providing a point-and-click interface to assist the user
with the following steps:
Examples will include the sample allocation developed for the 1994/1995
Status of the Armed Forces Surveys and an optimal sample allocation for
commercial establishments.
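For the simplest case of a stratified sample with one precision constraint on an estimated mean, the cost-minimizing allocation has a well-known closed form (the classical optimum allocation). The sketch below uses that closed form in place of the non-linear optimization algorithm the tool itself applies, which handles more general problems. The stratum sizes, standard deviations, and unit costs are invented, and the sketch neither rounds the sample sizes nor caps them at the stratum sizes.

```python
from math import sqrt


def min_cost_allocation(N_h, S_h, c_h, target_var):
    """Stratum sample sizes minimizing sum(c_h * n_h) for a fixed variance.

    N_h: stratum population sizes
    S_h: stratum standard deviations
    c_h: per-unit data collection costs
    target_var: required variance of the stratified mean
    """
    N = sum(N_h)
    W = [Nh / N for Nh in N_h]                      # stratum weights
    a = sum(w * s * sqrt(c) for w, s, c in zip(W, S_h, c_h))
    denom = target_var + sum(w * s * s for w, s in zip(W, S_h)) / N
    # n_h is proportional to W_h * S_h / sqrt(c_h).
    return [(w * s / sqrt(c)) * a / denom for w, s, c in zip(W, S_h, c_h)]


def stratified_variance(N_h, S_h, n_h):
    """Variance of the stratified mean, with finite population correction."""
    N = sum(N_h)
    return sum((Nh / N) ** 2 * s * s * (1 / n - 1 / Nh)
               for Nh, s, n in zip(N_h, S_h, n_h))


if __name__ == "__main__":
    sizes, sds, costs = [5000, 3000, 2000], [10.0, 20.0, 40.0], [1.0, 4.0, 9.0]
    n_h = min_cost_allocation(sizes, sds, costs, target_var=1.0)
    print([round(n, 1) for n in n_h])
    print(stratified_variance(sizes, sds, n_h))
```

Strata that are large, variable, or cheap to survey receive more of the sample. With several precision constraints at once no closed form is available, which is presumably why a general optimizer is needed.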
The data used to generate the tables and maps come from ABARE's annual farm survey of approximately 1600 broadacre farms across Australia. Estimates of local averages are calculated using a kernel smoothing function, adapted to account for survey weights. Areas where there is insufficient sample coverage are masked to ensure that the confidentiality of individual farm data is not breached.
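A kernel-smoothed local average that accounts for survey weights can be sketched as a weighted Nadaraya-Watson estimate. The Gaussian kernel, the masking rule, and the `min_farms` threshold below are assumptions made for illustration; the abstract does not say which kernel or coverage rule Ag@ccess uses.

```python
from math import exp, hypot


def smoothed_estimate(x, y, farms, bandwidth, min_farms=5):
    """Survey-weighted kernel estimate of the local average at point (x, y).

    farms: iterable of (farm_x, farm_y, survey_weight, value) tuples.
    Returns None (masked) when fewer than min_farms sample farms lie
    within one bandwidth of the point.
    """
    num = den = 0.0
    nearby = 0
    for fx, fy, weight, value in farms:
        d = hypot(fx - x, fy - y)
        if d <= bandwidth:
            nearby += 1
        k = exp(-0.5 * (d / bandwidth) ** 2)   # Gaussian kernel
        num += weight * k * value
        den += weight * k
    if nearby < min_farms:
        return None                            # insufficient coverage: mask
    return num / den


if __name__ == "__main__":
    sample = [(0.0, 0.0, 10.0, 120.0), (1.0, 0.5, 12.0, 150.0),
              (0.5, 1.0, 8.0, 90.0), (2.0, 2.0, 15.0, 200.0)]
    print(smoothed_estimate(0.5, 0.5, sample, bandwidth=1.5, min_farms=3))
    print(smoothed_estimate(9.0, 9.0, sample, bandwidth=1.5, min_farms=3))
```

Evaluating a function like this over a regular grid of points would give the kind of smoothed gridded surface that can be mapped.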
Also to be demonstrated is the in-house software "Smooth Operator".
This application processes farm data into Ag@ccess data files or text files
consisting of smoothed gridded geographic information.
Estimates and standard error estimates can be calculated for totals, means, and percentages in multi-way tables. Standard error estimates can also be easily computed for complex functions of estimates, including ratios, differences of ratios, and log-odds ratios. WesVar calculates standard errors, coefficients of variation, and confidence intervals for the survey estimates you specify and calculates chi-square tests of independence for two-way tables of weighted frequencies. WesVar also computes estimated coefficients and their standard errors for linear and logistic regression models and tests the significance of subsets of linear combinations of parameters.
WesVar includes five replication options for estimating sampling errors: balanced repeated replication (BRR), three jackknife methods (JK1, JK2, and JKn), and Fay's BRR method (FAY).
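To show what a replication method does, here is a minimal sketch of the delete-one jackknife (JK1) applied to a weighted mean. It treats each sample unit as its own primary sampling unit, which is an assumption of this example, and it is an illustration of the technique rather than WesVar's implementation.

```python
def weighted_mean(values, weights):
    """Weighted mean of the values."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def jk1_variance(values, weights):
    """JK1 (delete-one) jackknife variance estimate of a weighted mean."""
    n = len(values)
    full = weighted_mean(values, weights)
    total = 0.0
    for k in range(n):
        # Replicate k: drop unit k and scale the other weights by n / (n - 1).
        rep_w = [0.0 if i == k else w * n / (n - 1)
                 for i, w in enumerate(weights)]
        total += (weighted_mean(values, rep_w) - full) ** 2
    return (n - 1) / n * total


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    w = [1.0] * 5
    print(jk1_variance(y, w))
```

With equal weights the result reduces to the familiar s²/n for a sample mean, which makes a convenient check. The appeal of replication is that the same loop works unchanged for ratios, regression coefficients, and other complex statistics: only the estimator function changes.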
WesVar operates under Windows 95, Windows 98, or Windows NT 4.0.
The StEPS software is designed to handle these post-collection activities: data editing, imputation, data review and correction, data query, estimation, variance estimation, disclosure, and analysis. Additionally, StEPS has links that support collection technologies for mailout, check-in, and data capture. All of these modules are wrapped in a graphical user interface that walks users through the available functionality.
The demonstration of the StEPS software will show users the many features
this system has to offer. In addition to the post-collection functionality
mentioned above, members of the StEPS team will demonstrate other interactive
modules to administer surveys, enter parameters that tailor generalized
code to a specific survey, and access tools including those available through
SAS -- such as SAS INSIGHT® and SAS ASSIST®.
Sprocet is a survey application template that can be copied and modified for each survey-specific processing system. The aim has been to retain the best of the survey application template features while adding, at a marginal cost, the required survey-specific features.
It is a blend of making those things standard that can be made so, while
allowing specific customisation and flexibility for specific survey circumstances.
This has several competitive advantages:
The back ends were developed to facilitate online review of matches,
possible matches, and/or nonmatches. They also allow users to update information
on the list frame with data from the record linkage system. The demonstration
will primarily focus on the back ends of the system. Front ends are currently
under development to aid in file preparation and development of matching
parameters. These front ends include default parameter sets for files
which are matched against the frame on a routine basis. The additional
functionality gained by the integration of the front and back ends with
the AutoMatch software may be helpful to other organizations working with
record linkage projects.
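One common way to arrive at the three outcomes reviewed in the back ends (matches, possible matches, and nonmatches) is probabilistic linkage in the style of Fellegi and Sunter, where field-by-field agreement weights are summed and compared with two cutoffs. The sketch below illustrates that idea only. The comparison fields, the m- and u-probabilities, and the cutoffs are hypothetical, and the abstract does not describe how the NASS system or AutoMatch is actually configured.

```python
from math import log2

# Hypothetical parameters: for each field, m = P(agree | true match) and
# u = P(agree | true nonmatch). In practice these are estimated from the files.
PARAMS = {"surname": (0.95, 0.01), "zip": (0.90, 0.05), "phone": (0.80, 0.001)}


def match_weight(rec_a, rec_b, params=PARAMS):
    """Sum of log2 agreement or disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            total += log2(m / u)              # agreement adds evidence
        else:
            total += log2((1 - m) / (1 - u))  # disagreement subtracts evidence
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision using two hypothetical cutoffs."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "nonmatch"
    return "possible match"    # left for clerical review


if __name__ == "__main__":
    frame = {"surname": "SMITH", "zip": "50010", "phone": "5551234"}
    incoming = {"surname": "SMITH", "zip": "50014", "phone": "5559876"}
    w = match_weight(frame, incoming)
    print(round(w, 2), classify(w))
```

Pairs falling between the two cutoffs are the ones an online review screen is most useful for.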
X-12-ARIMA is the Census Bureau's new time series modeling and seasonal
adjustment program. It provides four types of enhancements to X-11-ARIMA:
(1) Extensive robust time series modeling and model selection capabilities
for linear regression models with ARIMA errors; (2) Alternative seasonal,
trading day, and holiday effect adjustment options, including the estimation
of effects described by user-defined regressors; (3) New diagnostics of
the quality and stability of the adjustment achieved by any set of specified
options; (4) A new user interface with features to facilitate the analysis
and adjustment of large numbers of series. X-12-ARIMA has been adopted
for the production of official adjustments by statistical offices in the
U.S., Europe, and Asia. X-12-Graph is a companion graphics program that
is written in SAS but does not require its users to know SAS. It offers
many types of diagnostic graphs for time series modeling and seasonal adjustment.
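For orientation, a "linear regression model with ARIMA errors" (often called a regARIMA model) can be written in the following general form. The notation is the conventional one and the symbols are chosen here for illustration; they are not taken from the abstract.

```latex
y_t = \sum_{i} \beta_i x_{it} + z_t,
\qquad
\phi(B)\,\Phi(B^{s})\,(1 - B)^{d}\,(1 - B^{s})^{D}\, z_t
  = \theta(B)\,\Theta(B^{s})\, a_t
```

Here $y_t$ is the (possibly transformed) series, the $x_{it}$ are regressors such as trading day, holiday, or user-defined effects, $B$ is the backshift operator, $s$ is the seasonal period, $\phi, \Phi, \theta, \Theta$ are the nonseasonal and seasonal autoregressive and moving-average polynomials, and $a_t$ is white noise.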
Two programs will be demonstrated: Tramo ("Time Series Regression with
ARIMA Noise, Missing Observations and Outliers") and Seats ("Signal Extraction
in ARIMA Time Series").
Tramo is a program for estimation and forecasting of regression models
with possibly nonstationary (ARIMA) errors and any sequence of missing
values. The program interpolates these values, identifies and corrects
for several types of outliers, and estimates special effects such as Trading
Day and Easter and, in general, intervention variable effects. Fully automatic
model identification and outlier correction procedures are available.
Seats is a program for estimation of unobserved components in time
series following the so-called ARIMA-model-based method. The trend, seasonal,
irregular, and cyclical components are estimated and forecast with signal
extraction techniques applied to ARIMA models. The standard errors
of the estimates and forecasts are obtained and the model-based structure
is exploited to answer questions of interest in short-term analysis of
the data.
The two programs are structured so as to be used together, either for
in-depth analysis of a few series (as presently done at the Bank of Spain)
or for automatic routine applications to a large number of series (as presently
done at Eurostat). When used for seasonal adjustment, Tramo preadjusts
the series to be adjusted by Seats.
SEASABS creates seasonal adjustment factors for the series and gives the user an indication of the suitability of these factors. It identifies and corrects trend and seasonal breaks as well as extreme values, inserts trading day factors if necessary, chooses appropriate moving averages for the computation of trends and seasonal factors, and allows for moving holiday corrections. The history of changes to the parameters and prior factors can be viewed.
Graphs of the original, seasonally adjusted, trend, seasonal/irregular, and trading day/irregular series and of X11 outputs are available, as are facilities such as decomposition, sensitivity analysis, and the effects of variable Henderson filters.
SEASABS provides not only the ability to adjust time series but also tools to analyse them.
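As background on the Henderson filters mentioned above, the symmetric Henderson weights have a standard closed form, and applying them is a simple centred moving average. The sketch below is for illustration and does not show how SEASABS itself computes or applies the filters. In particular it leaves the ends of the series undefined instead of using asymmetric end weights.

```python
def henderson_weights(terms):
    """Symmetric Henderson moving-average weights for an odd number of terms."""
    if terms < 5 or terms % 2 == 0:
        raise ValueError("this sketch expects an odd number of terms >= 5")
    m = (terms - 1) // 2
    n = m + 2
    denom = (8 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    return [
        315 * ((n - 1)**2 - j**2) * (n**2 - j**2) * ((n + 1)**2 - j**2)
        * (3 * n**2 - 16 - 11 * j**2) / denom
        for j in range(-m, m + 1)
    ]


def smooth(series, weights):
    """Apply a centred filter; the first and last m points are left as None."""
    m = len(weights) // 2
    out = [None] * len(series)
    for t in range(m, len(series) - m):
        out[t] = sum(w * series[t - m + j] for j, w in enumerate(weights))
    return out


if __name__ == "__main__":
    print([round(w, 5) for w in henderson_weights(5)])
    print(smooth([float(t) for t in range(10)], henderson_weights(5)))
```

Longer filters smooth more heavily, so varying the number of terms is one way to explore how sensitive the estimated trend is to the filter choice.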