ICES - II
International Conference on Establishment Surveys - II
Survey Methods for Businesses, Farms, and Institutions
June 17-21, 2000
The Adam's Mark Hotel, Buffalo, New York
Last modified May 26, 2000
Invited Paper Sessions
ABSTRACTS
CONTENT
0. KEYNOTE ADDRESS--ESTABLISHMENT SURVEYS SINCE ICES1: WHAT HAS AND HASN'T CHANGED, AND WHAT ARE THE ISSUES OF THE FUTURE?
1. IMPROVING THE QUALITY OF DATA REPORTING IN BUSINESS SURVEYS
2. STATISTICAL GRAPHICS TRACT
3. SURVEYS OF COMMODITY FLOWS: PROBLEMS AND IMPROVEMENTS
4. ESTIMATION STRATEGIES USING VARIANTS OF POISSON SAMPLING
7. DESIGNING COMMODITY SURVEYS
8. SURVEYS IN ESTABLISHMENT DATA COLLECTION
9. GENERALIZED ESTIMATION SYSTEMS IN GOVERNMENT AGENCIES
13. STRATEGIC DIRECTIONS FOR BUSINESS SURVEYS: AN INTERNATIONAL PERSPECTIVE
14. TREND ESTIMATION
15. IMPROVING RESPONSE RATES OF BUSINESS SURVEYS USING INTEGRATED COMMUNICATION STRATEGIES
19. WORKPLACE AND EMPLOYMENT SURVEYS
20. PRIORITY CONTACT OF NON-RESPONDENTS
21. CURRENT TOPICS IN SEASONAL ADJUSTMENT
25. BUSINESS REGISTERS: MAINTENANCE AND USES
26. THE CONTINUOUS IMPROVEMENT APPROACH TO EDITING: HOW TO IMPROVE DATA QUALITY BY EDITING
27. HUMAN SIDE OF DATA DISSEMINATION
28. COMBINING SURVEY AND ADMINISTRATIVE DATA
31. COORDINATING SAMPLING BETWEEN AND WITHIN SURVEYS
32. GENERALIZED INTEGRATED PROCESSING SYSTEMS
33. OUTLIERS
34. TECHNOLOGY DRIVEN CHANGES IN DATA COLLECTION AND DISSEMINATION OF ESTABLISHMENT DATA
37. PRACTICAL EDITING AND IMPUTATION STRATEGIES FOR STATISTICAL SURVEYS
38. QUALITY IN BUSINESS SURVEYS
39. SURVEY ISSUES IN A CHANGING AGRICULTURE INDUSTRY
43. INTEGRATING AGRICULTURE AND FOOD STATISTICS: NATIONAL AND INTERNATIONAL PERSPECTIVES
44. NEW DEVELOPMENTS IN IMPUTATION OF BUSINESS SURVEY DATA
45. PRINCIPLES AND PRACTICES IN THE MEASUREMENT OF THE UNRECORDED ECONOMY
49. STATISTICAL DISCLOSURE CONTROL FOR ESTABLISHMENT DATA
50. NEW METHODS OF DATA COLLECTION FOR ESTABLISHMENT SURVEYS
51. COVERAGE IN SCHOOL SAMPLING FRAMES
55. LINKING LONGITUDINAL BUSINESS AND HOUSEHOLD DATA
56. PLANS OF GOVERNMENT AGENCIES FOR RESEARCH IN ESTABLISHMENT SURVEYS (PANEL SESSION)
57. A METADATA PRIMER FOR SURVEY STATISTICIANS
0. "ESTABLISHMENT SURVEYS SINCE ICES1: WHAT HAS AND HASN'T CHANGED, AND WHAT ARE THE ISSUES FOR THE FUTURE?"
Keynote Address
Presenter: Susan Linacre, Australian Bureau of Statistics
In a rapidly changing world, the issues faced in surveys of establishments continue to evolve, though many do so at a slower pace than might be predicted, and some remain remarkably constant. This paper explores some of the main issues discussed at ICES1, reviews the extent of progress and change in these issues, and seeks to identify those that remain with us as challenges for the future. The paper will also attempt to look into the future at what might be the new issues for establishment surveys, and the organizations conducting them, over the coming years, their possible implications, and the steps we might be taking in readiness for them.
1. IMPROVING THE QUALITY OF DATA REPORTING IN BUSINESS SURVEYS
Invited Paper Session
Organizers: SEYMOUR SUDMAN, University of Illinois; DIANE WILLIMACK, U.S. Bureau of the Census
Chair: LYNDA CARLSON, National Science Foundation
The Use of Cognitive Methods to Improve British Business Surveys. JACK ELDRIDGE, Office for National Statistics, United Kingdom
Cognitive research methods have proved very useful in improving the quality of household survey data. More recently it has been realised that there are serious problems with the quality of business survey data and some recent studies have started using cognitive methods to assess and improve the quality of these data.
In this paper the application of cognitive methods to business surveys will be discussed. Examples will be drawn from cognitive interviewing that was done on the Short Term Employment Surveys that collect workforce data from different sectors of the economy. This will highlight some of the problems that businesses have in completing the surveys, for example:
- Systems not set up to easily provide the data requested
- Multiple respondents
- Problems caused when survey forms change
- Failure to read guidance notes
- Attitudes towards completing the survey forms
The paper will also draw out some generalisations to be made about using cognitive methods in business surveys, in particular:
- Use of in-depth cognitive interviews in business surveys
- Use of expert reviews in business surveys
- Differences in methods when applied to business surveys
Testing of the Questionnaires for Statistics Canada's Unified Enterprise Survey. COLIN BABYAK, ALLEN GOWER, LIA GENDRON, JANE MULVIHILL and RAE-ANNE ZAROSKI, Statistics Canada
This paper will review the methodology and challenges of testing the questionnaires for Statistics Canada's Unified Enterprise Survey (UES). Testing of the questionnaires with respondents in the various UES pilot and transition industries has taken place each year from 1997 to 1999. The paper will describe the approach that was taken to accomplish the major activity of testing multiple industry-specific questionnaires in a relatively short time frame. In particular, the paper will describe the testing methodology and will show how cognitive interviews and focus groups were used during the testing. The paper will discuss the challenges that were encountered and how these challenges were met. The challenges included the coordination and scheduling of the testing, the development of priorities and issues for testing, the recruitment of respondents, the determination of the appropriate testing methods and the number of interviews and focus groups, and being able to analyze and report on the test results in order to make the recommended revisions to the questionnaires within the time constraints. As a result of the experience obtained through the testing of the UES questionnaires, many important lessons have been learned and advances have been made in developing sound practices and procedures for testing business survey questionnaires.
Exploratory Research at the U.S. Census Bureau on the Survey Response Process in Large Companies. SEYMOUR SUDMAN, University of Illinois; DIANE WILLIMACK, ELIZABETH NICHOLS and THOMAS MESENBOURG, U.S. Bureau of the Census
This paper reports recent efforts of the U.S. Census Bureau to improve the quality of establishment surveys and reduce respondent burden through research utilizing selected aspects of cognitive research methods. This initial exploratory research involved conducting unstructured interviews with multiple informants in a sample of thirty large multi-unit companies to identify statistical reporting issues and problems from the companies' perspectives. Our findings are organized by a hybrid model of the survey response process in establishments, based on Tourangeau (1984) and Edwards and Cantor (1991). Implications of our findings for data users and data collectors are addressed, and an agenda for future research is suggested.
Discussant: SYLVIA KAY FISHER, Bureau of Labor Statistics
Floor Discussion
2. STATISTICAL GRAPHICS TRACT
Invited Paper Session
Organizer: DAVID DESJARDINS, U.S. Bureau of the Census
Chair: DAVID DESJARDINS, U.S. Bureau of the Census
Exploring the Potential of EDA: A Study of the Canadian Food Services Industry. DAPHNE BENNETT, LARRY MURPHY and JIM TEBRAKE, Statistics Canada
Options for producing analytical outputs have never been greater: sorted reports vs. multi-dimensional databases, an 8X11 sheet of paper vs. a 21-inch colour monitor, a ledger vs. a spreadsheet. Advances have changed the manner and speed with which data can be processed and analyzed. Analysts now are in an ideal environment in which exploratory data analysis (EDA) can actually be undertaken. Old approaches, constrained by processing limitations, confined analysts to inflexible "standardized" reports. Today analysts can confront data in a spontaneous fashion. They can choose to visualize their data, to conduct scenario analysis, and to create and evaluate different models in mere seconds. So, analysts out there, what are you waiting for? Fire up your computers, load the latest software and get started! Well, not so fast! While technology empowers analysts to conduct EDA, there is a larger framework which must be considered. Indeed, certain factors must exist outside of technology before EDA can be undertaken. This paper identifies four prerequisites necessary for successful exploratory data analysis:
Subject-Matter Knowledge - Without an intimate knowledge of the subject at hand, exploratory data analysis will be futile.
Conceptual Framework - While an analyst may understand a particular industry or commodity, if the data are not built on a conceptually sound basis and organized in a coherent, consistent manner, any hope of valid results is minimal.
Data Assembly - Armed with subject-matter knowledge and a conceptual framework, the analyst must begin to assemble the data to be analyzed. This will involve combining data from different sources and transforming data when needed.
EDA Techniques - This is the final stage of exploratory data analysis. Once the user has the above three ingredients, they can begin to apply standard EDA techniques to explore and uncover residual data errors not detected by standard traditional editing techniques, and to find hidden meanings and interrelationships within the data.
This study will illustrate the above principles using results from the Canadian Food Services Industry Survey, a segment of the overall Unified Enterprise Survey, which is a key component of the Project to Improve Provincial Economic Statistics (PIPES), a new initiative of the Business & Trade Field of Statistics Canada.
Experiences of Introducing Exploratory Data Analysis Methods at Statistics Sweden. CATARINA ELFFORS, GUNNAR ARVIDSON, LEOPOLD GRANQUIST and ANDERS NORBERG, Statistics Sweden
David DesJardins of the U.S. Bureau of the Census recently introduced Exploratory Data Analysis (EDA) techniques using SAS/Insight software to Statistics Sweden by teaching a nine-day intensive course in May of 1998. After the course, the participants formed a Graphics User Group network in order to maintain and improve the knowledge and skill they had acquired in applying EDA. The aim, experiences and results of this network are described regarding the following issues.
The main focus of the Graphics User Group is on exploring the usefulness of these methods by experimenting with the participants' various data sets. The ultimate goals are to implement EDA techniques in the processing of these surveys, and then to make the results more widely known within Statistics Sweden.
Another goal is to evaluate these EDA techniques for output editing, developing efficient edits, and controlling editing bounds. Using EDA techniques to evaluate editing processes is another issue. A final issue is whether EDA methods could replace current editing in certain types of surveys. In particular, EDA and SAS/Insight are being considered as a methodology to accomplish efficient editing of repetitive surveys of dynamic populations. Generally, editing programs now rely on programmed edits developed when the survey started or was redesigned. This also applies to graphical output editing systems. The possibilities of using data from other surveys and sources in editing are also being explored.
The Main Barrier to Implementing an EDA Approach to Data Editing. PAULA WEIR, U.S. Energy Information Administration
Given the advances in technology, particularly with respect to graphics, the main barrier to implementing EDA in editing data is the universal reluctance to change. To convince others that the techniques of EDA through graphics are a better method, one must demonstrate that the approach performs better in general, and that it will be better for the individuals who will actually use the technique. Common editing performance measures focus on the general sense of better by measuring cost and quality through measures such as hit and miss rates. Unfortunately, translation of these into the environment of EDA is difficult. The powerful graphical approach visually provides information that is difficult to capture in hit and miss rates. It does not lend itself to a clear statistic to compare to a threshold above which are edit failures and below which are clean data, as is necessary for traditional comparison of approaches through hit or miss rates. Furthermore, measures of better for the individuals who will actually use the technique are most likely non-existent for the current methodology. Measures on this aspect must be carefully designed and then calculated for both methods. This paper will explore this topic, bringing into the discussion actual experiences in implementing an EDA approach, and seek methods for reducing these barriers.
Roundtable Discussion: What institutional and methodological barriers have kept the incredible advantages of EDA techniques from finding wider use in a number of agencies? DAVID DESJARDINS, HOWARD HOGAN, JIM HUNT, BILL WINKLER, U.S. Bureau of the Census; PAULA WEIR, U.S. Department of Energy; DAPHNE BENNETT, JOHN MCVEY, LARRY MURPHY and JIM TEBRAKE, Statistics Canada; CATARINA ELFFORS, ANDERS NORBERG and LEOPOLD GRANQUIST, Statistics Sweden
The roundtable will begin with each panel member giving a brief introduction of him/herself and their EDA experiences/background -- followed by a brief (25 words or fewer) summary of their perspectives on the topic.
After these introductions, the forum will be opened to the audience where we will start by addressing one of the following kickoff questions. (The panel chair will moderate the discussions to allow for a follow-up of any prolonged discussion topics after the formal end of the session.)
Possible Kickoff Questions:
Status Quo: How do we get survey managers who are so invested in using listing after listing after listing of numbers to actually buy into (trust) the graphical/EDA approach?
New Technology: How have online programs like Microsoft's Netmeeting radically changed the way we teach/practice our desktop interactive graphical data analysis methodology?
Training Requirements: What level of training should be developed so that this EDA tool can be understood and used correctly?
"Dead" Graphs: There will be a follow-up hands-on interactive CPU workshop after this session. How do "live" graphs represent a revolution (as significant as gunpowder to warfare) to traditional statistical data analysis methodology?
Accounting: What were the costs of implementation? How long did it take?
Floor Discussion
3. SURVEYS OF COMMODITY FLOWS: PROBLEMS AND IMPROVEMENTS
Invited Paper Session
Organizer: PATRICK CANTWELL, U.S. Bureau of the Census
Chair: JOHNATHAN ELLISON, Statistics Canada
In this session, we present some of the most serious problems and improvements made in surveys of commodity flows as conducted in Canada, Sweden, and the United States. The authors represent survey managers, methodologists, and data users. Each paper begins with a summary of the purpose and design of the survey under discussion.
Electronic Data Collection for the Canadian For-Hire Trucking Origin and Destination Survey. FRANÇOIS GAGNON, STEVE RATHWELL and YVES GAUTHIER, Statistics Canada
Key words: electronic data reporting, autocoding.
The quarterly Canadian For-Hire Trucking Origin/Destination Survey (TOD) provides estimates of inter-city commodity transportation for large (annual revenues above $1 million) long distance carriers. This survey employs a two-stage design, where a sample of carriers is selected at the first stage, and a sample of probills (shipping documents) is selected from each sampled carrier via personal, on-site visits. Recently, we have been exploring the potential for electronic data reporting to reduce collection costs and response burden for this survey. Many carriers store their probill data electronically and could send it to us in this format; a small number do so already. The cost of cleaning up and coding each shipment record means that second-stage sampling would still be necessary. In fact, significant savings will only be obtained if ways can be found to reduce the cost of processing the electronic data. We examine the issues surrounding electronic data reporting for this survey, including data availability, data transmission methods, whether second-stage sampling should be done before or after transmission, how second-stage sampling could be done more efficiently from an electronic file, and whether commodities can be effectively autocoded or, if necessary, imputed.
Design Effects and Measurement Errors in the Swedish Commodity Flow Surveys. STEFAN BERG, Statistics Sweden
Key words: commodity flows, multi-stage sampling, design effects, measurement process.
Statistics Sweden has conducted two surveys of commodity flows in Sweden. The first survey (CFS96) was conducted during spring 1996 in two counties of Sweden, and the second survey (CFS98) during the fourth quarter of 1998 in the remaining counties. The producer evaluation confirmed that the main components of the first survey worked well. These components included the survey design (a multi-stage sampling of establishments and shipments) and the method of data collection (mailout questionnaires supplemented by telephone interviews and, for some establishments, electronic media). However, the user evaluation indicated a need for more balanced origin-destination matrices, that is, improved precision in estimates of cross-county commodity flows. Therefore, for the selection of shipments in CFS98 different procedures were used for intra- and cross-county shipments. In this presentation we describe the main sources of errors and their effects on the precision of the estimates (both random and non-random aspects) due to the following: the stratification and allocation of the sample at different stages of the sampling, changes in the procedure for the selection of shipments, deviations in definition of a shipment, divergences from the reference period, variations in the period of measurement, businesses with intensive and heterogeneous flows of shipments, and the media for data collection. Finally, we propose several ways to improve the design and measurement process for future surveys of commodity flows in Sweden.
BTS Freight Transportation Data Activities: Results and Products from the Commodity Flow Survey for Hazardous Material Shipments. RON DUYCH, U.S. Bureau of Transportation Statistics; JOHN FOWLER, U.S. Bureau of the Census
Key Words: hazardous materials, shipment characteristics, commodity flow.
This presentation features the improved coverage of hazardous material shipments in the 1997 Commodity Flow Survey (CFS). The CFS is conducted by the Census Bureau in partnership with the Bureau of Transportation Statistics (BTS). For the first CFS, conducted for data year 1993, the survey only collected a yes/no indicator as to whether or not a shipment was hazardous. This approach produced only limited results. For the 1997 CFS, a combined effort among BTS, private industry, and Census produced a new data item for collection: the 4-digit United Nations/North American (UN/NA) code. The collection and tabulation of the UN/NA information in the 1997 CFS has resulted in the first comprehensive view of hazardous materials flows in the United States. This presentation will show the various methods of tabulation: mode of transportation, hazard class and division, and selected UN/NA numbers, and how these numbers serve as exposure measures for risk assessments. We will examine the collection, editing, and tabulation of these data, as well as the process used to determine the optimum set of data tables for dissemination.
Discussant: PATRICK CANTWELL, U.S. Bureau of the Census
Floor Discussion
4. ESTIMATION STRATEGIES USING VARIANTS OF POISSON SAMPLING
Invited Paper Session
Organizer: PHILLIP KOTT, U.S. National Agricultural Statistics Service
Chair: ESBJÖRN OHLSSON, Stockholm University
Poisson sampling is often associated with permanent random numbers (PRN's) and probability proportional to size (pps) selection. The first paper in this session discusses a useful Poisson PRN design in which the unit inclusion probabilities are NOT proportional to any intrinsically meaningful measure of size. The second paper addresses the sample-size variability of Poisson sampling by examining the properties of estimators that condition on the realized sample size. The third paper discusses the properties of a variant of Poisson PRN sampling in which the sample size is predetermined.
The Theory and Practice of Maximal Brewer Selection with Poisson PRN Sampling. PHILLIP KOTT and JEFFREY T. BAILEY, U.S. National Agricultural Statistics Service
K.R.W. Brewer suggests that when estimating the total of a single item for which there is control data, one employ a ratio or regression estimator and draw the sample using probabilities proportional to the measure of size raised to the 3/4 power. Brewer's sample selection scheme can be expanded to multiple targets by drawing overlapping Poisson samples for a number of items simultaneously using permanent random numbers (PRN's). We can call this "Maximal Brewer Selection" (MBS). MBS allows the enumeration of different combinations of samples during different survey periods. This paper develops the theory behind MBS and the estimation strategy rendering it practical, which includes calibration and delete-a-group jackknife variance estimation. The paper goes on to describe the experiences of the National Agricultural Statistics Service (NASS) with this strategy under both ad hoc and model-assisted allocation schemes.
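The selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the simple cap of probabilities at 1, and the per-item target sample sizes are assumptions made here. Each unit carries one permanent random number, each item gets its own Poisson sample with probabilities proportional to size raised to the 3/4 power, and a unit is in the combined sample when it falls in at least one item sample, so its overall inclusion probability is the maximum of its item probabilities.

```python
def brewer_probabilities(sizes, target_n, power=0.75):
    """Inclusion probabilities proportional to size**power.

    Probabilities are simply capped at 1 (an assumption of this sketch),
    so the expected sample size can fall below target_n when caps bind.
    """
    scaled = [x ** power for x in sizes]
    total = sum(scaled)
    return [min(1.0, target_n * s / total) for s in scaled]


def maximal_brewer_selection(size_by_item, target_n_by_item, prns):
    """Draw overlapping Poisson samples, one per item, from shared PRNs.

    size_by_item: dict mapping item name -> list of unit size measures
    target_n_by_item: dict mapping item name -> targeted sample size
    prns: one permanent random number in [0, 1) per unit
    Returns (samples per item, union sample, union inclusion probabilities).
    """
    probs = {}
    samples = {}
    for item, sizes in size_by_item.items():
        p = brewer_probabilities(sizes, target_n_by_item[item])
        probs[item] = p
        # Poisson PRN rule: unit i is selected for this item if its PRN
        # falls below its inclusion probability for the item.
        samples[item] = {i for i, u in enumerate(prns) if u < p[i]}
    # A unit is in the union exactly when its PRN is below the largest
    # of its item probabilities.
    union_probs = [max(p[i] for p in probs.values()) for i in range(len(prns))]
    union = set().union(*samples.values())
    return samples, union, union_probs
```

Because the same PRN is reused for every item, the item samples overlap as much as possible, and any combination of item samples can be enumerated in a given period without redrawing.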
Estimators for Use with Poisson Sampling and Related Selection Procedures. KEN BREWER, Australian National University; TIMOTHY GREGOIRE, Yale University
All estimators of total devised for use with Poisson sampling and related procedures, both in the context of establishment surveys (PRN sampling) and in forestry (3P sampling), have so far been either strictly or asymptotically design-unbiased. An important consequence is that their variances have been defined unconditionally over the complete range of all possible samples, rather than over the relevant and recognisable subset of all those possible samples that have the same size as the observed sample. In this paper we analyze Poisson samples conditionally on their achieved sample size as though they had been obtained using rejective sampling with a fixed sample size. The conditioning process provides a more relevant variance to estimate, even where the total is estimated using the usual ratio estimator with its built-in adjustment for achieved sample size. That ratio estimator, however, is not Cochran-consistent and its design-bias increases with achieved sample size. More suitable estimators are derived using cosmetic calibration, and their properties are investigated both analytically and empirically. Finally, the consequences of using similar estimators in the context of related sampling procedures are also subjected to scrutiny.
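For orientation, the two estimators contrasted in this abstract can be written down in a few lines. The sketch below assumes one common form of the ratio estimator, in which the expansion (Horvitz-Thompson) estimate is scaled by the ratio of expected to achieved sample size; the function names are hypothetical, and the calibrated estimators derived in the paper are not reproduced here.

```python
def expansion_total(y_sampled, pi_sampled):
    """Expansion (Horvitz-Thompson) estimate of a total: sum of y_i / pi_i
    over the sampled units."""
    return sum(y / p for y, p in zip(y_sampled, pi_sampled))


def poisson_ratio_total(y_sampled, pi_sampled, expected_n):
    """Ratio estimate that adjusts for the achieved sample size.

    expected_n is the sum of the inclusion probabilities over the whole
    frame; the achieved size m is the number of sampled units.
    Assumed form: (expected_n / m) * sum(y_i / pi_i).
    """
    m = len(y_sampled)
    if m == 0:
        raise ValueError("empty sample: the ratio estimator is undefined")
    return (expected_n / m) * expansion_total(y_sampled, pi_sampled)
```

When a Poisson draw happens to return fewer units than expected, the adjustment factor exceeds 1 and scales the estimate up; when it returns more, the estimate is scaled down.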
A User's Guide to Pareto πPS Sampling. BENGT ROSÉN, Statistics Sweden
A Pareto pps scheme selects a list sample with predetermined sample size n and prescribed desired inclusion probabilities along the following lines. The units in the sampling frame are associated with independent random variables Q, which are determined using a simple algorithm. The sample consists of the frame units with the n smallest Q-values. Even though true inclusion probabilities do not agree exactly with the desired ones, they are close enough for all practical purposes. As a consequence the expansion estimator has a negligible bias. The sampling scheme has a number of attractive properties. Point estimates have high precision. In fact, Pareto pps is, to the best of our knowledge, optimal among schemes that admit objective assessment of sampling errors. Simple formulae for variance estimation are available. By incorporating permanent random numbers, Pareto pps can efficiently coordinate samples. There is a simple nonresponse-adjustment procedure under the most common non-response model, group-wise uniform response propensities. An estimation strategy with many desirable properties can be obtained by combining Pareto pps and generalized regression estimation.
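The abstract does not spell out the algorithm for the Q-values. A minimal sketch, assuming the ranking variable usually given in the literature on Pareto sampling, Q = [U/(1-U)] / [λ/(1-λ)], with U a uniform (or permanent) random number and λ the desired inclusion probability, might look like this. The function name and argument names are hypothetical.

```python
def pareto_pps_sample(target_probs, n, random_numbers):
    """Select the n frame units with the smallest Pareto ranking values.

    target_probs: desired inclusion probabilities, each strictly between
        0 and 1 (they would normally sum to n; this is not enforced here)
    random_numbers: one uniform or permanent random number per unit,
        each strictly between 0 and 1
    Returns the sorted indices of the selected units.
    """
    if n > len(target_probs):
        raise ValueError("sample size exceeds frame size")
    ranked = []
    for i, (lam, u) in enumerate(zip(target_probs, random_numbers)):
        if not (0.0 < lam < 1.0 and 0.0 < u < 1.0):
            raise ValueError("probabilities and random numbers must lie in (0, 1)")
        # Odds of the random number divided by odds of the target probability.
        q = (u / (1.0 - u)) / (lam / (1.0 - lam))
        ranked.append((q, i))
    ranked.sort()
    return sorted(i for _, i in ranked[:n])
```

Passing the same permanent random numbers to successive draws is what allows samples to be coordinated, since a unit with a small random number tends to keep a small Q-value from one draw to the next.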
Discussant: PEDRO SAAVEDRA, ORC Macro
Floor Discussion
7. DESIGNING COMMODITY SURVEYS
Invited Paper Session
Organizer: MARIE BRODEUR, Statistics Canada
Chair: MARIE BRODEUR, Statistics Canada
Commodity Surveys are becoming more widespread, as users become increasingly focused on capturing the flow of goods and services, at very detailed levels. However, the design of a commodity survey presents many unique challenges.
- Because a high level of detail is required, sample allocation becomes critical. In some cases, it is not clear which variables should be used in the design process. Multivariate allocation methods are often used.
- Some sample designs use a two-phase approach or make use of administrative data for small businesses that most likely have a small number of commodities.
- In order to reduce response burden, customized questionnaires are usually used for collection. This has a major impact on the design of the edit system. Performing edit and imputation represents a major challenge because of the hierarchical structure of parts and totals.
- Confidentiality procedures should be applied carefully because it becomes easier to identify a company when the commodities are released.
In Canada, the Retail Commodity Survey has been recently designed and the Annual Survey of Manufacturing is currently being redesigned. The United Kingdom is currently looking at allocation problems, while in the United States many new initiatives to test alternative approaches to collection are being looked at. The objective of the session will be to present the different problems, from a theoretical and practical perspective, experienced by these three countries. Possible solutions will also be addressed.
Multivariate Allocation for Commodity Surveys. PAUL SMITH, Office for National Statistics, United Kingdom
Many business surveys run by NSIs collect information on several, or even many variables, but they are typically designed with reference to the accuracy of one or two main variables. In some cases it is not clear which variables should be the most important, or which should be used in the design process. Commodity surveys are an extreme example of this where the estimation of sales of all commodities is important (but not necessarily equally important), and where there may be a very large number of commodities. Several methods have been suggested for multivariate allocation, including single solutions from the range of variables chosen, methods based on satisfying many constraints simultaneously and summary-based methods, such as minimising the average (or a weighted average) of the commodity variances (for example Chambers, Cruddas & Smith, 1997). In addition, both design-based and model-based approaches have been proposed. In this paper, the properties of these methods are considered and several are compared in the context of the European PRODCOM survey as it is implemented in the UK's Office for National Statistics.
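As an illustration of the summary-based idea mentioned above, the sketch below allocates a fixed total sample across strata so as to minimise a weighted average of the commodity variances. It assumes stratified simple random sampling and the expansion estimator of each commodity total, in which case the minimising allocation is a Neyman-type rule with n_h proportional to N_h times the square root of the weighted sum of the stratum variances. It is not necessarily the method of the paper or of the cited reference; the names are hypothetical, and rounding and the constraint n_h <= N_h are ignored.

```python
import math


def weighted_average_variance(stratum_sizes, stratum_variances, weights, allocation):
    """Weighted sum over commodities of the variances of the estimated totals.

    stratum_variances[h][k] is the population variance of commodity k in
    stratum h; weights[k] is the importance weight of commodity k.
    """
    total = 0.0
    for big_n, var_h, n_h in zip(stratum_sizes, stratum_variances, allocation):
        combined = sum(w * s2 for w, s2 in zip(weights, var_h))
        total += big_n ** 2 * (1.0 / n_h - 1.0 / big_n) * combined
    return total


def weighted_variance_allocation(stratum_sizes, stratum_variances, weights, n):
    """Allocation minimising weighted_average_variance for total sample size n.

    Returns real-valued stratum sample sizes that sum to n.
    """
    scores = [
        big_n * math.sqrt(sum(w * s2 for w, s2 in zip(weights, var_h)))
        for big_n, var_h in zip(stratum_sizes, stratum_variances)
    ]
    total = sum(scores)
    return [n * s / total for s in scores]
```

Setting one weight to 1 and the rest to 0 recovers ordinary Neyman allocation for that single commodity, which shows how the choice of weights encodes which commodities matter most.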
Sampling and Estimation Strategies for Commodity Surveys. WISNER JOCELYN, MARIE BRODEUR, NANCY FORGET and DIDIER GARRIGUET, Statistics Canada
The Canadian Annual Survey of Manufacturing collects financial and commodity data. It is now being redesigned and different strategies are being investigated. This paper compares several sampling and estimation schemes suitable for a commodity survey. In particular, we will consider a stratified one-phase design using commodity detail available at the population level, a two-phase design where information gathered at the first phase is used to further stratify before the selection of the second phase sample, a two-phase design where the first and second phase samples are selected simultaneously and data collected is used to generate full census micro-data.
Different allocation methods similar to the work of Jocelyn and Brodeur (1996) will be discussed. Various stratification rules along with some estimation methods using auxiliary information will also be considered. Data from the Canadian Annual Survey of Manufactures will be used for comparison.
A Feasibility Study of an Annual Industrial Report Program. JUDY DODDS, STACEY COLE, THOMAS FLOOD III and JOHN GATES, U.S. Bureau of the Census
For years, the Census Bureau has collected detailed commodity data in two ways. We collect it for all commodities, on an establishment basis, once every five years in our Census of Manufactures. We also collect it for selected groups of commodities on a more frequent basis in our Current Industrial Reports (CIR) program. In addition to that, we collect fairly aggregated groups of commodities in our Annual Survey of Manufactures, which is an establishment survey.
The set of commodities included in the various CIR surveys and the frequency with which we collect them evolved over time to meet a variety of needs ranging from strategic defense planning to international trade impact monitoring. There is no consistent approach across all manufacturing to collecting detailed commodity data.
We conducted a test to determine if we could use the resources currently dedicated to the existing combination of Census/ASM/CIR to provide more consistent detailed commodity data for all manufactured commodities each year. Our thought was that by surveying a sample of companies rather than establishments, we could collect a more detailed set of data while staying within existing resources.
The paper presents a more detailed description of the context, the plan for the test, and the results of the test.
Discussant: KATHERINE THOMPSON, U.S. Bureau of the Census
Floor Discussion
8. SURVEYS IN ESTABLISHMENT DATA COLLECTION
Invited Paper Session
Organizer: YOUNG CHUN, University of Maryland and VASJA VEHOVAR, University of Ljubljana, Slovenia
Chair: YOUNG CHUN, University of Maryland
Electronic mail and Web surveys have just emerged as an important mode of data collection. This may be a questionable approach for the general population where - even in the most developed countries - the coverage has hardly reached one half of the population and on-line access from home is lower still. In establishments the coverage is much higher, particularly in large companies and in school institutions. However, the obvious attractions of these surveys - the costs, speed, edit checks - face some serious problems:
- coverage problems, particularly within establishment access,
- sampling, response rates, dual frames,
- quality of data, editing, questionnaire design,
- cooperation and computer-human interaction,
- technical and organizational limitations.
The objective of the session is to present these problems in different environments, from a theoretical and practical perspective. The Bureau of Labor Statistics was among the pioneers in introducing these tools in establishment surveys, and today it is also at the frontier of applying solutions in this field. Particularly valuable is its experience with the cooperation of establishments. At the Census Bureau there has been an intensive effort in testing and implementing these methods. Specifically, the research on data quality issues is discussed. The Web surveys in Slovenian companies and schools address other typical issues: the problem of costs and the aspects of dual frame designs. The session thus covers all basic aspects of the contemporary use of Web surveys in establishment surveys.
Electronic Data Collection in Selected BLS Establishment Programs. R. CLAYTON and MICHAEL SEARSON, Bureau of Labor Statistics, U.S.A.
In early 1995 the Bureau of Labor Statistics opened the Electronic Data Interchange (EDI) Center in Chicago, IL to facilitate the collection of employer data each month for the Current Employment Statistics (CES) Survey, one of the BLS's most timely and visible programs. Later that year, the facility's operation was expanded to include the collection of data from another of its statistical programs, the Covered Employment and Wages Program. The latter program uses administrative data submitted by employers each quarter for Unemployment Insurance purposes. These data are supplemented, where necessary, with establishment level employment and wages collected on the Multiple Worksite Report.
The primary purpose of these electronic data collection efforts is to increase employer participation in these programs by reducing their reporting burden and costs. Likewise, the BLS and their state partners in these statistical programs benefit as they generally receive more comprehensive and timely data; reduce their processing (handling, data entry, etc.) and postage costs; and free up more staff time for data analysis and dissemination activities.
This paper focuses on the BLS strategy to recruit employers to provide these data in this reporting medium. For the Multiple Worksite Report, this policy was later modified to reflect a more specific target audience. Previous surveys of employer reporting practices had revealed the extent to which employers contract out the preparation of their payroll and the filing of various federal and state payroll tax reports. In a similar manner, this research revealed that employers that prepared their payrolls "in house" and filed their own payroll tax reports frequently purchased the systems/programs to perform these tasks. Consequently, BLS staff decided that these payroll/tax service providers and developers of payroll/tax systems should be the focal point for inclusion of these reports in their systems. The obvious advantage of this approach was that any client using their payroll/tax service or purchasing their software would not have to incur the developmental costs associated with adding electronic reporting of these data to their system. It would either be included as a service by their agent or already included in the software that they purchased.
For the CES Survey, the primary focus has been large multi-state employers that are sampled with certainty for the program. These large firms often are asked to provide reports for hundreds of worksites each month within very tight timeframes, making centralized electronic reporting a more cost- and time-effective option.
This paper describes how BLS is pursuing these approaches and discusses results to date. In addition, this paper will note other areas that BLS is examining that will offer smaller employers more electronic options to report these data.
Cost and Errors of Web Surveys in Establishment Surveys. Z. BATAGELJ, K. LOZAR, V. VEHOVAR and M. ZALETEL, University of Ljubljana, Slovenia
1. The problem
Surveys performed on the Web are extremely cost-attractive and they are becoming an increasingly important mode of data collection. However, the problems with sampling frames, the lack of technical support for the person eligible to answer in an establishment survey, and specific non-response patterns make this mode extremely problematic. This paper overviews the state of Web survey methodology, its prospects and, particularly, the issues of costs and errors of these surveys.
2. The outline
a) The following survey design characteristics are studied: introduction (pre-letter), survey topic, layout of the questionnaire, type of solicitation (telephone, mail, email), number of contacts, technical aspects, and length of the questionnaire. The comparison with mail and telephone surveys provides a profound analysis of the non-response behavior on each wave/type of solicitation.
b) The model for the propensity to co-operate in a Web survey is developed. In addition to non-response, the measure of the satisfaction with the survey is included in the model.
c) The costs are also studied: a model that incorporates costs, non-response bias, measurement error, variance, and, most importantly, the number of contacts is developed. The mean squared error is then minimized at fixed costs. The circumstances in which the Web survey mode is optimal are discussed together with the optimal number of contacts.
3. The data
The empirical work is based on a national Research on Internet in Slovenia project (http://www.ris.org/), where surveys among companies (n=3,000) and school institutions (n=800) have been conducted regularly since 1996. The controlled experiments on sub-samples are performed with Web, telephone and mail survey modes. This enables us to study the non-response, the mode effects and the quality of the data. The qualitative research is also performed to clarify specific aspects such as, for example, why eligible respondents - when given a mail and a Web option - prefer to select the mail questionnaire.
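The trade-off described in item c) of the outline can be illustrated with a deliberately simple sketch. It is not the authors' model: the function name, the cost structure (a base cost plus a cost per contact for each sampled unit) and the bias function are all hypothetical assumptions, and measurement error is ignored. The sketch only shows the general idea of minimizing mean squared error over the number of contacts when the budget is fixed.

```python
def best_number_of_contacts(budget, base_cost, cost_per_contact,
                            unit_variance, bias_after, max_contacts):
    """Pick the number of contacts that minimizes MSE at a fixed budget.

    Hypothetical toy model (assumptions, not from the paper):
      - each sampled unit costs base_cost + k * cost_per_contact,
      - more contacts reduce non-response bias, given by bias_after(k),
      - sampling variance of the estimate is unit_variance / n.
    """
    best = None
    for k in range(1, max_contacts + 1):
        # Affordable sample size when every unit receives k contacts.
        n = int(budget // (base_cost + k * cost_per_contact))
        if n == 0:
            continue
        mse = bias_after(k) ** 2 + unit_variance / n
        if best is None or mse < best["mse"]:
            best = {"contacts": k, "sample_size": n, "mse": mse}
    return best


if __name__ == "__main__":
    # Illustrative inputs only: bias shrinks as 1/k with each extra contact.
    print(best_number_of_contacts(1000, 1, 1, 100.0, lambda k: 1.0 / k, 10))
```

With these made-up inputs, extra contacts first pay off through lower bias and then stop paying off because they shrink the affordable sample.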
Web-Based Collection of Economic Data at the U.S. Census Bureau. BARBARA SEDIVI, ELIZABETH NICHOLS and HOWARD KANAREK, U.S. Bureau of the Census
In 1997 the U.S. Census Bureau conducted its first pilot test collecting U.S. Title 13 economic data using the World Wide Web. With the extraordinary growth of the Web, especially in businesses, the Census Bureau pursued testing data collection using the Web. In theory this mode offers potential advantages for both the respondent and the survey agency. Electronic questionnaires can reduce respondent burden by providing features such as online help, auto-calculations, and in the business environment, importing of data from preexisting spreadsheets. Electronic questionnaires also have data quality advantages over paper by the use of interactive edits, which allow respondents to correct their responses as they are entered. In addition, using the Web as a delivery vehicle is relatively inexpensive compared to the postal service.
In this paper, we provide an overview of the Web data collection efforts undertaken at the Census Bureau. We highlight issues that have arisen during this tentative start-up period regarding respondent hardware and software, questionnaire design, security, coverage issues, implementing multi-mode designs, response rates, and motivation to use the Web. We discuss the potential benefits of Web data collection and relate them to our initial tests. We end by speculating about the future of Web data collection over the next decade for government statistical reporting.
Discussant: DON DILLMAN, Washington State University
Floor Discussion
9. GENERALIZED ESTIMATION SYSTEMS IN GOVERNMENT AGENCIES
Invited Paper Session
Organizer: MIKE A. HIDIROGLOU, Statistics Canada
Chair: MIKE A. HIDIROGLOU, Statistics Canada
Several Generalised Estimation Systems have been developed recently by government agencies. These include Bascula (Statistics Netherlands), CLAN (Statistics Sweden), GES (Statistics Canada), and StEPS (U.S. Census Bureau). These systems share a number of common functionalities, but also differ in several respects due to the different environments that they evolved in.
This session gives the audience (and proceedings readers) the opportunity to find out more about the different functionalities offered by these systems. As the session has four papers, there are no discussants.
CLAN - A SAS-Program for Computation of Point- and Standard Error Estimates in Sample Surveys. ANDERSSON CLAES and LENNART NORDBERG, Statistics Sweden
CLAN is a program developed at Statistics Sweden and designed to compute point- and standard error estimates in sample surveys. Being written in the SAS macro language, it works in different computer environments, e.g. under Windows, UNIX and on mainframe computers. Only the Base SAS Software is necessary.
All parameters that can be written as arbitrary rational functions of population (domain) totals can be handled. A population mean is a simple example of such a function. Other examples include a ratio or a difference between two population or domain totals. A difference between two ratios under a panel survey design, where the two ratios refer to different time periods, is a further example.
CLAN computes an estimate of the parameter and an estimate, based on Taylor linearization, of the standard error. The Horvitz-Thompson (H-T) estimator and/or the generalised regression (GREG) estimator can be used for totals.
The sampling designs implemented in CLAN include stratified simple random sampling (SRS) without replacement of i) elements, or ii) clusters of elements. By the combination of i) and ii) with two different non-response models we get four principal cases. The majority of surveys conducted at Statistics Sweden reduce to these four cases, including surveys that use pps-sampling (approximations for pps can be obtained), various types of network sampling (e.g. sampling of individuals to study households) and two-phase sampling schemes for stratification.
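The simplest of the cases described above, a ratio of two totals under stratified SRS of elements with full response, can be sketched as follows. This is a textbook illustration in Python, not CLAN's SAS code; the function names and the data layout (a list of strata, each with a population size "N" and sample values "y" and "x") are assumptions made for the example. The ratio is estimated from Horvitz-Thompson totals and its standard error from the linearized variable z = (y - R x) / X.

```python
from math import sqrt
from statistics import variance


def ht_total(strata, var):
    """Horvitz-Thompson total under stratified SRS: sum_h (N_h / n_h) * sum(y)."""
    return sum(s["N"] / len(s[var]) * sum(s[var]) for s in strata)


def ratio_with_se(strata):
    """Estimate R = Y / X and its Taylor-linearized standard error."""
    y_hat = ht_total(strata, "y")
    x_hat = ht_total(strata, "x")
    r_hat = y_hat / x_hat
    var_hat = 0.0
    for s in strata:
        n, big_n = len(s["y"]), s["N"]
        # Linearized variable for a ratio of two totals.
        z = [(y - r_hat * x) / x_hat for y, x in zip(s["y"], s["x"])]
        # Stratified SRS variance with finite population correction;
        # each stratum needs at least two sampled elements.
        var_hat += big_n ** 2 * (1 - n / big_n) * variance(z) / n
    return r_hat, sqrt(var_hat)


if __name__ == "__main__":
    sample = [
        {"N": 10, "y": [2.0, 5.0, 3.0], "x": [1.0, 2.0, 2.0]},
        {"N": 20, "y": [6.0, 9.0, 11.0], "x": [3.0, 5.0, 5.0]},
    ]
    print(ratio_with_se(sample))
```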
Towards a Generalized Weighting System. N.J. NIEUWENBROEK, R.H. RENSSEN and L.P.M.B. HOFMAN, Statistics Netherlands
At Statistics Netherlands the software package Bascula version 3.0 has been developed, combining weighting of sample data using auxiliary information with variance estimation based on balanced repeated replication (BRR). Much attention has been paid to implementing various techniques in an easy and user-friendly way. It fits neatly in the general structure of Blaise in that it is capable of using both data and meta-data information provided by the Blaise system. The package is already in use for various person and household surveys. Increasing acceptance for business surveys is expected. Eventually Bascula will be a part of a general processing and estimation environment in which the whole process of outlier detection and handling, (micro and macro) editing, imputation (for unit and item nonresponse) and weighting of the clean records will be integrated. We will report on our experiences with Bascula and on extensions that we have already planned.
Estimation and Variance Estimation in a Standardized Economic Processing System. RICHARD SIGMAN, U.S. Bureau of the Census
The U.S. Census Bureau has developed software called the Standardized Economic Processing System, or StEPS, that it plans to use to replace 15 separate systems, which are currently used to process 113 current economic surveys. This paper describes the methodology and design of the StEPS modules for estimation and variance estimation and chronicles our experiences from 1997 through 1999 in using these modules to migrate surveys into StEPS. For estimation, we found that a single computation approach, based in part on the generalized regression estimator, was successful in calculating all needed estimates, except for quantiles. For variance estimation, however, we found it necessary to provide StEPS users with different software options depending on the sample design and method of variance estimation. StEPS has separate submodules for Poisson-sampling variances, Tille-sampling variances, variances calculated using the method of random groups, and variances calculated by VPLX using pseudo-replication methods. The paper concludes with a discussion of possible future enhancements to the estimation and variance estimation functions in StEPS.
Generalized Estimation System with Future Enhancement. MIKE HIDIROGLOU, VICTOR ESTEVAO and CHARLIE ARCARO, Statistics Canada
This paper presents the features of the Generalised Estimation System recently developed at Statistics Canada. The Generalised Estimation System (GES) uses auxiliary information to produce estimates for one-stage designs, as well as two-phase designs, for a number of parameters of interest. The package uses regression estimators known as GREG (Generalised Regression Estimator) and includes the wider family of calibration estimators to produce estimates of totals and ratios of two variables. Most estimators used in survey practice, including the post-stratified and raking ratio estimators, are special members of GREG. We have also developed an estimation package (SIMPVAR) that incorporates the variance of imputed data for a number of imputation procedures. It is planned to integrate SIMPVAR into GES.
The variance computations in GES use either the Taylor or the jack-knife procedure. Both variance estimation procedures can currently estimate variances of totals or ratios that have been obtained using GREG or the calibration procedure. However, each type of variance computation requires separate computational routines. We are planning to extend the Taylor driven variance computations by building into GES the automatic computation of linearized residuals that account for the auxiliary information, as well as the estimator being considered (totals, means, ratios, or more generally linear or non-linear functions of totals).
This opens the way to extend GES to be more than an estimation package. It will become a data analysis package that can carry out complex data analyses. These analyses include regression, chi-squared tests for contingency tables, logistic regression and multivariate analysis, as summarised in Skinner, Holt, and Smith (1989).
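The post-stratified estimator mentioned in this abstract is the simplest example of calibrating design weights to known auxiliary totals. The sketch below is a textbook illustration in Python, not GES code; the function name and the record layout (design weight, post-stratum label, y value) are assumptions made for the example. Each design weight is multiplied by the ratio of the known population count in its post-stratum to the weighted sample count, so the adjusted weights reproduce the known counts exactly.

```python
def poststratified_total(records, pop_counts):
    """Post-stratified estimate of a total.

    records:    iterable of (design_weight, group, y) tuples (hypothetical layout)
    pop_counts: dict mapping each group to its known population count N_g
    """
    records = list(records)
    # Weighted sample count per post-stratum (estimate of N_g).
    n_hat = {}
    for w, g, _ in records:
        n_hat[g] = n_hat.get(g, 0.0) + w
    # Adjust each weight by the factor N_g / N_hat_g, then sum weight * y.
    return sum(w * pop_counts[g] / n_hat[g] * y for w, g, y in records)


if __name__ == "__main__":
    sample = [(10.0, "a", 1.0), (10.0, "a", 3.0), (5.0, "b", 4.0)]
    print(poststratified_total(sample, {"a": 30, "b": 10}))
```

Setting y to 1 for every record returns the sum of the known population counts, which is the calibration property.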
Floor Discussion
13. STRATEGIC DIRECTIONS FOR BUSINESS SURVEYS: AN INTERNATIONAL PERSPECTIVE
Invited Paper Session
Organizer: DON ROYCE, Statistics Canada
Chair: DON ROYCE, Statistics Canada
Strategic Directions for Business Surveys: An Australian Perspective. SUSAN LINACRE, ROBIN SLATER and GEOFF LEE, Australian Bureau of Statistics
In many statistical agencies, the National Accounts provide an integrating framework for business surveys, in terms of the data sought and classifications used. They also provide ongoing pressure to seek methods that produce cohesive data. The 'stove pipes' that have existed historically around different administrative data sources, annual business collections and sub-annual indicator series are being broken down to provide an integrated system of business surveys across the economy. As the integration effort proceeds, new user needs within this framework are generating further demands - for example for more frequent supply/use tables, or for regional dissections.
Simultaneously other pressures for change continue to emerge. Views other than the standard National Accounts framework are required. Interest is growing in areas such as the environment, tourism, culture and leisure, the impacts of globalisation, sustainability indicators, spatial analysis and leading indicators of change. Simply measuring and reporting on economic activity and its components is insufficient. There is interest in what drives business performance, and its consequences for the environment, for people's working lives, and how these link with social measures.
This paper considers the forces driving change, the future demands on the ABS's 'system' of business surveys, and their implications, and reviews mechanisms to meet the various user demands while minimising respondent load and cost.
Statistics Canada's Broad Strategy for Business Statistics. PHILIP SMITH, Statistics Canada
Statistics Canada's business statistics program consists of about 225 individual surveys, managed by some 15 divisions each specializing in one or more particular subject matter areas. Two decades ago, the surveys were conducted more-or-less independently. Each had its own unique questionnaire approach; sampling frame or business list; sample design; data collection and capture system; respondent relations strategy; edit, imputation and estimation system; and analysis and dissemination program. There were some common standards and systems, but they were more the exception than the rule.
The decade of the 1980s was characterized by movement toward greater integration. Corporate policies were instituted to encourage increased commonality in questionnaire and sample design; a central business register was established and efforts to convert surveys to its use got under way; the data collection and capture function was taken out of the subject matter divisions and centralized; and generalized methods and systems for edit and imputation, estimation and some other survey functions were developed.
The pace of survey integration slowed in the early 1990s, but since mid-decade business survey harmonization and unification have been given a high priority. Progress is being made in several areas. The paper will discuss Statistics Canada's high-level strategy for business surveys development, focussing in particular on: questionnaire harmonization; moving all surveys to the central business register; integrating sample designs; exploiting administrative data; enterprise-centric data collection and analysis; common databases and systems; and improving respondent relations and burden management strategies.
Satisfying Emerging Data Needs: Measuring Electronic Business. THOMAS MESENBOURG, U.S. Bureau of the Census
The extraordinary growth of the Internet is changing the way we communicate, seek and access information, purchase goods, and interact. The successful integration of information, communications, and computer technology opens new purchasing channels to consumers and provides firms with the opportunity to fundamentally change the way they conduct their business. Electronic business (e-business) is growing very rapidly and will likely cause as much change in the structure and performance of the American economy as the introduction of the computer, yet currently there are no official measures of e-business activity and little understanding of how it is affecting existing measures of economic activity.
The growing demand by policy makers and industry for reliable measures of e-business activity presents a useful case study of how a statistical agency responded to an important emerging data need. The fact that electronic business is still in its infancy and changing rapidly poses special problems. This paper describes the Census Bureau's strategy and plans for measuring e-business activity.
Discussant: RICH ALLEN, U.S. National Agricultural Statistics Service
Floor Discussion
14. TREND ESTIMATION
Invited Paper Session
Organizer: MARIETTA MORRY, Statistics Canada
Chair: MARIETTA MORRY, Statistics Canada
Trend Estimation at the Ends of Series Using Adaptive Semi-Parametric Local Dynamic Models. ALISTAIR GRAY and PETER THOMSON, Statistics New Zealand
End filters for non-seasonal time series constructed using a minimum revisions criterion and optimal linear (biased) prediction with respect to semi-parametric local dynamic models were first introduced by Gray and Thomson (1996), Research Report CENSUS/SRD/RR-96/1, Statistical Research Division, Bureau of the Census, Washington. There it was shown that a suitable choice of a bias term, related to X-11's $I/C$ ratio, can lead to minimum revisions that are competitive with, and sometimes better than, those achieved using ARIMA forecast extension. This paper extends the earlier work in two areas. First, methods of estimating the parameters of the local dynamic models are more fully discussed. Second, the use of adaptive bias terms in the end filters is shown to lead to further improvements in trend estimation.
Locally Adaptive Trend Estimation for X11. BENOIT QUENNEVILLE, Statistics Canada and DOMINIQUE LADIRAY, Institut National de la Statistique et des Études Économiques (INSEE)
In this paper we compare methods of estimating local trend levels. Our goal is to investigate methods of improving the performance of X11-based seasonal adjustment procedures such as X11-ARIMA and X12-ARIMA. We focus our attention on symmetric Henderson moving averages and how to extend them to deal with missing observations at the end of the series. This is an important problem with X11-type seasonal adjustment methods where the trend estimates of the most current data points are revised as more data become available.
We study two approaches. In the first approach we use Musgrave asymmetric moving averages which depend on a parameter called the I/C-ratio, and we study various ways of estimating that ratio. In the second approach, we always use the symmetric Henderson moving average with forecasted missing observations; consequently, there is no need to derive asymmetric moving averages. In that second approach, we study various ways of forecasting the missing observations.
Essentially, we try to see if we can improve the original X11 asymmetric moving averages, and if we should use ARIMA extrapolations in each step of X11 instead of asymmetric moving averages. We use a minimum revision criterion to compare the proposed methods. We illustrate our approaches with two specific examples, and we conclude our paper with areas for further research.
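For readers unfamiliar with the symmetric Henderson moving averages discussed in this abstract, the following Python sketch computes the weights from the standard closed-form expression and applies them to the interior of a series. It illustrates the symmetric filter only and is not the authors' code; the function names are assumptions. The first and last m points are left as None, which is exactly the end-of-series gap that the asymmetric (Musgrave) averages or forecast extension are meant to fill.

```python
def henderson_weights(length):
    """Symmetric Henderson weights for an odd filter length (e.g. 5, 9, 13, 23)."""
    if length % 2 == 0 or length < 5:
        raise ValueError("length must be odd and at least 5")
    m = (length - 1) // 2
    n = m + 2
    denom = (8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
             * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
               * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
        weights.append(num / denom)
    return weights


def henderson_trend(series, length):
    """Apply the symmetric filter; the m points at each end stay None."""
    w = henderson_weights(length)
    m = (length - 1) // 2
    trend = [None] * len(series)
    for t in range(m, len(series) - m):
        trend[t] = sum(wj * series[t - m + i] for i, wj in enumerate(w))
    return trend


if __name__ == "__main__":
    print([round(x, 5) for x in henderson_weights(5)])
    print(henderson_trend([float(t * t) for t in range(10)], 5))
```

The weights sum to one, and the filter passes low-order polynomial trends through unchanged in the interior of the series.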
Presentation of Trend Estimates in Official Statistics: U.K. and International Practice. TIM JONES and SIMON CROMPTON, Office for National Statistics, United Kingdom
It is standard practice in many national statistical institutes (NSIs) to publish key economic time series in seasonally adjusted form. This is done to help users interpret series, particularly over the short-term. Some NSIs also publish trend estimates, presenting users with their best estimates of the underlying behaviour of series. In this paper we report the results of two surveys of international practice on the presentation of trends. Examples of different presentations will be used to illustrate methodological issues that arise. At the end of the paper we set out the approach of the UK's Office for National Statistics, which has attracted much recent interest.
Designing for Trend Estimation in Repeated Business Surveys. DAVID STEEL, University of Wollongong and CRAIG MCLAREN, Australian Bureau of Statistics
The sample designs for business surveys, in particular the stratification and allocation to strata, are usually developed with the sampling variance of estimates of levels as the main statistical criterion. For monthly and quarterly surveys the sampling variances of estimates of short term movements, such as monthly and quarterly changes, may also be considered in developing the sample design. Rotation patterns are designed to balance the effect on the variance of estimates of short term movements with respondent load and costs. A major use of repeated surveys is to enable the current trends to be assessed. The designs currently used are not necessarily good for estimating trends. We consider the development of sample designs, in particular allocation of sample to strata and rotation patterns, for trend estimation. The Monthly Retail Trade survey will be considered as an example.
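As background to the conventional criterion mentioned at the start of this abstract (the sampling variance of a level estimate), the sketch below shows classical Neyman allocation, which assigns sample to strata in proportion to N_h * S_h. It is a textbook illustration, not the trend-oriented design the authors propose; the function name and input layout are assumptions, and the result is left unrounded.

```python
def neyman_allocation(total_n, strata):
    """Allocate total_n sample units across strata in proportion to N_h * S_h.

    strata: dict mapping stratum name to (N_h, S_h), i.e. the population size
            and the standard deviation of the study variable (hypothetical layout).
    Returns unrounded allocations; rounding and minimum sizes are left out.
    """
    products = {h: big_n * s for h, (big_n, s) in strata.items()}
    total = sum(products.values())
    return {h: total_n * p / total for h, p in products.items()}


if __name__ == "__main__":
    # A small stratum of highly variable units gets a much higher sampling rate.
    print(neyman_allocation(40, {"large firms": (100, 2.0), "small firms": (200, 1.0)}))
```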
Discussant: DAVID FINDLEY, U.S. Bureau of the Census
Floor Discussion
15. IMPROVING RESPONSE RATES OF BUSINESS SURVEYS USING INTEGRATED COMMUNICATION STRATEGIES
Invited Paper Session
Organizer: MARTIN LUPPES and GER SNIJKERS, Statistics Netherlands
Chair: MARTIN LUPPES, Statistics Netherlands
Tailored Design of Self-Administered Business Surveys. DON DILLMAN, Washington State University
Tailored Design of Self-Administered Surveys is a perspective for doing surveys in which the development of survey procedures is aimed at creating respondent trust and perceptions of increased rewards and reduced costs for being a respondent. Tailored Design takes into account features of the survey situation including specific population, survey content and sponsorship when formulating questionnaires and implementation procedures (see Dillman, D.A. Tailored Design of Mail, Internet and Other Self-Administered Surveys, John Wiley, In Press). Development of this perspective is aimed at replacing the one-size-fits-all orientation fostered by development of the Total Design Method. In this paper I outline the Tailored Design Perspective and reasons for its development. Its application to establishment surveys is then discussed using a wide variety of business surveys that show the commonality and differences in specific survey designs that result. Development of the Tailored Design application for business or establishment surveys attempts to take into account the multiple ways in which business surveys differ from individual-person and household surveys, while also building upon their commonalities. Important research questions posed by a Tailored Design approach to establishment surveys will also be discussed.
The New KONTIV Design - A Total Survey Design for Surveys on Mobility Behaviour. WERNER BRÖG, Socialdata, Germany
Surveys on mobility behaviour are of great importance for transport policy and planning. They are often the basis for far-reaching decisions in terms of transport conditions and investments.
Since the mid-seventies diary techniques have been used for collecting data on mobility behaviour, and great efforts have been undertaken to improve the response and quality of these data. SOCIALDATA has been involved in this process throughout and has developed the New KONTIV Design (NKD). This design is strictly respondent-orientated and guarantees high response rates and high quality data. The NKD is set up as a mail-back survey with a diary for self-completion, with telephone motivation of respondents and (possible) subsequent satellite surveys for more detailed questions of sub-groups.
This design has been proven in hundreds of surveys in more than ten countries all over the world. The NKD was intensively tested in a Glasshouse-project for the redesign of the national Dutch Travel Survey and also proved to be successful in the application of Statistics Netherlands. This paper describes the basics of the New KONTIV Design.
The Best of Two Worlds: TDM and NKD. An Operational Model to Improve Respondent Co-operation. GER SNIJKERS and MARTIN LUPPES, Statistics Netherlands
In the Netherlands, non-response is a major problem in survey research, both in household surveys and business surveys. Although business surveys are compulsory, the response rates of these surveys are considered to be too low. The response rates of production surveys (including number of employees, turnover and costs) vary between 50 and 85%.
In 1999 a program was started to increase response rates for business surveys by improving the communication with the respondent. The traditional approaches based on simple communication strategies (one stimulus for all units at the same moment, classical reminder approaches using authority principles, etc.) do have serious problems with response time, net response and response quality. The goal of this program is to develop measures that change the traditional, formal and passive contact strategies into active, respondent driven and motivational approaches.
In the past several measures have already been implemented, like a cognitive laboratory to improve the wording of content letters and questionnaires, and a guide for forms design standards to improve the layout of letters and questionnaires. Also a program of internal quality audits (within a Total Quality Management program) was started.
This program is based on the philosophies of Dillman (Total Design Method) and Brög (New KONTIV Design). Where the NKD allows for undefined respondent behaviour, the TDM approach is based on the more general standardised survey approach of well defined respondent behaviour. In fact, both TDM and NKD are quite similar in their (respondent driven) paradigms, but quite different in their operational approach. And both have proven their benefits. So, the challenge is to create measures that incorporate the best of two worlds.
Discussant: COLM O'MUIRCHEARTAIGH, University of Chicago
Floor Discussion
19. WORKPLACE AND EMPLOYMENT SURVEYS
Invited Paper Session
Organizer: PIERRE LAVALLÉE, Statistics Canada
Chair: PIERRE LAVALLÉE, Statistics Canada
In recent years, there has been growing interest in studying the behaviour of employers and employees. The objective is generally to investigate relationships among competitiveness, innovation, technology use and human resource management on the employer side, and technology use, training, job stability and earnings on the employee side. To fill this need, statistical agencies of different countries have launched longitudinal business surveys on workplaces and employment. This is the case in particular in Canada, the United Kingdom, France and Australia. In Canada, this survey is called the Workplace and Employee Survey (WES). Although these surveys have a similar objective, they have some interesting differences, notably in their frequencies (e.g. every 5-7 years for Australia and UK, but annually for Canada), their sample design (e.g. one-off survey of employees for Australia and UK, but panel for Canada), and their linkage (e.g. one-way linkage for Australia (employer to employee) but two-way for UK and Canada). The purpose of this session is to present the surveys mentioned above, stressing their similarities and differences.
Multiple Perspectives on Employment Relations: Experience from the 1998 Workplace Employee Relations Survey. MARK BEATSON, Department of Trade and Industry, United Kingdom
The 1998 Workplace Employee Relations Survey (WERS98) was the fourth in a series of workplace surveys in Great Britain, the first of which was carried out in 1980. The 1998 survey was the largest to date. It included management data drawn from interviews with some 2,000 managers; data drawn from interviews with employee representatives (where one was present); and, for the first time, self-completion data from nearly 30,000 employees in these same workplaces. There was also a panel element based on a separate survey of managers in workplaces included in the 1990 survey.
The paper reflects briefly on lessons learnt from the conduct of the survey (following on from the paper presented at ICES I) and the challenges such a rich data set poses for the survey analyst.
The paper then presents some illustrative analyses of the survey data, drawing on data from the various questionnaires. They show how looking at workplace employment relations from a variety of perspectives can enrich our understanding.
Organizational Change In French Manufacturing: What do we Learn From Firm Representatives and From Their Employees? JACQUES MAIRESSE and NATHALIE GREENAN, Institut National de la Statistique et des Études Économiques (INSEE)
In this paper, we use a French matched employer-employee survey, the COI survey, conducted in 1997, to describe the general features of organizational change in manufacturing firms with more than 50 employees. In a first section, we explore the methodological issues associated with the building up of a statistical measure of organizational change, we describe the COI survey and we present the set of firm level and employee level variables that we have selected to investigate organizational change. In a second section, we present the results of two correspondence analyses, one conducted on a sample of 1462 firms from the COI survey and the other one conducted on the sample of 2049 blue collar workers affiliated to those firms.
On one hand, using the firm level section of the survey, we show that all types of new organizational practices are positively correlated with one another. On the other hand, at the blue collar level, three main dimensions discriminate between jobs: the intensity of involvement in information processing and decision, the intensity of constraints weighing on the content and rhythm of work and the orientation of information and production flows: either pushed by colleagues or pulled by the market. We also find that blue collars cannot develop a high level of involvement in information processing and decisions and have at the same time their work rhythm fixed by heavy technical constraints whereas high time pressure imposed on work rhythm by the market is positively correlated with such an involvement.
Finally, if we correlate firm level and worker level variables, we find that an increase in the use of employee involvement and quality practices by the firm is positively correlated both with a higher level of blue collars involvement in information processing and decision and with a higher level of technical constraints, production flows being pushed by colleagues rather than pulled by the market.
The mapping of firm level responses stemming from our first correspondence analysis has been used to select 4 firms in different areas of the statistical universe and belonging to the machine and equipment sector. Post-survey interview carried out with executives from these firms and plant visits are used to check the quality of our statistical data and to better understand our descriptive results.
The Methodology of the Workplace and Employee Survey. ZDENEK PATAK, PIERRE LAVALLÉE and MICHEL HIDIROGLOU, Statistics Canada
Canadian firms and their employees have always faced a competitive, changing environment. The development of a North American free trade zone has certainly heightened awareness of the competitive environment. The growing disparity among workers, in terms of both earnings and hours, has been well documented. These trends contribute to a general sense that economic change is increasingly difficult to understand, that the cost of change is mainly borne by less adaptable workers, and that even among the winners in the labour market, employment is becoming less stable. Looking at these trends, analysts in Statistics Canada and elsewhere have reached the conclusion that there are two key elements missing in our understanding of firm performance and worker outcomes.
The determinants of how well firms respond to change can be properly studied in a longitudinal setting that covers all the firm characteristics and behaviours related to performance. The practices and policies related to employees are also of interest, since they must be the agents of change in the firm. Their fortunes are tied to what they do on the job and how they interact with the internal forces of change within a firm. Thus the ideal survey instrument would follow the linked samples of employers and employees over an indefinite period.
The Workplace and Employee Survey (WES) is such an instrument. This survey was launched in spring 1999. The objective of WES is to investigate the relationships among competitiveness, innovation, technology use and human resource management on the employer side, and technology use, training, job stability and earnings on the employee side. WES sheds some light on what triggers hiring and separations, actual and perceived job stability, which employees use particular technologies and how this affects their skill requirements and pay, and how employee compensation and human resource practices relate to firm performance.
In the paper, we will first present the longitudinal strategy for the employer and employee portions of WES. Second, the sample design aspects will be described. Third, we will present some preliminary results obtained from the first wave of data collection. Fourth, some longitudinal issues related to the second wave of WES will be discussed, followed by a discussion on the future work for the next waves.
Discussant: TED WANNELL, Statistics Canada
Floor Discussion
20. PRIORITY CONTACT OF NON-RESPONDENTS
Invited Paper Session
Organizer: RICHARD MCKENZIE, Australian Bureau of Statistics
Chair: PAUL SUTCLIFFE, Australian Bureau of Statistics
A Framework for Priority Contact of Non-respondents. RICHARD MCKENZIE, BILL GROSS, STEPHEN CARLTON, and PAUL MAHONEY, Australian Bureau of Statistics
Businesses which have not replied to a mail survey are often telephoned to obtain a response, and, as this contact is expensive, strategies are used to determine which businesses are given priority for contact. This paper presents a framework for developing and assessing prioritising rules. The situation is described by a set of respondents, models for the probability of response, with and without contact, for the rest of the sample, and models for the reported and imputed values for units. A score function is derived which, for a fixed number of contacts, maximises the expected improvement in accuracy. The framework can be extended to include significance editing.
The framework is used to describe the contact priority rules used by the ABS Survey of Employment and Earnings for significance editing and non response and to assess their performance.
A Comparison of Two Mixed Mode Survey Strategies: Telephone with Mail Followup and Mail with Telephone Followup. DANNA MOORE and JOHN TARNAI, Washington State University
This paper describes our experiences with implementing mixed-mode surveys of business establishments. The research question addressed is whether a mixed-mode strategy of telephone followed by mail questionnaire or mail questionnaire followed by telephone leads to a higher response rate in establishment surveys. In one survey of physicians we compared two mixed-mode strategies. One half received a telephone interview followed by a mail questionnaire to nonrespondents. The second half received a mail questionnaire followed by a telephone interview of nonrespondents. The CASRO response rate for the first was 59%, but it was 67% for the latter. In a separate nationwide survey of manufacturers we piloted two variations of mixed-mode strategies and followed with a full study that used the recommended strategy. The CASRO response rate for pilot 1 was 45% and for pilot 2 was 65%. The full study consisted of a mail survey of 8,800 manufacturers with telephone follow-up to nonrespondents. The final response rate achieved was over 68%. Given these results, we could argue that the effectiveness of the mixed-mode procedure depends on an interaction of the sequence of modes and the populations to be surveyed. Telephone with mail followup works better for manufacturing firms, and mail with telephone followup works better for physician offices.
Non Respondent Follow-up in Annual Enterprise Surveys. PATRICK HERNANDEZ, Institut National de la Statistique et des Études Économiques (INSEE)
There are many reasons for total non response, such as inaccurate addresses, firm policy (the firm deliberately never answers surveys) or cost to respondents.
INSEE uses two ways of handling this problem, a legal one and a statistical one. From a legal point of view, if the survey is compulsory, the non respondent might pay a fine. In practice, that can be done in only a few cases, but the formal mails sent to the business have a positive impact on the response rate.
From a statistical point of view, many register surveys are conducted at the end of the year. Via the OCEAN tool, annual business surveys report the non respondent to the SIRENE register, whose clerks have to check the active status within the scope of specific Register Improvement Surveys.
Sending publications (derived from the survey) to the businesses is also a way of persuading non respondents to answer.
In the future, INSEE plans to:
- increase the number of face-to-face interviews for persistent non respondents;
- use tax data as early as possible so as to reduce dead units follow-up;
- design an econometric model to estimate the death probability.
Discussant: LOUIS BOUCHER, Statistics Canada
Floor Discussion
21. CURRENT TOPICS IN SEASONAL ADJUSTMENT
Invited Paper Session
Organizer: STUART SCOTT, Bureau of Labor Statistics
Chair: BENOIT QUENNEVILLE, Statistics Canada
Signal Extraction in Time Series: A Fully Automatic Procedure. AGUSTIN MARAVALL, Bank of Spain and VICTOR GOMEZ, Spanish Ministry of Economics and Finance
We intend to present two programs, Tramo ("Time Series Regression with ARIMA Noise, Missing Observations and Outliers") and Seats ("Signal Extraction in ARIMA Time Series").
Tramo is a program for estimation and forecasting of regression models with possibly nonstationary (ARIMA) errors and any sequence of missing values. The program interpolates these values, identifies and corrects for several types of outliers, and estimates special effects such as Trading Day and Easter and, in general, intervention variable effects. Fully automatic model identification and outlier correction procedures are available.
Seats is a program for estimation of unobserved components in time series following the so-called Arima-model-based method. The trend, seasonal, irregular, and cyclical components are estimated and forecasted with signal extraction techniques applied to ARIMA models. The standard errors of the estimates and forecasts are obtained and the model-based structure is exploited to answer questions of interest in short-term analysis of the data.
The two programs are structured so as to be used together both for in-depth analysis of a few series (as presently done at the Bank of Spain) or for automatic routine applications to a large number of series (as presently done at Eurostat). When used for seasonal adjustment, Tramo preadjusts the series to be adjusted by Seats.
Detection and Modeling of Trading Day Effects. RAYMOND SOUKUP and DAVID FINDLEY, U.S. Bureau of the Census
The results of a study of various models for trading day effects in flow series (monthly accumulations) are summarized. We consider three types of diagnostics for detecting trading day effects and selecting the most parsimonious trading day model: spectral analysis, AIC comparisons, and out-of-sample forecast errors. We also describe merits and limitations of each of these three methods and provide a summary of the model preferences of each diagnostic for a set of time series published by the Census Bureau.
Comparison of Alternative Variance Measures for Seasonally Adjusted and Trend Series. DANNY PFEFFERMANN, Hebrew University, STUART SCOTT and RICHARD TILLER, U.S. Bureau of Labor Statistics
Variance measures for X-12-ARIMA (X-11-ARIMA) and model-dependent estimates of seasonally adjusted and trend components will be presented. For X-12-ARIMA, comparisons will be made among the methods of Pfeffermann & Scott and Bell & Kramer and the bootstrap method of Pfeffermann & Tiller. The latter method applies in principle to any adjustment procedure, including, in particular, model-dependent methods such as Tramo/Seats. Of special interest in all the methods is the consideration of sampling error. Empirical comparisons will involve U.S. labor force series derived from the Current Population Survey, where Pfeffermann, Tiller, and Zimmerman have estimated sampling error autocorrelations based on separate panel estimates.
Discussant: ALISTAIR GRAY and PETER THOMPSON, Victoria U. of Wellington, New Zealand
Floor Discussion
25. BUSINESS REGISTERS: MAINTENANCE AND USES
Invited Paper Session
Organizer: NORMAND LANIEL, Statistics Canada
Chair: JULIE TRÉPANIER, Statistics Canada
The U.S. Bureau of Labor Statistics Longitudinal Establishment Database. RICHARD CLAYTON, MICHAEL SEARSON and KENNETH ROBERTSON, U.S. Bureau of Labor Statistics
In this paper we describe several key processes used in the construction of the Bureau's Longitudinal Database (LDB). This business register serves several functions. First, the register is utilized as a rich and comprehensive source of information on employment and wages, with information available by geographic area, industry, employment size class, and by many other characteristics. Second, the file is used as an establishment sampling frame for the Bureau's establishment surveys. Finally, the database is used to conduct longitudinal studies of business and employment decline and growth.
In addition to a review of the overall program, its sources and outputs, we describe the following topics. First, we cover the processes required to implement a classification change, from Standard Industrial Classification (SIC) codes to North American Industry Classification System (NAICS) codes. Second, we cover the recent improvements in the record linkage methodology utilized to identify and maintain the continuity of establishments across time. This is of particular importance when identifying establishment births and deaths, and for economic analysis of job creation and destruction. We discuss the implementation of Permanent Random Numbers for sampling. Other topics to be included are the editing of microdata, and the data sources. Finally, we describe the Annual Refiling Survey, which is used to review and update, if necessary, one third of the register's industry and geographical codes each year.
Strategies at the Australian Bureau of Statistics for Using Imperfect Registers Effectively. GEOFF LEE and PAUL SUTCLIFFE, Australian Bureau of Statistics
When a business survey measures movement in an economic indicator, the movement measured can be due to one or both of two things: the average activity per firm changing and the number of active firms changing. For good survey estimates, the sample must represent the changing population as well as possible, and the population benchmarks must be as accurate as possible.
This paper describes the ABS's strategies for maintaining and using its business register. They include work on coverage, maintaining the structure of large businesses, confirming business details during data collection, and procedures (standardised across all ABS collections) for detecting unavoidable discrepancies between the register and the real world and making compensatory adjustments to survey estimates. The demographics of the population of businesses (including those not yet loaded to the register) are monitored, and consistent and up to date benchmarks are derived for the reference period for all ABS surveys using the business register.
Analogies are drawn with household surveys, where it is common practice to pay particular attention to the benchmarks applied in estimation. This is also important for business surveys aimed at measuring movement. This ensures administrative irregularities in population updating processes do not give rise to additional volatility in the series.
Recent Developments in the Statistics Canada Business Register. ÉLAINE CASTONGUAY and ANDRÉ MONTY, Statistics Canada
In June, 1997, two major changes were implemented in the Business Register maintained at Statistics Canada. The first change was the introduction of the North American Industry Classification System (NAICS). NAICS is a new classification system developed in cooperation by Canada, the United States, and Mexico to achieve more comparable measurement of economic activity between the three countries. The aim of NAICS is to classify, not enterprises, which may be very diverse in their range of economic activities, but rather establishments. The paper will outline the procedures used to perform the massive conversion of the Business Register to NAICS.
The second change was the conversion of the Business Register to the Business Number (BN). The BN is a unique identifier assigned to businesses by Revenue Canada and used to consolidate four programs, namely the incorporated tax accounts (T2), the goods and services tax accounts (GST), the payroll deduction accounts (Paydac), and the import/export accounts. Prior to June, 1997, the Business Register was comprised of employer businesses only and used Paydac accounts to define its population. With the introduction of the BN came the ability to link data from the four Revenue Canada programs. It was then possible to consider expansion of coverage to non-employer businesses. The paper will discuss the expansion in the coverage of Canadian businesses on the Business Register and the three phases through which the expansion was accomplished. As well, the problems accompanying the introduction of the BN (for example duplication, business units not linked to a BN) will be discussed.
Discussant: PETER STRUIJS, Statistics Netherlands
Floor Discussion
26. THE CONTINUOUS IMPROVEMENT APPROACH TO EDITING: HOW TO IMPROVE DATA QUALITY BY EDITING
Invited Paper Session
Organizer: LEOPOLD GRANQUIST, Statistics Sweden
Chair: RON FECSO, U.S. National Science Foundation
Some current approaches to editing in the ABS. GEOFF LEE and KEITH FARWELL, Australian Bureau of Statistics
A number of ABS collections have already implemented 'significance editing'. This rationalises resource use by prioritising editing activity in a number of collections based on the impact on the final estimates. More recently, considerable attention has been given to the management of 'respondent load'. Taken together, a logical model for processing built around managing respondent contact has evolved. It is quite distinct from more traditional views of a sequential processing system, involving despatch and collection control, non-response follow-up, data capture, input editing, and output editing and finally analysis. The new model is well suited to a TQM approach to the production of statistics.
The Agriculture Section is currently designing a new processing system, which will embed these concepts in an integrated processing system. The Agricultural Commodity Collection is being conducted as a sample survey for the first time and many 'traditional' approaches to processing and procedures are being revamped.
The paper will present the results of ABS's investigations and experience developing a practical implementation of these concepts. The experiences will in turn help shape a set of general editing measures, which at ABS we plan to incorporate into a generalised significance editing module for other collections.
Statistics Sweden's Editing Process Data Project. SVEIN NORDBOTTEN, University of Bergen and Statistics Sweden
Editing is a process aimed at improving the quality of the statistical products from large surveys. International research indicates that in a typical statistical survey, the editing may consume up to 40% of all costs. It has been questioned whether the resources spent on editing are justified.
Three tasks must be solved to obtain answers to the above question:
First, process variables which influence, are part of or produced by editing processes must be specified and interrelated within a conceptually consistent framework.
Second, survey data on which analysis can be based must be obtained. Real observations collected from one or several surveys, or synthetic observations with assumed characteristics generated from an imaginary survey can be used. Both types of data have analytical advantages and drawbacks.
Third, the survey data are submitted to one or several alternative editing processes, the process data derived and analyzed, and the editing processes evaluated. A system for simulating the editing, analysis and evaluation is envisaged.
Statistics Sweden has embarked on a project to study editing processes and their impacts on other statistical processes and on the statistical products. The report will present the basic approaches, review the work done so far and outline plans for future work.
Design of Inlier and Outlier Edits for Business Surveys. DAVID DESJARDINS, JAMES HUNT and WILLIAM WINKLER, U.S. Bureau of the Census
Establishment surveys can be challenging to edit. Some of the conventional editing methods have involved review of printouts and subsequent correction of fields in records that are thought to be erroneous. A limitation of these conventional methods -- even when well-designed -- is that they channel the reviewers in a manner that may not allow a number of the errors to be found. New software packages make it very straightforward to apply graphical based methods. The graphical methods can be applied in an exploratory manner to discover nuances that conventional methods are likely to miss. Further, the graphical methods allow confirmatory review of data to assure that corrections due to the editing process have worked well. This paper shows how a series of well-designed graphical methods can be developed and used to explore and edit the data. Most graphical methods allow detection of outliers in distributions that may be in error. Some erroneous data may lie in the interior of a distribution and be difficult to detect. These interior data points are called inliers. Isolated inliers may not be a problem and may be almost impossible to distinguish from correct data. Sets of inliers of moderate size may seriously affect uses of microdata. In some situations, we can use graphical methods to discover these sets of inliers. In other situations, we may use conventional methods for finding mixture distributions to locate and, possibly, correct the data. In more advanced situations, sets of inliers may arise when two or more administrative lists are linked and some of the identifying information is in error. If corrections are done, then we can use the graphical methods to confirm the plausibility of the changes.
Discussant: GUILIO BARCAROLI, Italian Statistical Institute (ISTAT)
Floor Discussion
27. HUMAN SIDE OF DATA DISSEMINATION
Invited Paper Session
Organizer: CATHRYN DIPPO and FREDERICK CONRAD, U.S. Bureau of Labor Statistics
Chair: CATHRYN DIPPO, U.S. Bureau of Labor Statistics
The Internet has changed citizen access to all types of information, including statistical information produced by large statistical agencies. All the dimensions of access (who, what, when, how, and why) are causing the producers of statistical information to evaluate how well their products are meeting the needs of their current and potential new users. In this session, the authors, who are information scientists or psychologists, will discuss their research over the last three years into various aspects of citizen access to statistical information.
Intermediaries as Users of Statistical Data. CAROL HERT, Syracuse University
Intermediaries, or people who assist others in their information seeking processes, form an important component of systems designed to provide access to statistical data. They not only find and use data themselves but add value for other users by explaining data, helping frame queries, etc. This paper presents findings about the expertise, knowledge, and resources used by intermediaries in statistical settings, and suggests how interfaces and systems might incorporate some aspects of intermediary services.
Interfaces to Support Customized Views and Manipulation of Statistical Data. GARY MARCHIONINI, University of North Carolina
Bad interfaces to government information are perhaps even more debilitating than lack of access because they add frustration and wasted time to citizens' access efforts. This work addresses two particular interface design issues for statistical web sites. First, it aims to bring users closer to useful data with a minimum of mouse clicks. Second, it aims to provide alternative entry points to statistical data so that people with different needs and experiences can benefit from the same interface. In addition to these design goals, the work aims to integrate access to multiple statistical agency sites. Iterative designs informed by user testing will be discussed to illustrate how these goals were met.
The Role of Knowledge Representation in Managing Statistical Information. STEPHANIE HAAS, University of North Carolina
The importance of knowledge representation in understanding and managing complex information resources has been gaining increasing recognition. A clear, unambiguous representation of concepts, rules, sources, ranges of values, etc., is fundamental to the organization and presentation of information in an intelligible and useful way. Providing citizen access to statistical information is a particular challenge, in that users have a wide range of expertise and interest to support them in their quest for answers. The knowledge representation can guide system designers in determining what kind of information to provide and how best to do it, and can also serve as a resource for the users themselves, showing them how the statistical world is put together.
Usability Testing of Data Extraction Tools. FRED CONRAD and JOHN BOSLEY, U.S. Bureau of Labor Statistics
The ability to extract customized data sets from agency web sites is a double edged sword. On the one hand it gives data users great control over the information they obtain. On the other hand, it requires they know a lot about how the data are structured and named. We report several usability studies of two web-based data extraction tools. In early versions of these tools, users had to submit multiple forms in order to build up a query. More recent versions involve client-side applications. While this evolution appears to ease certain usability problems, several types of problems persist. For example, users continue to have trouble locating and interpreting variable names. Additionally, users who would like to preview a data set before extracting it are frustrated in their efforts.
Floor Discussion
28. COMBINING SURVEY AND ADMINISTRATIVE DATA
Invited Paper Session
Organizer: EVA ELVERS, Statistics Sweden
Chair: PAM DAVIES, Office for National Statistics U.K.
Use of Administrative Data as Substitutes for Survey Data for Small Enterprises in the Swedish Annual Structural Business Statistics. JOHAN ERIKSSON and LENNART NORDBERG, Statistics Sweden
This paper presents the Swedish experiences from using administrative data instead of survey data for statistical purposes. The following issues are discussed:
- A brief summary of the methods used in Sweden of combining survey data and administrative data and the experiences of using these methods.
- Administrative data and frame errors: From the use of administrative data, several problems with the frame for Structural Business Statistics have been discovered and possibilities of correcting the frame for coverage problems have been improved.
- Presentation and breakdowns: Since the Structural Business Statistics is now a total enumeration, more detailed presentation is possible. Regional breakdown and key ratios for small enterprises are two areas where this is important.
- Estimates of items such as investments that were weakly correlated to the sampling indicator have now been improved by the use of administrative data.
- Since administrative data do not always include all items important to the survey, a combination of data from questionnaires, administrative registers and other statistical surveys must be used to estimate the missing items. Some Swedish experience on this matter is presented.
Use of Business Income Tax Data to Extend the Information Available from the ABS Economy Wide Economic Activity Survey. STEVE CRABB and PAUL SUTCLIFFE, Australian Bureau of Statistics
The Australian Bureau of Statistics (ABS) is committed to increasing the range and quality of the statistics it provides to users, while at the same time reducing the respondent burden placed on businesses. One important initiative in this regard is the supplementation of the economy wide Economic Activity Survey conducted by the ABS with business income tax data provided by the Australian Taxation Office (ATO) for selected industries. The statistics for these selected industries have been improved by the use of business income tax data for a large super sample of small and medium employing businesses, and by extending the coverage of the survey to include non-employing businesses for which business income tax data is also available.
The methodology involves the use of ABS collected data for the relatively few large and complex businesses, supplemented with ATO collected data for the many small and medium sized, simply structured businesses. A subsample of these small and medium sized businesses is also approached by the ABS to obtain data not available from the ATO. The paper presents details of the methodology used to compile the statistics, together with some of the hurdles that needed to be resolved and some of the challenges that still need to be overcome.
New Applications of Old Weighting Techniques. ALBERT HENDRIK KROESE and ROBBERT HANS RENSSEN, Statistics Netherlands
Statistics Netherlands is actively investigating the possibility of exploiting administrative data for statistical purposes. In fact, its policy is to replace its own data collection with the use of administrative data where possible. At present, Statistics Netherlands is in the midst of such a transformation: many surveys are being or will soon be redesigned to maximally exploit administrative data sources.
In this paper a recently developed estimation approach is presented to combine register and survey data. The method uses (exact) record linking and repeated weighting in order to obtain an extended set of model assisted (i.e. design based) estimates that are totally consistent with each other as well as with all possible aggregations of the register variables involved. Using so-called minimal weighting models, which can automatically be generated, the approach combines the attractive statistical properties of model assisted estimates with total coherency of all statistical estimates, both from the surveys and the registers involved.
In the paper the method is applied to enterprise statistics. Different totals are estimated using available registrations and survey data.
Discussant: PIERRE LAVALLÉE, Statistics Canada
Floor Discussion
31. COORDINATING SAMPLING BETWEEN AND WITHIN SURVEYS
Invited Paper Session
Organizer: PATRICK CANTWELL, U.S. Bureau of the Census
Chair: PATRICK CANTWELL, U.S. Bureau of the Census
In this session, we present various aspects of coordinated sampling based on experience from surveys in Australia, Canada, and Sweden. The papers include technical developments as well as the perspectives of the project (survey) manager. There are three papers and a discussant.
Synchronized Sampling. RICHARD MCKENZIE and BILL GROSS, Australian Bureau of Statistics
Synchronised Sampling is a technique which has been used by the Australian Bureau of Statistics since 1983 to control sample rotation within surveys and overlap between surveys. It relies on assigning a permanent random number to each unit on the business register and selecting units whose random numbers lie in an interval. Rotation control is achieved by moving the interval to the right and overlap is achieved by, for each stratum, constraining the selection interval to move within an overlap range. Key properties of the method are:
- it controls overlap between a number of surveys
- it achieves pre-assigned sample sizes in each stratum
- it gives nearly equal probability of selection within strata
- it copes with different stratification in different surveys
- it controls rotation by guaranteeing a maximum time in sample
- it can control overlap for single-establishment firms between an establishment survey and a firm survey
- it allows for births and deaths on the frame
- it generally does not control rotation or overlap when units change strata
The paper describes the basic algorithms for synchronised sampling and the techniques used to set overlap ranges and selection intervals, especially after major changes in stratification.
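The interval idea described in this abstract can be illustrated with a minimal Python sketch. It is not the ABS algorithm: the function names (`assign_prns`, `select`, `rotate`), the wrap-around at 1, and the rule of taking the first n units to the right of the start point are assumptions made for illustration, and overlap ranges, multiple surveys, and stratum changes are not handled.

```python
import random

def assign_prns(unit_ids, seed=0):
    """Give every unit on the (hypothetical) register a permanent random number in [0, 1)."""
    rng = random.Random(seed)
    return {u: rng.random() for u in unit_ids}

def select(prns, start, n):
    """Select the n units whose PRNs are met first when moving right from `start`.

    The unit interval is treated as a circle, so selection wraps around at 1.
    Taking exactly n units gives a fixed sample size for the stratum.
    """
    ordered = sorted(prns, key=lambda u: ((prns[u] - start) % 1.0, u))
    return ordered[:n]

def rotate(prns, start, n, k):
    """Return a new start point, moved right so that k units (k < n) leave the sample."""
    sample = select(prns, start, n)
    # Starting at the PRN of the (k+1)-th selected unit drops the first k units.
    return prns[sample[k]]

# One stratum of 20 units, sample size 5, rotating 2 units out per period.
prns = assign_prns(range(20), seed=1)
sample_t0 = select(prns, 0.0, 5)
new_start = rotate(prns, 0.0, 5, 2)
sample_t1 = select(prns, new_start, 5)
print(sample_t0, sample_t1)
```

Because the random numbers are permanent, the second sample keeps the three units that were not rotated out and brings in the next two units to the right. In this sketch a birth would simply receive a fresh PRN and a death would be removed from `prns`.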
Issues in Co-ordinated Sampling at Statistics Canada. DON ROYCE, Statistics Canada
Until the mid-1990s, the environment for business surveys at Statistics Canada was not conducive to the use of co-ordinated sampling. With the arrival of the Project to Improve Provincial Economic Statistics (PIPES), however, the situation began to change. One of these changes has been the development of a new Unified Enterprise Survey (UES), which integrates the frames and the sample designs of many of our annual business surveys. At the same time, however, the additional survey activity resulting from PIPES has required an increased focus on the issue of response burden, particularly for small businesses.
These developments have raised many questions for Statistics Canada concerning the future use of co-ordinated sampling. These issues include the need to control the overlap between annual and sub-annual surveys of the same population, the need to control overlap between the UES and other all-industry surveys, the need to control overlap of different surveys of the same population, and the need to rotate the sample among successive occasions of the same survey. Some related initiatives are also described, such as the use of special measures to reduce response burden on small businesses, the development of a respondent burden tracking system, and the development of the Generalized Sampling System as a tool for standardization of sample selection methods.
Coordination of PPS Samples Over Time. ESBJÖRN OHLSSON, Stockholm University
Applications of probability proportional to size (PPS) sampling can be split into two categories. The first is multi-stage sampling, where PPS sampling is often employed at the primary stage. Here, the within-stratum sample sizes are often small, sometimes even of size one. The second category is business surveys using a list frame and having a strong correlation between size and target variables. Here sample sizes within activity strata are typically moderate or large.
In both cases there is a need to update samples from time to time, while retaining as many units as possible from the old sample. Further requirements on the sampling procedures are simplicity in application and in estimation of variance. In the second category there is also a need for coordination between surveys to get an even distribution of response burden.
We present various permanent random number techniques that meet these requirements and compare them to a few other methods, and present a simulation study on expected overlap. We briefly discuss applications to the first-stage sample of "stretches of roads" for the traffic flow survey at the Swedish National Road Administration and to the list frame sample of the Swedish CPI.
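As an illustration of how a permanent random number can retain units when size measures are updated, the following Python sketch uses sequential Poisson sampling: each unit keeps one PRN, and the n units with the smallest ratio of PRN to size are selected. The abstract does not say which techniques the paper compares, so the choice of this method is an assumption, and the frame, size measures, seed, and size updates are hypothetical.

```python
import random

def sequential_poisson_sample(sizes, prns, n):
    """Select the n units with the smallest ratio PRN / size measure."""
    ranked = sorted(sizes, key=lambda unit: prns[unit] / sizes[unit])
    return set(ranked[:n])

rng = random.Random(17)
frame = range(200)                                   # hypothetical list frame
prns = {u: rng.random() for u in frame}              # drawn once, then kept
sizes_old = {u: 1.0 + (u % 20) for u in frame}       # hypothetical size measures
# Updated sizes: every third unit grows by 5%, every fourth of the rest shrinks by 5%.
sizes_new = {u: s * (1.05 if u % 3 == 0 else 0.95 if u % 4 == 0 else 1.0)
             for u, s in sizes_old.items()}

old_sample = sequential_poisson_sample(sizes_old, prns, 20)
new_sample = sequential_poisson_sample(sizes_new, prns, 20)
overlap = len(old_sample & new_sample)               # units retained after the update
```

Because the PRNs do not change, only units whose ratio lies close to the selection threshold can enter or leave the sample when sizes are revised.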
Discussant: LAWRENCE ERNST, U.S. Bureau of Labor Statistics
Floor Discussion
32. GENERALIZED INTEGRATED PROCESSING SYSTEMS
Invited Paper Session
Organizer: RAY FREEMAN, Statistics New Zealand
Chair: GEORGE ANDRUSIAK, Statistics Canada
An Overview of the Standard Economic Processing System (StEPS). SHIRIN A. AHMED and DEBORAH L. TASKY, U.S. Bureau of the Census
The economic area at the U.S. Bureau of the Census conducts 110 current surveys in the areas of retail, wholesale, service industries, manufacturing, and construction. Prior to 1995, subject areas directed the development of systems to accommodate specific program needs. Over time, this resulted in the economic area having 16 different processing systems, and variations of each.
In May 1995, the economic area dedicated a team to build a common survey processing system. The team comprised survey statisticians, programmers, and mathematical statisticians. The system they developed is known as the Standard Economic Processing System, or StEPS. In four years the team completed the basic StEPS system. During 1999, 50 annual surveys used StEPS to collect and process data for the 1998 statistical year; three of these surveys had served as StEPS pilots in the prior year. Over the next three years, the remaining surveys will move to StEPS. At that time, approximately 400 analysts in Washington, D.C., and 80 clerks in Jeffersonville, Indiana (the Bureau's processing center) will use StEPS.
This paper describes what StEPS is, how it is general, and strategies undertaken to accelerate development and implementation. The paper includes details about data set structures, modules and linkages to other systems.
Statistics New Zealand's Survey Processing System: A Template for the Re-engineering of SNZ Survey Processes. RAY FREEMAN, Statistics New Zealand
The need for a generalised survey processing system has been apparent in Statistics New Zealand for some time. Key requirements were the ability to rotate processing staff between surveys, improving publication times, improving quality without adding to costs, reducing the risks of errors, as well as reducing development costs. Our solution was the development of a "standard" survey processing system template that could easily be reused to develop a system for a new survey.
Sprocet (Survey PROCEssing Template) integrates all survey processing phases from data capture through to output editing using the same platform. At all stages management information can be produced which gives the staff involved and any other approved user in the organisation the ability to balance work flows and monitor progress against contractual agreements. The dynamic nature of views, along with the use of categories, sorting and drill down, has integrated and transformed processes that were previously large batch jobs into interactive processing.
Sprocet is written in Lotus Notes, which is also used for discussion databases and email. This has facilitated fully integrated desktop processing.
Sprocet is also closely integrated with the Business Frame and the Business Surveys database of respondent information. Both of these are written in 'SQL Windows' and located on Sybase databases. Only a minimum of information is replicated onto Sprocet; the rest is retrieved dynamically during processing, and changes identified during the survey process are written back to the Sybase databases.
Discussant: CAROL HOUSE, National Agricultural Statistics Service, USDA
Floor Discussion
33. OUTLIERS
Invited Paper Session
Organizer: HYUNSHIK LEE, Westat
Chair: HYUNSHIK LEE, Westat
Robust Multivariate Outlier Detection Using Mahalanobis' Distance Stahel-Donoho Estimators. SARAH FRANKLIN, MARIE BRODEUR and STEVEN THOMAS, Statistics Canada
This paper illustrates the practical application of a robust multivariate outlier detection method used to edit survey data. Outliers are identified by calculating Mahalanobis' Distance, where the covariance matrix is robustly estimated using an approximation of the Stahel-Donoho estimator. This method of multivariate outlier detection has been successfully employed at Statistics Canada for five years by the Annual Wholesale and Retail Trade Survey and more recently by the Monthly Survey of Manufacturing. The benefits of the method are threefold. First, it can identify outliers that violate the correlational structure of the data. This is particularly important in business surveys where subject matter experts wish to identify data that are suspicious relative to the values of other variables. Second, Mahalanobis' distance can be decomposed to identify which variable of the outlying record is most problematic. Third, the program is easy to run since it requires only two input files: a parameter file and a data file.
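The idea can be sketched in Python for two variables. This is not the Statistics Canada program: it assumes a common textbook approximation in which the Stahel-Donoho outlyingness of a record is its largest robust z-score (median and scaled MAD) over a finite grid of projection directions, records are down-weighted once their outlyingness exceeds a cutoff, and Mahalanobis' distance is then computed from the weighted mean and covariance. The weight function, cutoff, number of directions, and data are all assumptions chosen for illustration.

```python
import math
from statistics import median

def sd_outlyingness(points, n_dirs=180):
    """Largest robust z-score of each point over a grid of projection directions."""
    out = [0.0] * len(points)
    for k in range(n_dirs):
        angle = math.pi * k / n_dirs
        proj = [math.cos(angle) * x + math.sin(angle) * y for x, y in points]
        med = median(proj)
        mad = 1.4826 * median(abs(p - med) for p in proj)
        if mad > 0:
            out = [max(o, abs(p - med) / mad) for o, p in zip(out, proj)]
    return out

def robust_mahalanobis_sq(points, cutoff=2.448):
    """Squared Mahalanobis distances using an outlyingness-weighted mean and covariance."""
    w = [min(1.0, (cutoff / r) ** 2) if r > 0 else 1.0
         for r in sd_outlyingness(points)]
    sw = sum(w)
    mx = sum(wi * x for wi, (x, _) in zip(w, points)) / sw
    my = sum(wi * y for wi, (_, y) in zip(w, points)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, (x, _) in zip(w, points)) / sw
    syy = sum(wi * (y - my) ** 2 for wi, (_, y) in zip(w, points)) / sw
    sxy = sum(wi * (x - mx) * (y - my) for wi, (x, y) in zip(w, points)) / sw
    det = sxx * syy - sxy ** 2
    # Quadratic form with the inverse of the 2x2 covariance matrix.
    return [(syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my)
             + sxx * (y - my) ** 2) / det for x, y in points]

# Hypothetical records in which y is roughly 2x. The last record breaks that
# relationship even though its x and y values are each within the usual range.
records = [(float(i), 2.0 * i + 0.3 * ((2 * i) % 5 - 2)) for i in range(1, 21)]
records.append((10.0, 38.0))
d2 = robust_mahalanobis_sq(records)
flagged = max(range(len(d2)), key=d2.__getitem__)   # index of the most distant record
```

The last record would not stand out on either variable alone; it is flagged because it departs from the relationship between the two, which is the first benefit the abstract lists.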
An Evaluation of Outlier-Resistant Procedures in Establishment Surveys. JEAN-PHILLIPE GWET and HYUNSHIK LEE, Westat
Many outlier-resistant procedures have been proposed for the outlier problem in estimation. Only limited information is available on how they compare across the wide range of situations expected to occur in establishment surveys. This paper gives survey practitioners much-needed information on the comparative performance of various outlier procedures in terms of MSE-efficiency, bias, coverage properties of interval estimation, and ease of use.
We will consider various survey situations with respect to the sample design and estimator. In discussing outlier-resistant alternative estimators, we will confine ourselves to some of the most commonly used sampling designs: simple random sampling, probability proportional to size sampling, and stratified simple random sampling. The discussion will focus particularly on stratified simple random sampling, as it is very common in establishment surveys. Different estimators are available depending on whether there is auxiliary information available for estimation, which is different from the auxiliary information used for the sample design. Thus the estimators will be classified according to whether auxiliary information is used in estimation or not.
Based on the evaluation results, some useful recommendations will be provided.
Winsorization for Identifying and Treating Outliers in Business Surveys. RAY CHAMBERS, University of Southampton; PHILIP KOKIC, Insiders GmbH; MARIE CRUDDAS and PAUL SMITH, Office for National Statistics, United Kingdom
Sample values that are considerably bigger than their stratum mean are not uncommon in business surveys. In most cases these values are correct. When carrying out estimation in such cases it is useful to employ an automated method with demonstrable theoretical properties rather than relying on human judgement to "modify" the outlier values. In the UK Office for National Statistics the automated method of choice is winsorisation, which can be viewed as altering the value of an extreme observation or as altering its sample weight so that it has less effect on the estimate. This leads to an estimator with good mean square error properties in repeating surveys.
This paper reviews the theory underpinning two approaches to winsorisation in survey sampling. The first, one-sided winsorisation, only adjusts the value/weight of observations that are significantly larger than their expected values under the fitted estimation model. The second, two-sided winsorisation, adjusts values/weights that are significantly different from their expected values. Both approaches rely on tuning parameters to define what "significant" means, and typically estimate these from historical data.
The paper also covers practical issues which must be considered in using winsorisation, and reports some of the experiences obtained in applying the methods.
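A small Python sketch shows the two views of one-sided winsorisation side by side. It assumes one common form of the adjustment, in which a value above the cutoff keeps only 1/weight of its excess over the cutoff. The abstract does not state the exact form used at the ONS, and the cutoff is treated here as a given tuning parameter. The values, weight, and cutoff are hypothetical.

```python
def winsorise_one_sided(y, weight, cutoff):
    """Shrink the excess of y over the cutoff by a factor of 1/weight."""
    if y <= cutoff:
        return y
    return cutoff + (y - cutoff) / weight

values = [10.0, 12.0, 11.0, 200.0]   # hypothetical stratum sample
weight = 5.0                         # each sampled unit represents five units
cutoff = 50.0                        # tuning parameter, assumed known

winsorised = [winsorise_one_sided(y, weight, cutoff) for y in values]
raw_total = weight * sum(values)                      # 1165.0
winsorised_total = weight * sum(winsorised)           # 565.0
# Equivalent view: keep the value 200 but reduce its weight.
implied_weight = weight * winsorised[3] / values[3]   # 2.0
```

The extreme value 200 is replaced by 80, which gives the same estimate as keeping 200 and cutting its weight from 5 to 2.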
Discussant: BEAT HULLIGER, Swiss Federal Statistical Office
Floor Discussion
34. TECHNOLOGY DRIVEN CHANGES IN DATA COLLECTION AND DISSEMINATION OF ESTABLISHMENT DATA
Invited Paper Session
Organizer: DANIEL KASPRZYK, U.S. National Center for Education Statistics and SAMEENA SALVUCCI, Synectics for Management Decisions, U.S.
Chair: SAMEENA SALVUCCI, Synectics for Management Decisions, U.S.
As we approach the end of the nineties, we have witnessed significant changes in computer technologies: a shift from text-based tools to direct manipulation of data through graphical user interfaces (GUIs), and a shift from limited dissemination to world-wide dissemination through the Internet. Government statistical agencies have been the catalysts for innovatively using these new technologies for collecting and disseminating establishment data. If past is prologue, then we should expect an even greater technological evolution and even more rapid change in the new millennium. We need to start planning now to effectively understand and harness these advancing technologies for statistical uses: collection, dissemination, and use of data.
This session will include papers which will provide a comparative perspective on the challenges faced in utilizing advanced technology for establishment surveys collected by government agencies in three different countries with widely differing economic and social circumstances. The sectors addressed across these papers will include energy, businesses, and education. Issues to be discussed include the challenges of privatization, training, collection, validation, dissemination, confidentiality, and uses of data.
We anticipate that these presentations will generate further discussion by session participants who would contribute by describing their own disparate experiences in facing economic and social challenges in the use of cutting edge technologies in the collection and dissemination of establishment data.
An Architecture for Integrating the Collection, Analysis, and Dissemination of Establishment Data. JOE ROSE, U.S. National Center for Education Statistics; SAMEENA SALVUCCI and RICHARD HOUTARY, Synectics for Management Decisions Inc.
The U.S. Department of Education's National Center for Education Statistics has recently moved to a Microsoft IIS Active Server Platform and is beginning to develop online applications that will allow for the integration of the collection and dissemination of its establishment data. This paper will describe the lessons learned during the process of developing a prototype of an Internet-based system that will be used to administer and disseminate a new Congressionally-mandated Institution Prices and Student Financial Aid (IPSFA) survey. The IPSFA is part of the Integrated Postsecondary Education Data System (IPEDS), which is the core postsecondary data collection program in NCES. The IPEDS system is built around a series of interrelated surveys to collect institution-level data in such areas as enrollment, program completions, faculty, staff, finance, and libraries.
The paper will also address the lessons learned during the process of developing prototypes for an NCES data warehouse and online data analysis tool.
Collection and Uses of Educational Data in Ethiopia. NEBIYU TADESSE, USAID, Ethiopia
The socialist system was abolished in Ethiopia in May 1991. The country moved to a decentralized system of government two years after the collapse of the communist regime. Under the decentralized government system, several regional or state governments were established. The regional governments have their own parliaments, constitutionally empowered to administer their regional activities. The federal government provides money to the regions in the form of block grants, which the regions are free to spend as they see fit. The states or regions are also expected to raise money locally. The regions' right to manage their own affairs includes data collection and information management.
This paper discusses the school data collection mechanisms and their use in Ethiopia, with particular reference to the Southern Nations, Nationalities, and Peoples Region (SNNPR). Data are collected by surveying educational institutions through questionnaires, as well as by using recently introduced mapping technology with the assistance of the Global Positioning System (GPS), Geographical Information System (GIS) and other software. The complexity of this endeavor was challenging. For example, the ethnic and linguistic diversity of SNNPR (population around 11 million) is what most distinguishes it from the other regions of Ethiopia. There are 59 mother tongues in use in SNNPR. Eight languages are used for instruction and ten languages are studied as academic subjects.
Collection and Delivery of Government Information Through the Internet - The Chinese Case. XUESHAN YANG, State Information Centre, Beijing and GARY HUANG, Synectics for Management Decisions Inc
Chinese government institutions are undergoing a drastic change in information collection and dissemination. Traditionally, China was extremely restrictive about releasing government information to the public. Over the years, the central government tightly controlled the scope and the channels for releasing key statistics on social, demographic, and economic changes. Many data crucial for understanding the nation's development were not available for scholarly analysis, let alone for public scrutiny. Amid the government restructuring in 1998, a number of agencies at the central level and the provincial level have begun publishing government information on their Web sites. This new approach to disseminating public information is becoming more widespread. It is foreseeable that in China, Internet-based institutional information collection and delivery will grow in the future, both in the coverage of the content and the number of agencies involved.
The proposed presentation will describe this ongoing change in China and explore the implications for policy and program development in China and elsewhere in the developing world. The first section of the paper will give a brief introduction to the development of the Internet in China, addressing background issues such as the current economic reform and government restructuring, the different roles of governments, private sector, and academics regarding information services, technological challenges, and the dilemma involving information access and security.
In the second section, the up-to-date status of the government information on-line services will be examined. Examples will be provided to illustrate the scale and the depth of the changes that are taking place in China's institutional information management. A list of the Chinese government Web sites will be presented and their future developments will be discussed.
The third section will discuss in detail the motivation for and the processes of Chinese government information services through the Internet. It will examine a conceptual framework in which potential opportunities and problems can be analyzed, to better understand the options that the dazzling changes in information technologies offer China and other developing nations. Policy and programmatic issues relating to domestic development and international cooperation will be examined.
Discussant: MINDY REISER, Synectics for Management Decisions, Inc.
Floor Discussion
37. PRACTICAL EDITING AND IMPUTATION STRATEGIES FOR STATISTICAL SURVEYS
Invited Paper Session
Organizer: CLAUDE POIRIER, Statistics Canada
Chair: CLAUDE POIRIER, Statistics Canada
The invited session will present recent developments related to editing and imputation (E&I) in statistical surveys, with particular attention to practical concerns observed in the application of E&I techniques. The following four presentations will give the audience the current picture of applied techniques around the world. Variations of traditional methods of micro- and macro-editing associated with quantitative and qualitative variables will be presented. Comparisons with the neural network approach will also be made.
Economic Census General Editing-Plain Vanilla. DENNIS WAGNER, U.S. Bureau of the Census
The U.S. Census Bureau decided to develop a prototype general edit for the 1997 Economic Census, to see if it could replace the four trade specific edits being used. Plain Vanilla, as the prototype was called, was designed to perform two major editing functions for the economic censuses: data relationship edits using SPEER methodology, and balancing details to totals.
Plain Vanilla was developed by a dedicated team of Survey Statisticians, Mathematical Statisticians, and Programmers. It integrated with the trade specific edit systems by incorporating metadata on the edit record structure, directory names, and file names along with a script which specified the tests and imputation strategies for failures.
Plain Vanilla provided its functions as a replacement to similar functions done in three of the four trade specific edit systems in the 1997 Economic Census. Reaction was mostly favorable and significant time and resources were saved using the general system.
For the 2002 Economic Census, the Bureau plans to expand Plain Vanilla to a full featured general edit system to replace the four trade specific systems.
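The two editing functions named above can be illustrated with a brief Python sketch. It does not reproduce Plain Vanilla or the SPEER software. It assumes that a data relationship edit can be expressed as bounds on the ratio of two fields, and that details are balanced to a total by scaling them proportionally. The abstract does not say how Plain Vanilla balances details, so the pro-rata rule is an assumption, and the field values and bounds are hypothetical.

```python
def ratio_edit_passes(numerator, denominator, lower, upper):
    """Check that numerator / denominator lies within the edit bounds."""
    if denominator <= 0:
        return False
    return lower <= numerator / denominator <= upper

def balance_to_total(details, total):
    """Scale detail items proportionally so that they sum to the reported total."""
    detail_sum = sum(details)
    if detail_sum == 0:
        return list(details)          # nothing to prorate
    return [d * total / detail_sum for d in details]

# Hypothetical record: annual payroll (thousands) and number of employees.
payroll_ok = ratio_edit_passes(900.0, 30.0, lower=10.0, upper=60.0)    # ratio 30
payroll_bad = ratio_edit_passes(9000.0, 30.0, lower=10.0, upper=60.0)  # ratio 300

# Hypothetical detail lines summing to 120 against a reported total of 100.
balanced = balance_to_total([30.0, 50.0, 40.0], 100.0)
```

A production system would also decide which field to change when a ratio edit fails, a step this sketch leaves out.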
A Generic Implementation of the Nearest-Neighbour Imputation Methodology (NIM). MICHAEL BANKIER, PAUL POIRIER, MARTIN LACHANCE and PATRICK MASON, Statistics Canada
A New Imputation Methodology (NIM) was introduced in the 1996 Canadian Census to carry out the hot deck Edit and Imputation (E&I) for the demographic variables. For the first time, qualitative and quantitative variables could be imputed simultaneously. The NIM used a data-driven approach with single-donor imputation to determine the best imputation action. For the 2001 Canadian Census, the NIM will be extended to more variables in accordance with the long-term objective to progressively move all census variables to the NIM for the 2006 Canadian Census. This expansion of the NIM use provides the opportunity to make a generic implementation of the methodology. The generic implementation will have new features under consideration, such as the evaluation of many quantitative variables in a single edit rule, the option of using derived variables and also, the capability of processing non-linear edit rules. The flexibility of the generic implementation will open the possibility of using the NIM in a wider variety of surveys.
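The donor-based idea can be sketched in Python. This is a simplified nearest-neighbour hot deck, not the NIM itself: it only fills missing fields from the single closest donor and does not search for imputation actions that make a record pass a set of edit rules. The distance function, the scale used for the quantitative variable, and the records are hypothetical.

```python
def field_distance(a, b, scale=1.0):
    """Mismatch (0 or 1) for qualitative values, scaled absolute difference otherwise."""
    if isinstance(a, str) or isinstance(b, str):
        return 0.0 if a == b else 1.0
    return abs(a - b) / scale

def impute_from_nearest_donor(recipient, donors, scales):
    """Copy the recipient's missing fields from the closest complete donor record."""
    missing = [f for f, v in recipient.items() if v is None]
    matching = [f for f in recipient if f not in missing]

    def distance(donor):
        return sum(field_distance(recipient[f], donor[f], scales.get(f, 1.0))
                   for f in matching)

    donor = min(donors, key=distance)
    completed = dict(recipient)
    for f in missing:
        completed[f] = donor[f]      # single donor supplies every missing field
    return completed

donors = [
    {"age": 36, "marital": "married", "relationship": "spouse"},
    {"age": 8, "marital": "single", "relationship": "child"},
    {"age": 33, "marital": "single", "relationship": "lodger"},
]
recipient = {"age": 34, "marital": "married", "relationship": None}
completed = impute_from_nearest_donor(recipient, donors, scales={"age": 10.0})
```

Taking all imputed values from one donor keeps the imputed record internally consistent across qualitative and quantitative variables, provided the donor record itself passes the edits.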
Graphical Macro Editing: Possibilities and Pitfalls. FRANK VAN DE POL, TON DE WAAL and ROBBERT RENSSEN, Statistics Netherlands
National Statistical Institutes (NSIs) spend considerable resources on data editing. Improving the data editing process is therefore a very important issue for NSIs. Modern data editing techniques, such as selective editing, automatic editing, graphical editing and macro-editing, can substantially reduce the costs of data editing, while improving the timeliness of the survey and maintaining the quality of the data.
The paper starts by providing a brief overview of software tools for graphical editing (macro-editing). After that overview the paper focuses on a software tool, MacroView, that has been developed by Statistics Netherlands. MacroView can be applied to a broad class of economic surveys. It consists of three levels. At the macro-level the publication figures of the present period are compared to those of a previous period. This allows the detection of suspicious publication figures. At the meso-level, multivariate outliers can be detected. Scatterplots of selected variables can be made in order to detect outliers visually. Alternatively, mathematical algorithms for robust multivariate outlier detection can be used. At the micro-level values in individual records can be corrected.
In the paper we point out the advantages of using a software tool such as MacroView. We also point out the dangers of macro-editing in general.
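The macro-level comparison described above can be sketched in Python as a check on the relative change of each publication figure between two periods. This is not MacroView: the tolerance, the cell names, and the figures are hypothetical, and a real tool would typically set tolerances per cell from historical behaviour.

```python
def suspicious_aggregates(current, previous, tolerance=0.25):
    """Return cells whose relative change from the previous period exceeds the tolerance."""
    flagged = {}
    for cell, now in current.items():
        before = previous.get(cell)
        if not before:               # no previous figure (missing or zero)
            continue
        change = (now - before) / before
        if abs(change) > tolerance:
            flagged[cell] = change
    return flagged

previous = {"retail": 1000.0, "wholesale": 800.0, "construction": 500.0}
current = {"retail": 1040.0, "wholesale": 1300.0, "construction": 490.0}
flagged = suspicious_aggregates(current, previous)
```

Here only the wholesale figure is flagged, and an analyst would then drill down to the records behind that cell.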
The Uses of Neural Network in Data Editing. BIRGER MADSEN and BJORN STEEN LARSEN, Statistics Denmark
The paper gives an overview of experiments performed at Statistics Denmark about the uses of Neural Networks (NNs) in Data Editing. This covers error localisation, error correction and imputation of missing values. A comparison with classical linear and logistic regression is done. The following topics are covered:
- Which types of NNs are most advantageous for data editing and imputation?
- Are NNs best suited for error localisation procedures or for error correction and imputation?
- Which types of data errors are most successfully located and corrected by NNs?
- Which types of data are most amenable to data editing and imputation using NNs?
- At which stage of the data editing process should NNs be applied?
The data cover nominal, ordinal and interval variables. Both "business" data and "individual/household" data are covered.
The most frequently used type of NN is the Multi Layer Perceptron (MLP). Recommendations are given for choosing the number of hidden layers, the number of neurons in each layer, the activation function and the training algorithm.
The whole question of type of pre-processing of data (including feature extraction and selection) is also briefly covered.
Floor Discussion
38. QUALITY IN BUSINESS SURVEYS
Invited Paper Session
Organizer: PAUL SMITH, Office for National Statistics, United Kingdom
Chair: PASCAL RIVIÈRE, Institut National de la Statistique et des Études Économiques (INSEE)
The issue of quality in business surveys is an important one, since many decisions in macroeconomic management are based on survey information, and at times it may be difficult to reconcile the messages contained in different indicators. It is helpful to have some strong indication of which information to place more reliance on. The same issue is important in NSIs for the allocation of resources to different surveys, dependent on their relative accuracy and the usefulness of the series in policy, the national accounts or other areas. At the same time the accuracy of lower-level information used by a multitude of smaller organisations and individuals may need to be maintained so that confidence in the NSIs' outputs is supported. To this end we need quality measurement and quality monitoring procedures in place, and developing these procedures often poses particular challenges for business surveys, different from those described in the parallel and better-researched social survey area. This session brings together some recent developments in quality measurement and quality assurance in business surveys.
Assessing the Quality of Business Statistics. PAM DAVIES, Office for National Statistics, United Kingdom
The paper/talk will discuss the conclusions of a project on quality measurement undertaken jointly by ONS and Statistics Sweden and the Universities of Bath and Southampton in the UK. The project was funded by Eurostat and takes forward work, at the European level, to agree on a definition of quality in business statistics and a framework for quality assessment. The components of quality considered within the project are accuracy, comparability and coherence. The outputs from the project are (a) tools (methodology in general, software for specific elements and guidelines for implementation) which could be used in other Member States and (b) model reports showing what can be done. The model reports have assessed the quality of a structural and short-term survey in the UK and in Sweden.
Quality Assessment in Swedish Business Statistics. EVA ELVERS, Statistics Sweden
After a short introduction to the user-oriented quality concept for official statistics in general, the discussion concentrates on the quality components accuracy and coherence. One of the characteristics of business statistics is quick population changes due to births, deaths, and different types of organisational changes. Care in unit delineation is essential. Knowledge about units and their relationships is needed in business registers, in data collection and editing, when making comparisons on the micro level etc. Some quality measures can be obtained when the statistics are produced, at least considering probability sampling and indicators for non-response and coverage. Later, more information is available, for example from administrative sources and the business register. That can be used for comparisons on micro and macro levels. Over- and under-coverage can be identified, at least partly, and the effects can then be estimated. A short-term survey can derive approximate quantitative measures of quality by comparison with a related annual survey, for example about non-response and measurement errors. Such quality assessments are useful not only for the users of the statistics, but also as feed-back to surveys.
A Quality Approach to Business Surveys. PATRICIA WHITRIDGE and STUART PURSEY, Statistics Canada
Over the last few years, Statistics Canada has initiated a project that will ultimately affect all major economic surveys. One of the long term objectives is to be able to present an integrated product to our users. The surveys are being unified to use the same sampling frame, an integrated sampling and estimation strategy as well as common collection and processing systems.
To provide base information about the quality of existing business surveys, Statistics Canada has undertaken a Data Quality Survey to collect information about all aspects of survey methodology, from frame to estimation.
As the surveys are being redesigned and unified, efforts are being made to ensure sufficient data are available to facilitate evaluation of the quality of all processes as well as to make some judgements about the quality of the final product. A composite quality indicator has been proposed as a tool to express the overall quality of the products.
An overview of the culture of quality at Statistics Canada will be presented. Details of the new initiatives will be discussed. Some conclusions about the effectiveness of the approach being adopted will be drawn, as well as future directions for more work.
Discussant: MARY MARCH, Statistics Canada and EMMANUEL RAULIN, Eurostat
Floor Discussion
39. SURVEY ISSUES IN A CHANGING AGRICULTURE INDUSTRY
Invited Paper Session
Organizer: CLAUDE JULIEN, Statistics Canada
Chair: FRANCINE HARDY, Statistics Canada
The objective of the session is to present how agriculture survey programs in Canada and the United States are addressing the challenges put to them by an agriculture industry that is evolving rapidly. Over the last 20 years the agriculture industry has maintained a very large household-based population while evolving more and more towards a typical business population with an increasing number of large farm operations (as individual enterprises or as part of a complex enterprise structure). Farms are more dynamic in terms of the commodities they produce (traditional production versus new specialized production) and the methods used to produce them (renting land, purchasing land, sharing equipment, etc.). The evolution of the agriculture industry is affecting every aspect of survey work associated with the agriculture program.
The session will address the following issues:
- Applying business survey methods more often to an industry that is still largely household-based
- Defining, creating and maintaining up-to-date sampling frames; keeping track of changes to large and complex farming operations; identifying new farming operations between agriculture censuses
- Collecting data from the large and complex farming operations during censuses and surveys; methods and practices to reduce the response burden put on the large farming operations by agriculture surveys and other business surveys; respondent relations practices to obtain the information needed to maintain the current statistics program
Improving our Agricultural Statistics Program Through Easy and Fast Access by Employees to Previous Survey and Census Data. JACK NEALON, U.S. National Agricultural Statistics Service
Changes have occurred in the agricultural industry and the demand for agricultural data has increased, which has resulted in the need to collect "more" data from "fewer" farm operators. Due to the additional respondent burden and the challenge of maintaining high response rates and data quality, NASS is re-evaluating its entire survey process. Can response rates be improved by making better use of information from previous survey contacts? Can individualized questionnaires be developed to reduce respondent burden? Can we more effectively detect survey-to-survey data inconsistencies and measure change by making it easier to compare current and previous data responses? NASS believes the answer to these questions is "Yes", and the solution is to provide all employees easy and ready access to previous survey and census data to promote maximum use in the survey process.
NASS has developed an integrated, easy-to-use, and high performance Data Warehouse System. This system already contains over one-half billion records of survey and census data from farm operators and is providing improvements to our survey process. More improvements are expected as the Data Warehouse is fully integrated with NASS's sampling, survey management, data collection, data analysis, and estimation systems.
The presentation will motivate the need for easy and ready access to data by employees, discuss the design of NASS's Data Warehouse System that promotes ease of understanding and high query performance, and describe the critical role the Data Warehouse will play in our future survey and census program.
Changes to the NASS Hog Survey Program to Accommodate Hog Industry Changes. BILL IWIG, MIKE PALLESEN and JOE PRUSACKI, U.S. National Agricultural Statistics Service
The hog industry in the U.S. has undergone dramatic structural change over the past two decades. Since 1980 the number of hog operations has dropped from over 650,000 to about 110,000 in 1998. While small family farm operations still account for a large proportion of existing hog farms, large operations account for a growing proportion of the inventory. In 1998, about 2 percent of the hog operations had 5000 or more hogs, but they controlled 42 percent of the inventory.
In 1996 NASS made several changes to the hog survey program to accommodate these structural changes in the hog industry. These included the following:
- A separate quarterly Hog Survey was developed. Previously, hog data were collected as part of the Quarterly Agricultural Survey, which also collected crop acreage, crop production, and grain storage data.
- The target population shifted to owners of hogs, which includes contractors, rather than operators of hog farms.
- A full multiple frame survey, which utilizes the area frame to measure the incompleteness of the list frame, is used only for the base December survey. Since there is only about 6 percent incompleteness now, this portion of the estimate is modeled for the remaining quarters (March, June, and September).
These changes have helped provide more stable and reliable hog survey estimates in the last couple of years.
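The multiple-frame arrangement described in this abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions, not the NASS estimator: the function names and inventory figures are hypothetical, and modeling the list-frame incompleteness as a fixed share carried forward from December is an assumption made here for simplicity.

```python
def december_estimate(list_total, area_nonoverlap_total):
    """Base-quarter estimate: list-frame total plus the area-frame
    estimate for operations missing from the list (nonoverlap domain)."""
    total = list_total + area_nonoverlap_total
    incompleteness = area_nonoverlap_total / total
    return total, incompleteness


def modeled_quarter_estimate(list_total, incompleteness):
    """Other quarters: only the list frame is surveyed, and the missing
    share is modeled by assuming the December incompleteness still holds
    (an assumption of this sketch)."""
    return list_total / (1.0 - incompleteness)


# Hypothetical inventory figures (thousand head), for illustration only.
dec_total, share = december_estimate(list_total=56_400,
                                     area_nonoverlap_total=3_600)
march_total = modeled_quarter_estimate(list_total=55_000,
                                       incompleteness=share)
print(f"December total: {dec_total}, incompleteness: {share:.1%}")
print(f"Modeled March total: {march_total:.0f}")
```

With these made-up inputs the December incompleteness works out to 6 percent, matching the order of magnitude quoted in the abstract; the March figure simply inflates the list-frame total by that share.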
Recent Developments in Maintaining Survey Frames for Agriculture Surveys at Statistics Canada. CLAUDE JULIEN, Statistics Canada
Following its most recent Census of Agriculture in 1996, Statistics Canada has redesigned all of its farm commodity and financial surveys as well as its agriculture survey infrastructure. All agriculture surveys now use only list frames extracted from a Farm Register. The area frame methodology used to complement the list frame has been dropped. This has put additional pressure on keeping the Farm Register up-to-date on the numerous changes taking place in the agriculture sector.
The presentation will describe the methodological and operational aspects of the survey infrastructure that maintains the Farm Register. The survey infrastructure allows us to identify changes to the farms that are contacted and quickly record these changes on the Farm Register to be used for the next surveys. This, in turn, allows a more efficient use of resources and better control of respondent burden.
The survey infrastructure has been in place since September 1997 and a considerable amount of data has been accumulated. The presentation will highlight the valuable knowledge gained about the changes in the agriculture sector, as well as our experience in keeping up with these changes. This experience will lead us to consolidate the strengths and address the weaknesses of the current methodology in view of our next redesign following the 2001 Census.
Research into Improving Frame Coverage for Agriculture Surveys at Statistics Canada. MIKE MILLER, ANN LIM and JEANNINE MORABITO, Statistics Canada
Frame coverage is an issue for the agriculture survey program at Statistics Canada. All agriculture surveys at Statistics Canada use list frames that are extracted from a central Farm Register. This register was updated after the 1996 Census. The area frame sample that used to cover farms that are missed by the census and farms that start operating after the census has been dropped. The survey infrastructure keeps up-to-date on the farms that are contacted by agriculture surveys, but very few farms have been added to the Farm Register since the last census. The resulting frame attrition impacts on the quality of the surveys.
This paper will present research on using various sources of administrative data to add farms to the Farm Register. A pilot study was conducted using tax data to identify candidate farm operations that started since the last census. New farms were identified, but many of the candidates contacted were out-of-scope. Further research has since been conducted into combining more sources in an attempt to improve efficiency. Particular attention has been paid to prioritizing the candidates in terms of their likelihood of actually being a farm, their potential size and their activity. The paper will conclude by presenting how this research is also important for improving and evaluating the coverage of our next Census, to be conducted in 2001.
Floor Discussion
43. INTEGRATING AGRICULTURE AND FOOD STATISTICS: NATIONAL AND INTERNATIONAL PERSPECTIVES
Invited Paper Session
Organizer: MICHAEL TRANT, Statistics Canada
Chair: MICHAEL TRANT, Statistics Canada
Integrating Agricultural Data for Analysis and Public Use. MARK ELWARD, Statistics Canada and THERESA HOLLAND, U.S. National Agricultural Statistics Service
Agricultural economists have been integrating agricultural data since the dawn of the theory of supply and demand. Consequently, integrated data programs often date back to the development of the statistical programs themselves because they rely on integrating agricultural data into structured formats.
This paper will look at integrating agriculture and food data from three different perspectives:
- traditional statistics,
- structured accounts, and
- surveys.
As the topic is broad, the paper will approach the subject from a highly summarized perspective.
Traditionally, the integration of agriculture and food statistics has been an integral part of many agricultural statistical programs. Commodities have long been the focus of supply-demand tables (balance sheets) used to analyze data, as a tool to set estimates or to display information. In many cases the commodity data flow to the economic data set so that estimates such as farm income or value added can be derived. Also, commodity data from different sources, such as a census versus a seasonal production survey, must often be integrated into a statistical program so that the public data do not differ because of different methods.
One way of integrating data is to develop a series of structured accounts that are logically integrated and provide more information because of the integration. A brief look at an integrated system of accounts including value-added, balance sheets, cash flow and producer income accounts will be presented.
Integrating data from different sources that use different concepts and methods often poses serious challenges that can be met, in part, by using surveys. The paper will review the National Agricultural Statistics Service's experience with the Agricultural Resource Management Study and the experience of Statistics Canada in relation to the Unified Enterprise Survey.
The authors hope that examples from the Canadian and United States experience will illustrate different options for integrating data as well as the value of integrated agriculture and food statistics.
The Agriculture and Food Information System of Slovenia: Its Transformation and Development. IRENA ORENIK, MATEJA ROJC, METKA ZALETEL, SIMONA DERNULC, HELENA PUC, SLUGA MARIJA, JOJA KRZNAR, ANGELA JUVANC, Statistical Office of the Republic of Slovenia
In Slovenia the adaptation of agricultural statistics to national needs and EU standards started in 1993. The introduction of a new program of agricultural statistics was based on a set of pilot surveys, used to identify the optimum collection strategy and estimate the budget requirements. The main challenges were the design and development of a new list frame (farm register) and the identification of an objective, cost-effective method of data collection. These two issues have now been resolved and a major part of the new program has been carried out since 1997.
Relatively undeveloped administrative structures and the high proportion of farms producing only for their own consumption forced the development of an agriculture statistics system mainly based on statistical surveys. The use of data from administrative sources is limited.
However, in parallel to the Slovenian statistical survey harmonisation project, there is also the Integrated Administrative and Control System (IACS) project. IACS is an information system to support subsidy payments and market interventions within the framework of the Common Agricultural Policy (CAP). With full implementation of IACS, administrative data could be substituted for survey data and some of the statistical surveying could be reduced.
Agricultural statistics are becoming an integrated part of the Slovenian national food and agriculture information system, addressing the needs for individual micro-data, frame maintenance, CAP requirements as well as aggregate statistics for national accounts purposes.
Data Integration and its Role in the Development of Better Agriculture and Food Information Systems. LADISLAV KABAT, L. NAIKEN and P. NARAIN, Statistics Division of the Food and Agriculture Organization
Recently agricultural activity has undergone considerable change. Today agriculture involves a large number of institutions and people in an inter-sectoral scenario. Governments are seeking to acquire food and fiber for the country taking into account their domestic production, export/import possibilities as well as the status of foreign exchange. While making policies in this context, they need to consider the nutritional status of the population and the conditions of its weaker sections and minorities. Issues of sustainability and environment are also directly and indirectly linked with the decision making process. Thus the aim of any food and agricultural information system should not only be to monitor the current trends of growth of agricultural production within the process of overall economic development but also to provide support to decision makers in the formulation of policies relating to the sound use of agricultural inputs and technology of production which takes into consideration the social, economic and environmental issues and long-term sustainability. Such an information system would require data on five key exogenous factors: (a) climate, (b) soil, (c) water, (d) human resources, and (e) economic forces. Data on these factors are collected using different methods, such as conventional techniques (sample surveys/census or compilations from administrative records) or a mix of conventional techniques and remote sensing. The information system may also sometimes need special scientific data collected using high-tech scientific methodology and instruments. The statistician as a producer and manager of such an information system is faced with the challenge to provide a comprehensive, reliable and consistent picture through an integrated set of data. The paper presents three aspects of data integration in this context, namely (a) integration of economic data collected from different sources to derive secondary statistics (e.g. GDP, food availability), (b) integration of socio-economic data collected from different sources to analyse a given situation (e.g. model building), and (c) integration of physical and monetary data (e.g. satellite accounts) to study environmental issues, etc. Subsequently it deals with the issues relating to the collection and compilation of the data.
Discussant: RICH ALLEN, U.S. National Agricultural Statistics Service
Floor Discussion
44. NEW DEVELOPMENTS IN IMPUTATION OF BUSINESS SURVEY DATA
Invited Paper Session
Organizer: LEYLA MOHADJER, Westat
Chair: LEYLA MOHADJER, Westat
Variance Estimation in the Presence of Imputation for Missing Data. J.N.K. RAO, Carleton University
Unit nonresponse and item nonresponse both occur frequently in surveys. Unit nonresponse is customarily handled by weighting adjustment, whereas item nonresponse is usually treated by some form of imputation. In particular, deterministic or stochastic imputation is often used to assign values for missing item values. Methods for deterministic imputation include mean, ratio and nearest neighbor imputation. Stochastic imputation methods include marginal hot deck, stochastic regression and common donor. Frequentist inference from imputed data is based on a repeated sampling framework and an assumed response mechanism. On the other hand, use of imputation models requires only that the assumed model holds for the respondents. Treating the imputed values as true values and computing variance estimates using standard formulae applicable to complete samples can lead to serious underestimation of true standard errors, especially when the item nonresponse rate is appreciable. In this talk, I will review some recent work on variance estimation that takes proper account of the additional variability due to the unknown missing values. In particular, work on jackknife and other resampling methods will be reviewed as well as jackknife linearization. Both marginal parameters (totals, means, medians, etc.) and parameters measuring relationships (subclass means, correlation and regression coefficients, etc.) will be studied. Single imputation methods will be highlighted but comparisons with multiple imputation will also be presented.
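The underestimation described in this abstract can be seen in a small sketch. It is a simplified illustration in the spirit of an adjusted jackknife, not the methods reviewed in the talk: it assumes simple random sampling, uniform response, and mean imputation, and the function names and simulated data are hypothetical.

```python
import random
import statistics


def naive_variance(completed):
    """Variance of the mean, treating imputed values as if observed
    (simple random sampling, no finite population correction)."""
    return statistics.variance(completed) / len(completed)


def adjusted_jackknife_variance(values, responded):
    """Delete-one jackknife in which the mean-imputed values are
    recomputed whenever a respondent is deleted. Entries of `values`
    for nonrespondents are never used."""
    n = len(values)
    resp = [v for v, r in zip(values, responded) if r]
    r_count = len(resp)
    resp_sum = sum(resp)
    full_estimate = resp_sum / r_count  # mean of the mean-imputed data
    squared_devs = 0.0
    for v, r in zip(values, responded):
        if r:
            # Imputed values move to the mean of the remaining
            # respondents, so the replicate estimate equals that mean.
            replicate = (resp_sum - v) / (r_count - 1)
        else:
            # Deleting an imputed unit leaves the estimate unchanged.
            replicate = full_estimate
        squared_devs += (replicate - full_estimate) ** 2
    return (n - 1) / n * squared_devs


random.seed(1)
n = 200
y = [random.gauss(50, 10) for _ in range(n)]
responded = [random.random() < 0.7 for _ in range(n)]  # ~30% nonresponse
resp_mean = statistics.mean(v for v, r in zip(y, responded) if r)
completed = [v if r else resp_mean for v, r in zip(y, responded)]

v_naive = naive_variance(completed)
v_jack = adjusted_jackknife_variance(y, responded)
print(f"naive: {v_naive:.4f}  adjusted jackknife: {v_jack:.4f}")
```

In this setting the adjusted jackknife is close to the respondent variance divided by the number of respondents, while the naive formula divides a shrunken variance by the full sample size, which is why it comes out smaller.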
Accounting for Imputation Error Variance for Data with Imputed Values. THOMAS KRENSKE, LEYLA MOHADJER and JILL MONTAQUILA, Westat
Treating imputed values as if they had actually been observed or reported may lead to substantial understatement of the variance of the estimate. Several methods have been developed to account for the effects of imputation error in variance estimation. This paper documents empirical results from using the All-Cases Imputation (ACI) method (Montaquila and Jernigan (1997)), the Rao-Shao (1992) variance estimation technique, multiple imputation methods, and others. The methods are also compared using various imputation procedures. The study is based on data from the Alcohol and Drug Services Study (ADSS), conducted for the Substance Abuse and Mental Health Services Administration (SAMHSA).
This paper provides methodology regarding issues that have a limited literature base: 1) accounting for imputation error variance when imputed values from previous steps are used to impute for missing values for other variables; 2) using mixed imputation methods (e.g., using both regression and hot-deck) to impute for a single item; 3) accounting for imputation error variance when the applicability of or response to a question depends upon the responses to earlier questions; and 4) providing a simple way for data users to account for imputation error variance in their analyses.
Alternative Imputation Models for Wage Related Data Collected From Establishment Surveys. C. BARSKY, J. BUSZUWSKI, L. ERNST, M. LETTAU, M. LOEWENSTEIN, B. PIERCE, C. PONIKOWSKI, J. SMITH and S. WEST
In this paper alternative regression models are compared for item nonresponse of wage related data, which are collected from establishments by detailed occupation level. Since the surveys involved are of a longitudinal nature, two separate cases are considered. The first case involves item nonresponse for the first time an establishment is in the survey, and the second case is item nonresponse during an update time for the establishment.
The studies were undertaken to determine imputation methods for data collected from the National Compensation Survey (NCS), conducted by the Bureau of Labor Statistics (BLS).
In addition to wage data, there is item nonresponse for benefit data. Various imputation methods for benefits are considered, including multivariate imputation. In some situations the level of the occupation may be missing; this is also considered.
The empirical study involved testing various regression models on real survey data. The nonresponse patterns in our tests were simulated using observed patterns on current NCS data. An earlier study on wages done on the BLS Universe Data Base of establishments showed that a regression-based method performed best; thus, other imputation methods, such as regression plus noise, hot deck, and multiple imputation, were not re-studied.
Discussant: DAVID JUDKINS, Westat
Floor Discussion
45. PRINCIPLES AND PRACTICES IN THE MEASUREMENT OF THE UNRECORDED ECONOMY
Invited Paper Session
Organizer: MICHAEL COLLEDGE, Organisation for Economic Co-operation and Development (OECD)
Chair: ERNIE BOYKO, Statistics Canada
The unrecorded (or non-observed) economy refers to those economic transactions that for one reason or another do not get recorded in the official statistics collected for a country. It incorporates all the notions of the hidden economy, shadow economy, underground economy, black market, illegal economy, etc. It includes both deliberate and unwitting attempts on the part of those involved in the economic activity to avoid being recorded. As a proportion of the total economy, it can be quite large, and, with the continuing trend towards deregulation, it has the potential to increase. For example, in some countries in economic transition it has been quoted as high as 50%. There is little doubt that it constitutes the single largest source of non-sampling error for many economic surveys. But, given the nature of the problem, how have measurements of its size been derived and are they credible? This session aims to provide some answers.
Speakers from western and transition countries will address the basic measurement problems and propose an integrated framework within which they can be handled. They will describe practical implementation of comprehensive programs to estimate unrecorded transactions and to incorporate them within the national accounts.
A Systematic Approach to Hidden and Informal Activities. RONALD LUTTIKHUIZEN and BRUGT KAZEMIER, Statistics Netherlands
The paper proposes a method to deal with the non-observed economy in compiling national accounts. The method consists of two steps. The first step in the method is the implementation of the commodity-flow approach. This approach covers the process of compiling supply and use tables from observations of individual units. Special attention is paid to the inclusion of the non-observed economy. The paper shows how this can be achieved in a systematic way.
In many countries, however, this approach does not yield sufficient results for lack of reliable business statistics. Therefore, a second step is added. This second step consists of the construction of a social accounting matrix. A social accounting matrix allows the inclusion of household statistics in the national accounts compilation process. Again, special attention is paid to the problems that may arise when there is a substantial non-observed economy.
The Exhaustiveness of Production Estimates: New Concepts and Methodologies. MANLIO CALZARONI, Italian Statistical Institute (ISTAT)
By productive units we refer nowadays to a very large number of different typologies. If our objective is to obtain a complete picture of the volume and characteristics of production activities, i.e. an exhaustive estimate of them, we need to take into account the problems of measuring the activities of all types of units. Furthermore, if our target population is production as defined in the system of national accounts, we also need to consider the problems of measuring the production of units that, in order to avoid payment of taxes or social security contributions etc., may be invisible to our statistical instruments.
In this paper we outline the conceptual framework used to ensure the exhaustiveness of the production estimates in accordance with the SNA93. We describe the different types of production units that we have to investigate and the statistical problems that we have to deal with.
As regards the application of this framework, we present the experience of the Italian National Statistical Institute (Istat). We describe the methods used to ensure the exhaustiveness of the GDP estimates and we analyse the different types of productive units present in the Italian case as well as the approaches to include their production in the Italian official figures.
Estimation of Unobserved Economy in the Statistical Practices of Russia. IRINA MASAKOVA and TATANIA FEDOROVSKAYA, State Statistical Committee of the Russian Federation (Goskomstat)
In accordance with international standards, the Russian Federation Statistical Office aims to include all economic production activities in the national accounts. This paper focuses on the coverage of those activities that are likely to be missed by the regular survey programme, the so-called non-observable economy. It describes the procedures that have been in place for several years to produce comprehensive national accounts estimates. It also summarises some of the known weaknesses in these procedures and outlines current and future work designed to address these problems.
By its very nature, the non-observable economy is difficult to measure. Thus the Russian Federation Statistical Office uses a variety of different methods that complement one another and that are integrated within the national accounts framework. Specific methods are used within each economic sector, reflecting the particular characteristics of that sector. The paper outlines the treatment in the two sectors where the proportion of non-observed activities tends to be highest, namely trade and agriculture.
With the objective of further improving coverage, the Russian Federation Statistical Office has been conducting a series of studies of additional and alternative methods. The paper reports on the results obtained from surveys of employees of small enterprises and of hidden compensation and describes how these methods will be incorporated within the overall measurement framework.
Discussant: MICHAEL COLLEDGE, Organisation for Economic Co-operation and Development (OECD), will comment on the measurement framework, its implementation and its limitations. He will identify the similarities and differences in the approaches adopted within the western and transition economies, and will outline the likely scope of, and priorities for, future work.
Floor Discussion
49. STATISTICAL DISCLOSURE CONTROL FOR ESTABLISHMENT DATA
Invited Paper Session
Organizer: PETER-PAUL DE WOLF, Statistics Netherlands
Chair: PETER KOOIMAN, Statistics Netherlands
Using Noise for Disclosure Limitation of Establishment Tabular Data. LAURA ZAYATZ, TIMOTHY EVANS and JOHN SLANTA, U.S. Bureau of the Census
KEYWORDS: Disclosure Limitation, Noise, Confidentiality, Cell Suppression, Magnitude Data
We propose a new disclosure limitation method for establishment magnitude tabular data in which noise is added to the underlying microdata prior to tabulation. The proposed method has several advantages compared to the standard method of cell suppression: it enables some information to be provided within more cells of the table, it eliminates the need to coordinate cell suppression patterns between tables, and it is a much less complicated and time-consuming procedure than cell suppression. In this paper we outline the proposed procedure for adding noise to the underlying establishment microdata, discuss the advantages and disadvantages of adding noise as compared to cell suppression, and describe the results of using noise with data from one survey.
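The idea of perturbing microdata before tabulation can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the paper: the multiplier distribution (roughly 10 percent up or down, chosen at random), the function names, and the sample records are all hypothetical.

```python
import random
from collections import defaultdict


def noise_multiplier(rng, center=0.10, spread=0.02):
    """Draw a multiplier near 1 + center or 1 - center, with the
    direction chosen at random (the distribution is an assumption)."""
    direction = rng.choice((-1, 1))
    return 1.0 + direction * rng.uniform(center - spread, center + spread)


def tabulate_with_noise(records, seed=0):
    """Perturb each establishment's value, then sum by table cell."""
    rng = random.Random(seed)
    cells = defaultdict(float)
    for cell, value in records:
        cells[cell] += value * noise_multiplier(rng)
    return dict(cells)


# Hypothetical (cell, value) microdata: cell A is dominated by one
# large establishment, cell B holds several of similar size.
records = [("A", 1000.0), ("A", 40.0), ("A", 25.0),
           ("B", 300.0), ("B", 310.0), ("B", 295.0), ("B", 305.0)]
true_totals = {"A": 1065.0, "B": 1210.0}
noisy_totals = tabulate_with_noise(records)
for cell, total in true_totals.items():
    print(cell, total, round(noisy_totals[cell], 1))
```

Under this kind of scheme a cell dominated by a single establishment shifts by roughly the size of that establishment's noise, while the perturbations in a cell with many similar contributors tend to offset one another, so no separate suppression pattern has to be coordinated across tables.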
Model Based Disclosure Limitation for Business Microdata. LUISA FRANCONI, Italian Statistical Institute (ISTAT) and JULIAN STANDER, University of Plymouth
There are three main reasons why business microdata present particular challenges for disclosure limitation. First, the inherent structure of the population of enterprises is typically sparse, but with strong concentrations in particular areas. Secondly, the distributions of associated random variables are often very skewed. Thirdly, business survey designs include the largest and most identifiable enterprises with probability one; see Cox (1995).
Disclosure limitation for this type of data has up to now been achieved by masking and microaggregation procedures (Duncan and Pearson, 1991, Cox, 1994 and Defays and Nanopoulos, 1992), by data swapping (Fienberg et al., 1996), or by simulation from relevant distributions (McGuckin and Nguyen, 1988 and Fienberg, 1994).
We show how to build a framework for statistical disclosure control for business microdata that maintains an individual profile for each unit, but that goes a long way to preserving confidentiality. Our framework is based on smoothing procedures applied to groups of enterprises that are defined on the basis of auxiliary information (for example, NACE classifications or enterprise size) or through cluster analysis or neighbourhood graphs.
We also outline methodology based on hierarchical models. In this approach, individual enterprises can correspond to the first level of the hierarchy and groups of enterprises, chosen as above or by using geographical information, can correspond to the second level. An attractive feature is the fact that, instead of releasing the microdata themselves, the estimates of some model parameters such as the underlying means for individual enterprises or of group effects could be published.
ARGUS for Statistical Disclosure Control. ANCO HUNDEPOOL, Statistics Netherlands
During the SDC-project (funded by the 4th Framework program of the EU) the ARGUS software has been developed. This software aims at the disclosure protection of both microdata and tabular data. Using global recoding and local suppression, this software will enable you to produce safe data. A new start will be made in the CASC-project. We plan to extend the ARGUS software to handle more complex (hierarchical and linked) tables. This requires the development of complex optimisation techniques. Besides these computationally complex techniques some quicker approximations will be implemented.
For microdata the current version of ARGUS is quite suitable for data on persons. New techniques for the disclosure limitation of business microdata will be investigated and implemented. Among these new techniques being investigated in the CASC-project are micro-aggregation, noise addition, masking and Post Randomisation (PRAM).
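Of the techniques listed for business microdata, micro-aggregation is the simplest to illustrate. The sketch below is a univariate, fixed-group-size version written for this purpose only; it is not the ARGUS implementation, and the function name, the group size `k`, and the turnover values are hypothetical.

```python
def microaggregate(values, k=3):
    """Replace each value by the mean of its group of at least k
    neighbours in sorted order (univariate, fixed-size groups)."""
    if k < 1 or len(values) < k:
        raise ValueError("need at least k values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    starts = list(range(0, len(order), k))
    # Fold a short final group into the previous one so that every
    # group contains at least k units.
    if len(order) - starts[-1] < k and len(starts) > 1:
        starts.pop()
    masked = [0.0] * len(values)
    for pos, start in enumerate(starts):
        end = starts[pos + 1] if pos + 1 < len(starts) else len(order)
        group = order[start:end]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


# Hypothetical turnover values for eight enterprises.
turnover = [12.0, 950.0, 15.0, 14.0, 700.0, 20.0, 18.0, 800.0]
masked = microaggregate(turnover, k=3)
print(masked)
```

Every released value is shared by at least `k` enterprises and the overall total is preserved, but with skewed business data the large units end up grouped with much smaller ones, which is one reason such data are harder to protect than data on persons.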
Discussant: LARRY COX, U.S. Environmental Protection Agency
Floor Discussion
50. NEW METHODS OF DATA COLLECTION FOR ESTABLISHMENT SURVEYS
Invited Paper Session
Organizer: WILLIAM NICHOLLS II, U.S. Bureau of the Census (retired)
Chair: THOMAS MESENBOURG, JR., U.S. Bureau of the Census
Use of New Data Collection Methods in Establishment Surveys. WILLIAM NICHOLLS II (retired), THOMAS MESENBOURG and EDITH DE LEEUW, MethodikA Amsterdam; STEPHEN ANDREWS, U.S. Bureau of Economic Analysis
This paper describes the data collection methods of establishment surveys based on a 1999 canvass of 13 leading statistical agencies. Mail questionnaires remain the most prevalent method of establishment survey data collection, but compared with reports from a similar canvass in 1993, they are now more likely to be supplemented with the use of other collection methods. The 1990s also have witnessed the growth of newly emerging collection methods for establishment surveys, including scanning and optical character recognition, electronic data interchange, and computerized self administered questionnaires. Establishment survey collection methods vary somewhat with survey sample size, frequency of administration, survey content, and among survey organizations.
The Use of CAI Data Collection in Business Surveys at Statistics Canada. ROLLY JAMIESON and GUY PARENT, Statistics Canada
Statistics Canada outlined its aspirations and plans for the use of CATI, CAPI, and other CAI technologies in Business Surveys at the 1992 Census Bureau Annual Research Conference. This paper will describe the CAI methods that we have implemented over the intervening years, including CATI, CAPI, electronic reporting, Internet data collection using the Web, and an experiment with touch-tone data entry. The paper will review the different approaches to CATI development required for annual vs. monthly surveys and consider the impact of increased on-line editing on the quality of the collected data. The paper also will address: a CAI approach to frame updates; the use of on-line coding tools; pre-contact methods; tailored questionnaires; and issues of multi-contact establishments. Key points will be illustrated with results from the Workplace and Employee Survey, which has both an establishment questionnaire and a separate employee questionnaire and utilizes multi-contact mixed mode collection with mail, CAPI, and CATI. The paper will conclude by suggesting some of the organizational issues and impacts deriving from increased use of CAI.
Progress and Projections in Computer Assisted Data Collection at the Bureau of Labor Statistics. RICHARD CLAYTON, RICHARD ROSEN, WILLIAM MCCARTHY and JIM KENNEDY, U.S. Bureau of Labor Statistics
The CASIC revolution has had a profound effect on several of the Bureau of Labor Statistics' (BLS) major programs. Several programs rely exclusively on a single CASIC method, while others use a mixed-mode environment, sometimes employing several methods designed to match the respondents' technological environment and the cost and quality features of various CASIC methods.
This paper will review the range of research and development into CASIC methods conducted by the BLS over the past 15 years. It will cover the range of methods currently implemented in several programs, including Current Employment Statistics, the Covered Employment and Wages (ES-202) program, the Current Population Survey, the Consumer Price Index, the Employment Cost Index, the new Job Openings and Labor Turnover Survey (JOLTS), and others.
Plans for expanded use of CASIC are underway in many of these programs, to be implemented over the next few years. These plans will be profiled and described. The paper will also address the resource reallocations brought about by the large-scale implementation of CASIC methods and their organizational effects.
Statistical Processing in the New Millennium: The Impact of Information Technology on Data Collection, Processing and Dissemination. WOUTER KELLER, AD WILLEBOONRDSE and WINFRIED YPMA, Statistics Netherlands
This paper states how Statistics Netherlands is preparing for a new era in the collection and dissemination of statistics, as it is triggered by technological and methodological developments, which extend beyond the replacement of paper by electronic collection. While these developments apply equally to statistics on persons and households, their greatest challenge at present is in applications to statistics on businesses. An essential feature of the turn to the new era is the farewell to the stovepipe way of data processing. The paper discusses how new technological and methodological tools will affect both processes and their organization. Special emphasis is placed on one of the major chances and challenges the new tools present: establishing coherence in the content of statistics and in presentation to users.
Floor Discussion
51. COVERAGE IN SCHOOL SAMPLING FRAMES
Invited Paper Session
Organizer: STEVEN KAUFMAN, U.S. National Center for Education Statistics
Chair: MARILYN MCMILLEN, U.S. National Center for Education Statistics
One of the first steps in selecting a probability sample is developing a sampling frame. The frame is one of the first places where error can be introduced into the final estimates. If the PSU measure of size, stratum or sorting identifier information is in error, then sampling error can be increased. If PSUs are missing or created between the frame reference period and the collection reference period, then biases may be introduced. For these reasons, any sampling frame should periodically have its coverage evaluated. This session presents three such evaluations. The first presentation describes an evaluation based on the first-time development of a school frame. The second and third presentations describe evaluations of more established frames. The second presentation evaluates school coverage by comparing alternative frames with the original frame. The third presentation evaluates changes in the school frame from the frame reference period to the sample collection reference period, in this case a two-year time lag.
Implementing a Sampling Frame of Elementary and Secondary Schools in Canada. MARIANNE GOSSEN and SIMON CHEUNG, Statistics Canada
The Canadian Youth in Transition Survey (YITS) is a longitudinal survey which collects data from a sample of youths biennially so as to study the factors which may influence their transitions between school and work. The YITS sample contains a cohort of 15-year-old students. A design requirement of YITS is the two-stage sampling of students, with schools as the Primary Sampling Units. The survey frame is constructed based on data provided by provincial education ministries as they have the primary jurisdiction over elementary and secondary education. These administrative data were evaluated as they became available to Statistics Canada. Evaluation results highlighted several issues for YITS. For certain provinces, data are not available until near the end of the school year, thus restricting YITS to use the data of the previous school year. For some provinces and regions where the school system undergoes mission changes (which usually involve school closures and amalgamations), the application of a one-year-old frame could adversely affect our ability to predict student enrollment and education programs. This paper describes these frame issues, provides quantitative measures of their impacts and discusses strategies to address them.
Evaluating the Coverage of the U.S. National Center for Education Statistics' Public Elementary / Secondary School Frame. THOMAS HAMANN, U.S. Bureau of the Census
This presentation discusses the public school frame used by the National Center for Education Statistics (NCES) Common Core of Data (CCD). This coverage evaluation compares the CCD frame with commercially available database files of schools produced by two private U.S. firms, Market Data Retrieval (MDR) and Quality Education Data (QED). The primary objective of this evaluation is to determine the accuracy and completeness of the list of schools included in the 1994-95 CCD Public Elementary/Secondary School Universe survey.
This evaluation examines data files by school and school district type for each state. The CCD frame is compared to each outside source separately. School matches are made on identification number, school name or address between the CCD and QED files and between the CCD and MDR files.
The results of this evaluation will provide a starting point for compiling a complete and more clearly identified CCD list and for reconciling the CCD file to other sources. The results will further provide a comprehensive list of the schools that appear on the CCD file and not on the QED or MDR files, and vice versa, for each state, perhaps revealing data collection and classification differences by state. The greater the coverage, the better the CCD will be able to provide sampling frames for other surveys.
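The matching step in this abstract (identification number, then school name, then address, within state) can be illustrated with a minimal sketch. This is not the procedure used in the evaluation: the field names (`state`, `school_id`, `name`, `address`), the normalization rule, and the functions `normalize`, `build_index`, and `match_frames` are all hypothetical, and real files would need far more careful name and address standardization.

```python
def normalize(text):
    """Lowercase and replace punctuation with spaces so trivial spelling
    differences (case, extra blanks, periods) do not block a match."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


# Each rule returns a comparison key, or None when the field is empty.
def id_key(rec):
    return (rec["state"], rec["school_id"]) if rec.get("school_id") else None


def name_key(rec):
    return (rec["state"], normalize(rec["name"])) if rec.get("name") else None


def address_key(rec):
    return (rec["state"], normalize(rec["address"])) if rec.get("address") else None


# Rules are tried in this order: the most reliable identifier comes first.
KEY_RULES = [("id", id_key), ("name", name_key), ("address", address_key)]


def build_index(records, key_func):
    """Map each key to the position of the first record that carries it."""
    index = {}
    for pos, rec in enumerate(records):
        key = key_func(rec)
        if key is not None:
            index.setdefault(key, pos)
    return index


def match_frames(frame, other):
    """Return (matches, frame_only, other_only) for two school lists.

    matches is a list of (frame_record, other_record, rule_label)."""
    indexes = [(label, fn, build_index(other, fn)) for label, fn in KEY_RULES]
    used = set()  # positions in `other` that are already matched
    matches, frame_only = [], []
    for rec in frame:
        for label, fn, index in indexes:
            key = fn(rec)
            pos = index.get(key) if key is not None else None
            if pos is not None and pos not in used:
                used.add(pos)
                matches.append((rec, other[pos], label))
                break
        else:
            frame_only.append(rec)  # no rule produced a match
    other_only = [r for pos, r in enumerate(other) if pos not in used]
    return matches, frame_only, other_only
```

The two unmatched lists correspond to the schools that appear on one file and not on the other. Tabulating them by state would give the kind of coverage comparison the abstract describes.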
An Evaluation of PSS Data Quality in Relation to 1998 NAEP Survey Experience. JOHN BURKE, Westat
The National Center for Education Statistics (NCES) maintains databases of public and private elementary and secondary schools in the United States. These are called the Common Core of Data (CCD) and the Private School Survey (PSS) respectively. These databases are used as sampling frames for NCES surveys of schools and students.
There is always a time lag between the reference period of a frame and the actual data collection. During that time, critical school variables, such as in-scope status, enrollment and addresses, can change. These changes create potential issues of coverage for these sampling frames. This presentation will report on a coverage evaluation exercise that used data from a commercially available database of schools, Quality Education Data (QED), and information from a fielded survey, National Assessment of Educational Progress (NAEP). The study will evaluate the impact of changes in critical CCD and PSS variables. Results will be presented, but the emphasis will be on the issues of changes in school configuration and names over time, and differences between administrative units and physical school buildings, and their effects on coverage. These issues are relevant in other sectors and in other countries.
Discussant: FRITZ SCHEUREN, The Urban Institute
Floor Discussion
55. LINKING LONGITUDINAL BUSINESS AND HOUSEHOLD DATA
Invited Paper Session
Organizer: JOHN ABOWD and JULIA LANE, Cornell University
Chair: SILVIA BIFFIGNANDI, University of Bergamo, Italy
Theme: The US Census Bureau Standard Statistical Establishment List (SSEL) is the universe from which the Bureau derives its samples of establishments and business organizations. The Center for Economic Studies at the Census Bureau has created a longitudinal version of this list with links to data from the Economic Censuses. The three papers in this session use a variety of sources of household data and a variety of linking techniques to create data sets that match the individuals in the household to their employers, using the SSEL. Two papers use statistical linking techniques based on information provided by the household respondents (2,3). The other paper uses exact matching techniques based on identification numbers, supplemented by statistical techniques (1). Two of the papers create large scale analysis data sets that provide usable samples of employees within the businesses that have been linked (1,3) while two create household samples with business level information available to the analysis (1,2). All of the resulting data files are longitudinal on the employer (business) dimension. One of the papers is longitudinal in both dimensions (1). The authors are among the most experienced creators and users of linked employer-employee data in the United States. The discussant is one of the leading European designers and users of such data.
Linked employer-household data provide a rich source of information about both sides of the labor market without increasing either the number of surveys or the respondent burden associated with any given survey. Because of the large potential value added from the study of linked data, particularly linked data that have a longitudinal dimension, it is important to study the characteristics of the design of these data sets. The papers in this session lay out explicit design goals and provide quantitative evidence about the ability of different types of data linking procedures to meet these goals.
The papers also give examples of analyses performed on these linked data.
New Developments in Integrating Household and Firm Datasets. LARS VILHUBER, JOHN ABOWD and JULIA LANE, U.S. Bureau of the Census
The LEHD project at the US Census Bureau is a set of related prototypes, each of which links an existing household-level survey (CPS, SIPP/SPD, ACS) with longitudinal information on the employing businesses using the information in the Standard Statistical Establishment List. This paper presents an overview of the three prototypes and compares the properties of the linked employer-household data for each prototype. Issues of representativeness of the linked sample (from both the business and household side) will be considered. Examples of the use of the linked data to study employer-level human capital and workforce composition decisions will also be presented.
Collecting Linked Data Using Household Survey Responses to Identify Employers: Lessons from the MEPS-IC. KRISTIN MCCUE, U.S. Bureau of the Census
One approach to collecting data that links information on businesses and their employees is to start with a household sample. That is, use an area probability sample to select persons and then base selection of a sample of businesses on locating information collected from employed household members about their employers. Part of the sample for the Medical Expenditure Panel Survey-Insurance Component (MEPS-IC) follows this design. This paper will describe the linked part of this design, and consider its advantages and disadvantages in terms of the potential usefulness of linked samples of this form for economic analysis. In gauging these advantages and disadvantages, the paper will consider two alternative perspectives: economic research that focuses on employees' decisions about jobs and health care versus research questions that focus on businesses' decisions about whom to hire and what sort of health care to offer.
The New Worker-Employer Characteristics Database. KIMBERLY BAYARD, JUDITH HELLERSTEIN, DAVID NEUMARK, University of Maryland and KENNETH TROSKE, University of Missouri-Columbia
The New Worker-Employer Characteristics Database (NWECD) is a matched employer-employee dataset that includes longitudinal information for both manufacturing and non-manufacturing establishments. Individual and household data come from the 1990 decennial US Census of Population sample detail file. The information on establishments comes from the Bureau of the Census Standard Statistical Establishment List (SSEL). The main difference between the NWECD and the original WECD is the breadth of establishment data available. The SSEL has only limited employer information (employment, payroll, sales and industry), whereas the WECD, which permits access to all of the Longitudinal Research Database variables, but for manufacturing only, contains detailed longitudinal information on establishments that appear in the LRD. This paper considers the analytical advantages and shortcomings of the NWECD in comparison with the WECD.
Discussant: KEVIN McKINNEY, U.S. Census Bureau
Floor Discussion
56. PLANS OF GOVERNMENT AGENCIES FOR RESEARCH IN ESTABLISHMENT SURVEYS (PANEL SESSION)
Invited Paper Session
Organizer: PATRICK J. CANTWELL, U.S. Bureau of the Census
Chair: CYNTHIA CLARK, U.S. Bureau of the Census
This session is a panel presentation and discussion. Each of the five panelists representing government agencies from various countries will speak for ten minutes. The discussant will then also speak for ten minutes. This will be followed by a discussion among the panel--reactions, agreement, disagreements--for about 30 minutes. Finally, questions from the audience, sparking further discussion among the panel, will complete the program.
Panelists:
EVA ELVERS, Statistics Sweden
ALAN DORFMAN, U.S. Bureau of Labor Statistics
DON ROYCE, Statistics Canada
CHARLES PAUTLER, U.S. Bureau of the Census
GEOFF LEE, Australian Bureau of Statistics
What will be the important issues for establishment surveys as we begin the new millennium? A panel of experts from various eminent government statistical agencies will present and discuss their views. They will be joined by an academic who has extensive experience working with government agencies. According to the initial thoughts of the panel, some of the most important general areas will be burden, budget, confidentiality, and dissemination. These four--separately and in combination--will drive the future.
The first two concerns--burden and budget--will affect the way we design efficient samples and collect the data. Serious investigation is necessary in the area of coordinating samples from one occasion to another and across surveys that canvass the same companies. Data collection will evolve away from the traditional methods of mailout-mailback and telephone contact. More research will be done in the fields of electronic reporting, Internet reporting, optical character recognition, and administrative records. The use of administrative records offers some of the greatest savings in terms of budget and response burden. However, before they can be used effectively, we still have to resolve many questions involving data concepts and definitions, access, and coverage.
As we move (in some cases) toward smaller samples while the demand for detailed data grows, small-area estimation--an important topic in household surveys for quite some time--will become critical in establishment surveys as well. Access to the data and its dissemination could become an issue of contention. While legislation may allow noncentralized agencies to share the data, disclosure techniques must continue to grow in sophistication to uphold the privacy and confidentiality concerns of the data providers and collectors. Meanwhile, the endless possibilities for providing data to our customers through various media have only begun to be investigated.
Other topics of importance in the near future will be some of the same ones that have drawn the interest of researchers in the past decade: automating common processes, classification issues, record linkage, and questionnaire design.
Discussant: RAY CHAMBERS, University of Southampton
Floor Discussion
57. A METADATA PRIMER FOR SURVEY STATISTICIANS
Invited Paper Session
Organizer: CATHRYN DIPPO, U.S. Bureau of Labor Statistics and DANIEL GILLMAN, U.S. Bureau of the Census
Chair: EASLEY HOY, U.S. Bureau of the Census
Historically, survey practitioners have recognized the need to document their methods and practices, both for users of their data and for their fellow practitioners in need of methodological guidance. While the establishment and phenomenal growth in the Survey Research Methods Section of the American Statistical Association over the last 20 years testifies to the latter need, the first need has generally received little attention. In general, a brief technical note or appendix is considered sufficient for a paper publication, and a data dictionary is added for public use datasets. Exceptions like Technical Paper 40: The Current Population Survey Design and Methodology are rare because of their time-consuming nature and the widely held belief that this level of documentation is not needed by internal staff or desired by external users.
The advent of the Internet and the World Wide Web as a primary means of information dissemination is causing survey organizations to take a fresh look at their methods for documentation. The state-of-the-art policies and procedures for addressing these needs are about to undergo significant changes. One indicator of the changing view towards documentation is the use of the term metadata. Simply, metadata is data about data, but in the context of statistical surveys, statistical metadata has a much more precise meaning. The full implications of this definition for survey-based information are just beginning to be understood. This session will address these issues.
In this session, we will explore several facets of statistical metadata, including its role in improving survey operations and the use of statistical information. Various research efforts at establishing international standards and the real-world problems of developing and implementing metadata repositories will also be discussed. We propose that the session will consist of four talks addressing the four major items listed below. Each talk will discuss the indicated sub-terms and the interrelationships with the other major items.
The Role of Metadata in Statistics. CATHRYN DIPPO, U.S. Bureau of Labor Statistics and BO SUNDGREN, Statistics Sweden
This paper describes what is meant by metadata for statistical surveys; how it can help in using, sharing, and understanding statistical data; how it can help the statistical agency, survey methodologists, and researchers design, process, analyze, and disseminate statistical surveys and data. Many organizations inside and outside the statistical arena are building metadata systems using many approaches. Generalized approaches are being tried in the U.S., Sweden, and some other places. The advantages of these are discussed and the impact on the statistical agency or organization is described in detail.
Metadata Standards and Metadata Registries: An Overview. DANIEL GILLMAN, U.S. Bureau of the Census and BRUCE BARGMEYER, U.S. Environmental Protection Agency
Much work is being accomplished in the national and international standards communities to reach consensus on standardizing metadata and registries for organizing that metadata. This work has had a large impact on efforts to build metadata registries in the U.S. statistical community. This paper will describe several metadata standards and discuss their importance to statistical agencies. The use of standards to enhance or maintain interoperability between systems and people is discussed. Metadata registries and their importance for managing metadata, especially the quality of the metadata, are described. Finally, a description of a sample of metadata registry systems around the world is given. Emphasis is on the impact a metadata registry can have in a statistical agency.
Use of Metadata for the Effective Integration of Data from Multiple Sources. MARK WALLACE, SAM HIGHSMITH, CAVAN CAPPS, U.S. Bureau of the Census and CATHRYN DIPPO, U.S. Bureau of Labor Statistics
The Integrated Information Solutions Program at the Census Bureau, in conjunction with FEDSTATS, has been collaborating with several Federal agencies to develop an Integrated Data Pilot. The purpose of this pilot is to demonstrate the value of dynamically integrated data from multiple sources to data users and to demonstrate the effectiveness of various technical concepts for producing this capability. This paper will explore the metadata support structure required to enable the integration of data and metadata from multiple government agencies.
Collection and Classification of Metadata: The Real World of Implementation. ERNIE BOYKO, Statistics Canada and MICHAEL COLLEDGE, Organisation for Economic Co-operation and Development (OECD)
The collection of metadata and classification of objects described by metadata (including the assigned classification terms) are major problems confronting the successful implementation of metadata repositories and metadata driven statistical information systems. Collection of metadata includes the basic create, replace, update, and delete functions for any database. Most survey management organizations create metadata in their own ways, and the work necessary to put this information in a database after it is finalized is overwhelming. This paper discusses the need to create metadata automatically through the use of automated survey design and processing tools. Classification is the process of assigning terms to an object for search, retrieval, and semantic analysis. The paper also discusses why we need to classify, how to organize terms in a classification scheme, large versus small schemes, and how classification should work in practice.
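As a rough illustration of the two ideas in this abstract, the sketch below pairs the four collection functions (create, replace, update, delete) with a simple classification step that assigns terms to an object for later search. It is an assumption-laden toy, not a description of any agency's repository: the class `MetadataRepository`, its method names, and the example attributes are all hypothetical.

```python
class MetadataRepository:
    """In-memory store of metadata records keyed by object identifier."""

    def __init__(self):
        self._records = {}  # object id -> dict of metadata attributes
        self._terms = {}    # object id -> set of classification terms

    def create(self, obj_id, **attrs):
        """Add a new record; refuse to overwrite an existing one."""
        if obj_id in self._records:
            raise KeyError(f"{obj_id!r} already exists")
        self._records[obj_id] = dict(attrs)
        self._terms[obj_id] = set()

    def replace(self, obj_id, **attrs):
        """Discard the old attributes and store the new set in full."""
        self._require(obj_id)
        self._records[obj_id] = dict(attrs)

    def update(self, obj_id, **attrs):
        """Change or add individual attributes, keeping the rest."""
        self._require(obj_id)
        self._records[obj_id].update(attrs)

    def delete(self, obj_id):
        """Remove the record and its classification terms."""
        self._require(obj_id)
        del self._records[obj_id]
        del self._terms[obj_id]

    def get(self, obj_id):
        """Return a copy of the stored attributes."""
        self._require(obj_id)
        return dict(self._records[obj_id])

    def classify(self, obj_id, *terms):
        """Assign classification terms to an object (case-insensitive)."""
        self._require(obj_id)
        self._terms[obj_id].update(t.lower() for t in terms)

    def search(self, term):
        """Return the sorted ids of all objects carrying the given term."""
        wanted = term.lower()
        return sorted(i for i, ts in self._terms.items() if wanted in ts)

    def _require(self, obj_id):
        if obj_id not in self._records:
            raise KeyError(f"{obj_id!r} not found")
```

In this toy version every record is entered by hand, which is exactly the burden the abstract warns about. The paper's argument is that survey design and processing tools should populate such a store automatically.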
Floor Discussion