A Framework for Planning U.S. Federal Statistics for the 1980's

Chapter 26. LONGITUDINAL SURVEYS

Background

Analysts of statistical data have always known that data collected at and about a single point in time do not have the analytical power to determine causality or even temporal correlation. The analysis of sequential cross-sectional survey data provides trend information; however, this method of analysis can only yield inferences about the correlates of change, not direct measures. As a result, longitudinal techniques have been developed which associate data about the same individual respondent obtained at different points in time. Researchers have frequently developed retrospective longitudinal data by ascertaining past events in surveys through respondent recall, through the use of administrative records or by means of a combination of both techniques. Most of these attempts have been flawed because it is difficult, if not impossible, for a respondent to recall accurately his status, actions or attitudes at several fixed points in time in the past. It is also a rare investigator who finds that past administrative records meet his particular needs.

In response to these problems there has been an increasing emphasis on the development of longitudinal surveys which use a prospective framework. In this way, the investigator has some control over recall, by conducting interviews at appropriate intervals, and, to the extent feasible, over administrative records by influencing their content.

This paper is designed primarily to highlight some of the serious problems inherent in longitudinal surveys and to suggest some alternatives. It should not be considered as a definitive, exhaustive review of the subject. Rather, it should be emphasized that much attention should be devoted to this topic in the development of statistical programs for the 1980's.

The construction of a prospective longitudinal file begins in the present and extends into the future rather than into the past. Thus, prospective longitudinal analysis cannot occur until after a significant passage of time. Many Federal agencies have accepted the impediment of the extensive time span between collection and analysis and have implemented a number of prospective longitudinal surveys. One of the pioneering activities in the United States was the Framingham study of cardiovascular diseases. This study, begun in 1948 by the Public Health Service, selected a cohort of over 5,000 persons and has attempted to follow them and give them physical examinations at 2-year intervals. The purpose of the study was to attempt to isolate the correlates of hypertensive and arteriosclerotic cardiovascular disease.

An early example of socioeconomic interview surveys was the set of longitudinal studies of the labor force behavior of various age-sex cohorts, funded by the Manpower Administration (now the Employment and Training Administration) and collected by the Census Bureau for analysis by the Ohio State University under the direction of Herbert Parnes. The Parnes Studies, as they are popularly called, began in 1966 and continue to the present. A new cohort has been identified and interviewing for this new series will begin in 1979.

Another early entrant was the Longitudinal Retirement History Survey sponsored by the Social Security Administration (SSA) and, again, collected by the Bureau of the Census. Like the Parnes Studies, this was an interview survey of a general sample of the population. Far earlier, SSA established the Continuous Work History Survey, which utilizes a sample of the Social Security files and enters job changes into the file as they occur. Work histories have been available from this source since 1951.

A more ambitious effort to develop longitudinal data is one that is being developed in the criminal justice area. A cooperative system involving the Law Enforcement Assistance Administration (LEAA), the Federal Bureau of Investigation (FBI), and State governments is attempting to build longitudinal data bases which would permit the critical examination of the justice process in the various States. This program, called Offender Based Transaction Statistics (OBTS), would track offenders and suspected offenders through the criminal justice process. The file would also include a unique identifier of the individual so that subsequent events could be recorded and tracked, thus providing a lifetime longitudinal record.

A number of longitudinal survey efforts have also been started in education. The largest single effort has been the Longitudinal Study of the High School Class of 1972, conducted by the National Center for Education Statistics (NCES). Another activity is the Panel Study of Family Income Dynamics conducted for DHEW by the Institute for Social Research in Ann Arbor. Numerous other longitudinal surveys have also been initiated during the past several years.

Issues

By their very nature, the data bases from longitudinal surveys inexorably increase. This growth in the data base provides a rich analytical resource which permits joint consideration of disparate variables. However, the cost and other problems associated with longitudinal surveys require a fresh look at many of the conflicting issues involved.

The problems include privacy, analytical complexity, burden on respondents, cost, and so forth. The problems, of course, are of a different nature depending on whether the data are derived from sample surveys or administrative records.

Respondent Burden

The problem of burden on respondents is receiving increased attention as a result of the concerted effort by the Executive Office of the President to reduce the burden on the public of Federal information requests.

If administrative record sources are limited to the administrative needs for which the data were originally gathered, there should be no problem of excessive burden. The inclusion of additional questions for statistical purposes, however, adds to burden by increasing the number of questions beyond the minimum required for program administration. The 1976 revisions to OMB Circular No. A-40, in fact, severely restrict the addition of questions on application forms which do not bear directly on the granting of the benefit for which the applications are filed. Yet applications can often provide the basic data upon which a longitudinal record is based. Frequently survey information can be linked directly to those administrative records. In some cases there are clear advantages, in terms of overall burden, to the addition of a few questions for statistical purposes, since the administrative records can be sampled on the basis of the additional questions and the further information collected only for a sample of cases based on the new data.

This linking procedure is no panacea. Carrying out such linkage projects presents substantial technical problems. Such linkages may also present legal problems. The Privacy Act of 1974 places limitations on this process. The designer of a longitudinal data base which proposes to use administrative records must examine the Privacy Act implications early in his design activities.

The burden problems for sample longitudinal surveys are somewhat different. While the burden on the respondent in terms of total hours spent on an individual inquiry is no greater than that of an ad hoc survey, the repetitive nature of the survey significantly increases the total burden on the individuals, families or households selected for inclusion in a longitudinal survey. Of course, the nonlongitudinal panel survey in which the sample address is contacted repeatedly has similar problems. To the extent that such surveys are voluntary, the concern for burden should be reduced. Conversely, the concern for continued response should be intensified, since the accumulation of nonresponses can spell disaster to a longitudinal survey effort.

A common practice for obtaining longitudinal data is to use records from a survey which has already been conducted and to follow up the same respondents at a later date to determine how their situation may have changed. This is generally a less effective method of carrying out a longitudinal survey, since it may be difficult to recontact respondents if no mechanism was developed initially to increase the likelihood of locating them later.

Informed Consent and Privacy

The discussion of burden and ways of reducing burden by using existing sources of data inevitably leads to the questions of informed consent and the right of privacy. Over the years there has been a change in the attitudes of the public and the Government as to the right of a respondent to know the uses of the data which he supplies. As noted above, with both major cross-sectional surveys and longitudinal surveys there is a temptation to have follow-up surveys with data requirements which are different from the original request. This may violate the general principle of informed consent, depending upon what the respondent was told initially. If there is a possibility of a follow-up, it is important for the researcher to clearly inform the respondent at the initial interview that he may be recontacted for a broadly stated set of purposes.

Informed consent when the inquiry utilizes administrative records presents a more difficult issue. The Privacy Act imposes some limitations on the use of administrative records for statistical purposes, particularly when linking administrative records to build longitudinal records is required. These limitations affect not only the development of longitudinal files but may also impact on other kinds of administrative uses, audits, and so forth. Recommendations concerning the privacy issue will be found in the Framework chapter on Confidentiality.

Longitudinal surveys present very special problems with respect to the confidentiality of information. For example, the longitudinal use of administrative records inevitably leads to the development of ad hoc dossiers. Matching administrative records from several sources, or adding survey data to administrative records, generates augmented records. The use of these augmented administrative records for any but statistical purposes is a major concern. This problem is perhaps most serious with the offender records discussed above.

Administrative record files frequently are developed from records which are in the public domain (such as arrest or court records). But the mere fact of aggregating bits of public record information from various sources into a single record (dossier) frequently changes the very nature of the data. Congressional recognition of this transposition is reflected in Section 524b of the Omnibus Crime Control Act (Public Law 93-83 as amended), which affords special privacy treatment to criminal histories containing aggregations of public records about individuals.

The problem of the sensitivity of data files is accompanied by the increase in the identifiability of individual records within longitudinal files. The collection of additional data over periods of time makes it increasingly possible to identify the respondent since detailed patterns of status and behavior become available. The unique longitudinal data concerning changing status and activities may sometimes act as a surrogate for a unique identifier. The probability of someone privy to the data being able to intentionally piece together enough facts to identify a pre-selected individual is very low. The possibility of inadvertent disclosure, however, increases rapidly with each iteration of a longitudinal survey.
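The way accumulated waves can come to act as a surrogate identifier can be illustrated with a small sketch. The Python below is illustrative only: the function name share_unique and the five status histories are invented for the example and are not drawn from any survey discussed in this chapter. It computes the share of respondents whose pattern of statuses over the first few waves is held by no one else in the file.

```python
from collections import Counter

def share_unique(histories, waves):
    """Share of respondents whose status pattern over the first
    `waves` interviews is unique within the file (hypothetical helper)."""
    prefixes = [tuple(h[:waves]) for h in histories]
    counts = Counter(prefixes)
    unique = sum(1 for p in prefixes if counts[p] == 1)
    return unique / len(prefixes)

# Invented labor force histories for five respondents over three waves.
histories = [
    ("employed", "employed", "employed"),
    ("employed", "employed", "unemployed"),
    ("employed", "unemployed", "employed"),
    ("unemployed", "employed", "employed"),
    ("employed", "employed", "employed"),
]

for k in (1, 2, 3):
    print(k, share_unique(histories, k))
```

In this invented file the unique share rises from 0.2 after one wave to 0.6 after three, which is the pattern the paragraph above describes: each added iteration leaves fewer respondents sharing a history with anyone else.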

When all major data collection efforts are controlled by a single agency, the danger is diminished, assuming that adequate security procedures have been established within the agency and that the agency has adequate legislation requiring it to maintain confidentiality. The demand for, and provision of, comprehensive microdata files to other researchers again increases the danger of accidental disclosure manifold if adequate disclosure prevention steps have not been taken. The larger the number of researchers, the greater the probability that some individual's identity will be divulged. In order to counteract this problem, legislative methods must be found which will permit the dissemination of microrecords to other investigators while protecting the confidentiality of the record. Section 524a of Public Law 93-83, as amended, extends confidentiality requirements for LEAA data to all grantees and contractors. Perhaps this legislation could be used as a model with some provision extending the coverage to purchasers of tapes.

Longitudinal surveys, to be effective, must maintain a great deal of current identification data about the subject of the inquiry in the basic data file. The existence of such identifying information also increases the risk of disclosure. The problem is frequently made more acute because the researcher asks for identifying information on parents, children, other relatives and friends. Although this information may be used only for tracking purposes, it presents an opportunity for identifying families and friends as well as the subject himself. Fortunately this problem relates only to the collection agency.

Design Considerations

Possibly the most important design consideration which particularly affects longitudinal surveys is the maintenance of a high response rate over extended periods of time. This generally entails obtaining some information about the respondent's family and friends, since they may know how to locate the respondent if he moves. Many designers of longitudinal surveys do not reserve adequate resources for locating respondents in second and subsequent rounds of the survey. Thus they frequently achieve inadequate response rates in subsequent inquiries. There are methods of maintaining adequate response rates over the life of a survey, but these methods are expensive and require a significant effort. The response rate requirements in the President's 1976 and subsequent guidelines for the reduction of paperwork burden pertain equally to longitudinal surveys and cross-sectional surveys. Those guidelines require at least a 75% response rate for routine approval or a 50% response rate if adequate justification can be made. The percentage must use the total original sample as the denominator and the expected response to the next iteration as the numerator. This does not appear to be an impossible requirement. The Census Bureau's interviewers for the Parnes Studies are still contacting 80% of the original respondents without making allowance for death, out-migration or other changes which would make interviewing impossible.
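The response rate test described above is simple arithmetic and can be sketched as follows. The function names and the sample figures are assumptions made for illustration; only the 75% and 50% thresholds and the choice of numerator and denominator come from the text.

```python
def response_rate(original_sample, expected_respondents):
    """Expected respondents at the next iteration divided by the
    total original sample (not by the survivors of earlier rounds)."""
    if original_sample <= 0:
        raise ValueError("original_sample must be positive")
    if not 0 <= expected_respondents <= original_sample:
        raise ValueError("expected_respondents must lie between 0 and original_sample")
    return expected_respondents / original_sample

def classify_response_rate(original_sample, expected_respondents):
    """Apply the 75% and 50% thresholds cited in the text
    (the returned labels are hypothetical wording)."""
    rate = response_rate(original_sample, expected_respondents)
    if rate >= 0.75:
        return "routine approval"
    if rate >= 0.50:
        return "requires justification"
    return "below guideline"

# Hypothetical panel of 5,000 expecting 4,000 completed interviews next round.
print(classify_response_rate(5000, 4000))
```

Because the denominator stays fixed at the original sample, losses accumulate: a panel that keeps 90% of the preceding round's respondents at each iteration still falls below 75% of the original sample by the third follow-up (0.9 x 0.9 x 0.9 = 0.729).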

Should respondents to federally sponsored surveys receive some kind of remuneration? While this question has been raised in terms of all survey activity, it becomes more acute with longitudinal surveys. The need to maintain a high response rate frequently prompts the survey manager to decide to provide monetary or other incentives to increase response. This presents a philosophical problem: Should the Government have to pay citizens to provide information which will help in managing the Government and in designing social programs? There is also the practical question about the efficacy of such incentives. Although the philosophical issue can be argued from either side, if the practice became widespread the cost of data collection to the Federal Government could become excessive. If, however, it can be proved that payment significantly increases responses then perhaps the case for payment could be made. At present no persuasive evidence exists to show that Federally sponsored surveys are improved by providing monetary incentives. In general, therefore, the practice must be discouraged. However, if an investigator shows that the provision of cash or other incentives is cost effective, they could be permitted for specific projects.

Another problem which merits design consideration is the conditioning of the respondent. Conditioning may affect not only responses to questions but also actual behavior. Designers of longitudinal surveys must be prepared to assess the implications of this phenomenon and take appropriate steps. One such step is in-depth reinterviewing of the sample population to determine whether the responses given to the original interviewer were truthful. It may be necessary to administer identical batteries of questions to independent samples of the population, to ascertain what differences exist which could be attributed to conditioning. It may also be possible to examine external measures to determine congruence between the independent estimates and the survey estimates.

The Census Bureau, for example, has found that panel conditioning is a very real concern. The Current Population Survey produces very different estimates of labor force activity (and other measures) for households interviewed for the first time versus those interviewed more than once. (Published and unpublished reports on these phenomena are available from the Bureau of the Census.) The implications for longitudinal surveys are obvious.

Another problem with longitudinal surveys seems pedestrian until it is examined carefully. The computer has made possible the development of the extensive multivariate longitudinal survey. Without the computer it would not be possible even to consider the joint analysis of the multitude of variables which are available in a longitudinal record. The great detail which provides the analytical potential also guarantees a difficult, complex record. Unless the record and the accompanying file are adequately documented, the record is valueless or worse. Documentation failures have been responsible for substantial, time-consuming and expensive failures. Even small errors in documentation can account for major discrepancies in the final data.

The problem of documentation suggests another dilemma. Because of Federal procurement policies, agencies which choose to conduct their longitudinal surveys using a commercial firm, rather than collecting the data themselves or having another Federal agency do it under an interagency agreement, may find themselves with a new data collector at some point. With all good intentions it is difficult for a survey research agency to document the rationale for every decision made, both large and small. However, it is difficult to carry out a program without knowing the basis for past decisions. Changing contractors in the course of a longitudinal survey can cause serious disruptions both in data collection and data analysis.

There is a temptation for a sponsoring agency to add sets of questions to an ongoing longitudinal survey to provide answers to an urgent problem of the moment. While this would seem to be a valid use of an existing resource, there are potentially serious costs. This increase in the burden may have an impact on future cooperation, perhaps destroying the base study itself. Further, it will have a negative impact on the processing of the longitudinal data. A general rule would be to minimize the amount of data requested during any iteration of a longitudinal survey beyond that needed to further the original analytical plan.

Analysis

Another generic problem relates to difficulties in developing approaches to analyzing the data which are finally amassed. The basic files of existing longitudinal surveys have grown both in size and complexity. Some complaints have been voiced about existing studies, criticizing their failure to exploit the rich data base. The construction of complex longitudinal variables often introduces so many alternatives that, even with very large samples, serious reliability problems are introduced into the individual cells. There is also the problem of the introduction of noise when variables are expanded to encompass all of the relevant longitudinal data. This noise is in the form of response errors, coding errors, or data entry errors. Even if the longitudinal computer record is fairly simple, extensive staff time is needed to digest the documentation for the files and to develop working files which can be used for convenient analysis.
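The cell reliability problem can be made concrete with a rough calculation. The sketch below is illustrative only: the function name and the figures are assumptions, and it supposes that respondents are spread evenly over all possible patterns, which real data would not be. A variable with c categories observed at w waves has c ** w possible longitudinal patterns, so the average number of respondents per pattern falls quickly as waves are added.

```python
def average_cell_size(sample_size, categories, waves):
    """Average respondents per longitudinal pattern, assuming an even
    spread over all categories ** waves patterns (a simplifying assumption)."""
    if sample_size < 0 or categories < 1 or waves < 1:
        raise ValueError("sample_size must be >= 0; categories and waves >= 1")
    return sample_size / categories ** waves

# Hypothetical sample of 5,000 and a four-category status variable.
for waves in (1, 3, 5):
    print(waves, average_cell_size(5000, 4, waves))
```

With these assumed figures the average falls from 1,250 respondents per cell at one wave to fewer than five at five waves, before any cross-classification by a second variable.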

There is a tendency for researchers developing questionnaires to try to include all of the data which could possibly have relevance in a given situation. This results in treating a longitudinal collection instrument as if it were a case study. This tendency may contribute to the excessive expansion of the survey instrument without a concomitant increase in the value of the data. The result is a burdensome survey with a complex file and all of the related problems of analysis.

The proliferation of longitudinal survey activities, especially those utilizing general population samples, and the rich data base potential combined with the inherent high cost, has resulted in general suggestions for the establishment of a national omnibus longitudinal survey. The main argument offered in support of this proposal is the sharing of the high cost of such an effort. The problems of burden, conditioning, panel decay, and others suggest that much more research needs to be accomplished before such an idea should be acted upon. However, it is axiomatic that longitudinal surveys represent such a large public investment that the broadest usage should be encouraged consistent with the previously voiced concerns.

During the next decade, serious work needs to be done to improve the techniques for analyzing longitudinal data.

Recommendations

1. Steps should be taken to modify the Federal procurement process to permit an agency to continue to use the contractor which began a longitudinal project to complete it even though the project could take many years. In the past, longitudinal surveys have been treated like all other data collection efforts. An agency perceives a need, develops an instrument, establishes a collection mechanism and an intuitive analytical plan. Most agencies then initiate a competitive procurement to collect and process the required data. When competitive procurement is used for a longitudinal survey, the survey will generally be put out for bids at least once after the initial contract but probably more often during the period of the study. The introduction of a new contractor in the course of a continuing process is disruptive at best. The discontinuity of contractors causes damage to the data as well as significant additional expense and delay to the sponsoring agency.

2. Prior to the beginning of detailed design work on a longitudinal survey, preliminary clearance should be obtained from the Office of Federal Statistical Policy and Standards or the Office of Management and Budget if administrative records are involved. The clearance request should specify the universe to be covered, the size of the sample, the nature of the basic inquiry and any other information then available. This information will then be published in the Statistical Reporter to familiarize agencies which may have similar interests with the proposed program. This would permit the more effective coordination and exploitation of longitudinal surveys early in the development process.
