
individuals and families who participate as respondents in this study shall remain strictly confidential."1

The difference between these two cases is clear and fundamental: In the Census case, the data were protected by a statute2 from disclosure in individually identifiable form; in the New Jersey case they were not.3 This chapter examines some of the problems posed by legally unprotected statistical-reporting and research files that contain data about identifiable individuals. It focuses on the need to protect individual data subjects from injury through disclosure of data about them, on one hand, and, on the other, the need to make files of personal data more accessible to persons who can make constructive use of the data they contain.

Background Observations

When we began our examination of automated record-keeping operations, we expected that we could leave out entirely data

1 David N. Kershaw and Joseph C. Small, "Data Confidentiality and Privacy: Lessons from the New Jersey Negative Income Tax Experiment," Public Policy, Vol. XX, No. 2 (Spring 1972), p. 261. The Mercer County dispute stemmed from a change in the State public assistance law which made more participants in the experiment eligible for welfare than had been the case when the experiment began. The 1969 investigation was terminated when the contractor agreed to reimburse the county welfare agency for any overpayments that came to light. Two years later, however, the experiment was subjected to a four-month grand jury investigation of charges that the contractor had "instructed low-income families taking part in the experiment not to report income subsidies to city and county welfare authorities ...." Ibid., p. 268. During this same period, access to the contractor's files was also sought by the General Accounting Office and the U. S. Senate Finance Committee.

2 The current version of this protection provides that:

Neither the Secretary, nor any other officer or employee of the Department of Commerce or bureau or agency thereof, may ... (1) use the information furnished under the provisions of this title for any purpose other than the statistical purposes for which it is supplied; or (2) make any publication whereby the data furnished by any particular establishment or individual under this title can be identified; or (3) permit anyone other than the sworn officers and employees of the Department or bureau or agency thereof to examine the individual reports.... 13 U.S.C. 9(a).

3 The New Jersey case is not unique. At least two other incidents of a similar nature have been reported. See John Walsh, "Anti-poverty R&D: Chicago Debacle Suggests Pitfalls Facing OEO," Science, 165, 19 September 1969, pp. 1243-1245; and "Appeals Court Orders MD to Reveal Patients' Photos," Psychiatric News, VII:2, November 15, 1972, p. 1. The latter describes a pending court case involving the New York City Methadone Maintenance Treatment Program.

systems maintained exclusively for statistical reporting or research. We were mindful that in the mid-1960's a series of proposals to establish a national statistical data center had alerted the public to some of the dangers inherent in computer-based record-keeping operations. We also knew that the Freedom of Information Act contains no clear statement of Congressional intent with respect to the disclosure of individually identifiable data maintained for statistical reporting and research. We had assumed, however, that statistical-reporting and research data systems, by and large, would not contain data in personally identifiable form, and that if they did, the anonymity of individual data subjects would be protected by specific statutory safeguards. We were not prepared for the discovery that in many instances files used exclusively for statistical reporting and research do contain personally identifiable data, and that the data are often totally vulnerable to disclosure through legal process. This holds for data in Federal agency files as well as for data in the possession of State agencies and private research organizations.

Changes in social policy, which computer technology has to some extent facilitated, are in large part responsible for the existence of unprotected statistical-reporting and research files. Since the late 1950's, the Federal Government has been distributing increasingly large sums of money to the States on the basis of formulas that take account of special population characteristics. The recipient State governments, in turn, have been redistributing this money among their own political subdivisions, using grant-in-aid formulas that tend to generate new requirements for statistical data about people at nearly every level of government. Often coupled with these grants, moreover, have been planning requirements demanding highly detailed information about the populations of small geographic areas.

Program evaluation requirements, first levied on grant-in-aid recipients by Federal agencies and later explicitly written into some

4 Report of the Committee on the Preservation and Use of Economic Data to the Social Science Research Council, April 1965, reprinted as Appendix I in The Computer and Invasion of Privacy, Hearings before a Subcommittee of the Committee on Government Operations, U. S. House of Representatives, 89th Congress, 2d Session, July 26, 27, 28, 1966; Statistical Evaluation Report No. 6-Review of Proposal for a National Data Center, prepared by Edgar S. Dunn, Jr., also reprinted in The Computer and Invasion of Privacy as Appendix 2; and Report of the Task Force on the Storage of and Access to Government Statistics (Washington, D.C.: Bureau of the Budget), October 1966.


of the agencies' authorizing legislation, have been a further stimulus to the proliferation of statistical-reporting and research files containing data about people. From their initial emphasis on simple input accounting (how much was spent, by whom, for what purpose, on how many people, with which characteristics), evaluation studies have rapidly come to focus on measuring program effects.5 Because effects measurement usually requires before-and-after data on program participants, it has become necessary to preserve individual identities in evaluation research files. Interest in the specific events and processes that may account for changes in participant behavior over time has also grown along with interest in output measurement. Many of the factors that account for a participant's behavior are so subtle that they can only be isolated if records of people's movements and experiences are kept over an extended period.

A third factor that has enlarged the number of data files containing information about identifiable individuals is the broad support given to fundamental research in the social and biomedical sciences. In fact, files for research in these two areas may be the most numerous of all, and they exist in a variety of settings. Many such files are coming into the possession of government agencies as a consequence of contract arrangements that make agencies the proprietors of data generated in government-supported research and demonstration projects. Not all of these files contain information that identifies individual data subjects, but of those that do, the ones dealing with controversial social and political issues are particularly vulnerable to misuse in the absence of specific statutory safeguards.

The Need to Protect Data Subjects From Injury

Even at the Federal level there are few statutes that protect personal data in statistical-reporting and research files from unintended administrative or investigative uses. The Census Act, the

5 There is today a substantial evaluation research literature to which the interested reader can refer for a fuller account of how this new government-supported activity has developed. See, for example, Edward A. Suchman, Evaluative Research (New York: Russell Sage Foundation), 1967; Francis G. Caro, Readings in Evaluation Research (New York: Russell Sage Foundation), 1971; and Peter H. Rossi and Walter Williams (Eds.), Evaluating Social Programs: Theory, Practice, and Politics (New York and London: Seminar Press), 1972.

Public Health Service Act, and the Social Security Act are notable exceptions. Otherwise there is little to prevent anyone with enough time, money, and perseverance (to say nothing of someone who can issue or obtain a subpoena) from gaining access to a wealth of information about identifiable participants in surveys and experiments. This should not, and need not, be the case.

6

Social scientists and others whose research involves human subjects are vocal about the importance of being able to assure individuals that information they provide for statistical reporting and research will be held in strictest confidence and used only in ways that will not result in harm to them as individuals. Unless people get, and believe, such assurances, they will inevitably become either less willing or less reliable participants in surveys and experiments. Ideally, data subjects should also be told of the conditions under which they are being asked to provide information, and should be given an opportunity to refuse if they find those conditions unsatisfactory. It is often asserted, for example, that the decennial census (in which response is mandatory) is a feasible undertaking only because the public willingly cooperates, and that the public's cooperation is best obtained by explaining to respondents the uses to which the data will be put.

We believe the principle that no harm must come to an individual as a consequence of participating in a general knowledge-producing activity should be regarded as the essence of "use for statistical or research purposes only." Individual data subjects asked to provide data for statistical reporting and research should also be fully informed, in advance, of the known consequences for them of providing or not providing data. Survey respondents and participants in experiments and demonstration projects are largely dependent on what they are told by interviewers or by explanatory notes on forms. Hence, it is incumbent on the institution conducting or funding a statistical-reporting or research project to find out how vulnerable the data in its files are, and so to inform its data subjects. Finally, we believe that the best way to assure that individual data subjects will not be harmed is to extend to all personal data generated through statistical-reporting and research activities the

6 See Chapter 6, "Privacy and Confidentiality," in Federal Statistics, the Report of the President's Commission on Federal Statistics (Washington, D.C.: U.S. Government Printing Office), 1971.

statutory protections that have been given to census data and certain classes of health and economic data collected and used in the public interest.

The Need for Freer Access to Data in Government Files

The obverse of the problem of data confidentiality is the need to make basic data more accessible for reuse or reanalysis by all qualified persons or institutions. Personal data systems for statistical reporting and research are largely in the hands of institutions that wield considerable power in our society. Hence, it is essential that data which help organizations to influence social policy and behavior be readily available for independent analysis.

The ubiquitous computer has increased both the quantity of data potentially available to users and the number of potential users. Unfortunately, however, the data dissemination capability of many funding and collecting institutions has not grown commensurately. Among the general purpose statistical operations of the Federal government, the Census Bureau has led the way in making data from standard statistical series easily available to users in a form that protects the anonymity of respondents. Other agencies, notably the National Center for Health Statistics, have followed suit.7 The Department of Health, Education, and Welfare is currently preparing a guidebook of its "public use" data files.8

Laudable as these efforts are, it should be emphasized that they are being made, for the most part, by agencies or offices within agencies whose primary mission is statistical reporting and research. They do not address the problem of access to the statistical-reporting and research files that operating agencies develop in the course of evaluating programs or in adding to the general knowledge of program administrators. It is true, as noted earlier, that anyone with enough money, time, and perseverance can probably gain access to substantial amounts of data not generally available for public use. Yet the individual researcher, or the independent critical

7 National Center for Health Statistics, Standardized Micro-Data Transcripts (Rockville, Md.: National Center for Health Statistics), December 1972.

8 Guidebook to the U.S. Department of Health, Education, and Welfare Computer Data Files, 1973 (forthcoming).
