Chapter 21. CONFIDENTIALITY OF STATISTICAL AND RESEARCH DATA

Introduction and Overview

The concern for the protection of privacy of individuals and for maintaining secret proprietary information pertaining to business establishments and other legal institutions is traditional and well known. The recent debates in Congress leading to the passage of the Privacy Act of 1974 and the voluminous literature on the subject identify the Federal Government as a major threat to that privacy. Indeed, the Government's need for information for policy determination, program evaluation, and regulation of many aspects of society has led to an increasing need for information and therefore to increasing conflicts with the concepts of privacy, both for individuals and for legal persons. It has also focused more attention on the concept of confidentiality of information than ever before.

It is important that the two concepts of privacy and confidentiality be distinguished at the outset. Privacy, on the one hand, has been variously defined as: (1) the right to be left alone, to be spared from unauthorized oversight and observation, and from searching inquiries about oneself and one's business;¹ (2) the ability to control the use of information about oneself, whether to give it free circulation, limited circulation, or no circulation at all;² and (3) the right to participate in a meaningful way in decisions about what information will be collected and how that information will be used.³ The concept of autonomy is also used to describe this right. In one view, these definitions imply that a data subject must be fully informed about all uses of data sought and be given the right to withhold consent from any or all such uses. Taken to the extreme, of course, this view would mean that the Government should not collect any information at all.

Confidentiality, on the other hand, involves the conditions of use and disclosure of data once it is collected. The Government's needs for information about individuals, businesses, and institutions fall into many different categories including counting the population, as mandated in the Constitution; providing benefits such as welfare, student loans, or medical insurance; collecting taxes; regulating industry; enforcing laws; evaluating programs; and advancing the state of knowledge through statistics and research. Hence, the Government collects or causes to be collected great amounts of data, some of it highly personal or capable of inflicting great competitive injury if made public. The challenge posed by the dual concerns for privacy and the enhancement of knowledge is, therefore, to refrain from collecting unnecessary information and to maintain the necessary degree of confidentiality for that which is collected.

In addition, in many cases the success of a statistical inquiry leading to an enhancement of knowledge relies on a pledge of confidential treatment of the data. Margaret Martin has succinctly stated this proposition as follows:

Even when responses to requests for information are required by law, the success of a statistical program depends in large measure on the willing cooperation of respondents. Respondents who understand the purpose of the inquiry, who sympathize with the intended use of the information, and who believe that providing the government with the requested information will not harm them are much more likely to answer truthfully and with a minimum of effort on the part of the data collection agency. One element in enlisting such cooperation is the assurance of harmlessness to the respondent, and one of the most common methods for making such assurance in statistical data collection is the provision for keeping the replies confidential.⁴

¹ Webster's Third New International Dictionary.

² Alan F. Westin, Privacy and Freedom (New York: Atheneum, 1967), Part One.

³ DHEW Secretary's Advisory Committee on Automated Personal Data Systems, Records, Computers and the Rights of Citizens (DHEW Publication No. (OS) 73-94, July 1973), p. 41.

⁴ Margaret Martin, "Statistical Legislation and Confidentiality Issues," International Statistical Review, Vol. 42, No. 3 (December 1974), p. 265.

Several strains of legal, public policy, and ethical thought converge on these government-collected data, understandably producing conflicts that have yet to be resolved adequately. These include: (1) the desire to protect individual and corporate privacy as defined above; (2) the desire to protect data subjects from the various risks associated with disclosure of identifiable information, once collected; (3) the need to facilitate the search for truth through research and evaluation; and (4) the need for the public to understand how government decisions are made, as evidenced by the Freedom of Information Act (FOIA) and the Government in the Sunshine Act (Sunshine Act).

Definition of Statistical and Research Data

The Privacy Protection Study Commission used the term "research" to refer to any systematic, objective process designed to obtain new knowledge, regardless of whether it is "pure" (aimed at deriving general principles) or "applied" (aimed at solving a specific problem or at determining policy). "Statistics" referred both to the data obtained through enumeration and measurement and to the use of mathematical methods for dealing with data so obtained. Statistical methods can be descriptive, that is, any treatment designed to summarize or describe important features of data, or inferential, that is, techniques for arriving at generalizations that go beyond the sample being analyzed.

For the purposes of this chapter, the scope of statistical and research data is defined by the use to which the information is put, rather than by the character of the information. These activities include gathering data for the purpose of estimating aggregates or providing cross-tabulations on economic or social characteristics, the generation of microdata sets on anonymous individual units for analysis, and the development of longitudinal reporting panels. Research activities include biomedical research conducted in clinics or hospitals, socioeconomic experimentation, and evaluation studies of the effect of different approaches to providing Government services. One common thread of these activities is the advancement of knowledge, whether it be of a general nature or related to a specific problem or Federal program.

Statistical and research activities are frequently carried out with data collected directly from respondents specifically for research purposes. However, in many cases, data collected originally in connection with the administration of Federal programs such as work training programs or the collection of social security taxes may be essential to a statistical or research program. Medical treatment records are often used. The quality which separates administrative activities from the statistical and research activities under consideration in this chapter is how the data are used. Administrative activities use information for determining eligibility for benefits or for investigation or regulation of particular individuals or legal entities. Such information is "microaction" in nature.⁵ In contrast, statistical and research activities are defined as those activities whose sole purpose is advancement of the state of our knowledge. They do not include the use of data in identifiable form to make determinations about particular individuals. This definition is similar to the definition of statistical research and reporting records in the Privacy Act.

Principles of Confidentiality

The protection of confidentiality for statistical and research data involves several basic ingredients:

The prohibition of mandatory disclosure of identifiable data pursuant to a subpoena or other compulsory legal process or under the provisions of the Freedom of Information Act;

The prohibition of other disclosures to the public of identifiable data in publication or otherwise by an agency; and

The prohibition of use of identifiable data for any purpose other than one which is purely statistical or research in nature, and, particularly, from uses which would adversely affect any particular respondent's rights, benefits, or privileges.

Prohibition of Mandatory Disclosure.-Compulsory legal process has been invoked in only a few cases dealing with statistical and research data. One of the most famous of these was the 1961 order requiring the St. Regis Paper Company to deliver to the Federal Trade Commission a copy of a Census Bureau form which it had retained in its files. In a swift reaction, Congress amended the Census law to protect copies of Census documents retained in respondents' files from compulsory legal process.

In a case involving an income maintenance experiment in New Jersey, files of some families were sought by a county prosecutor to attempt to discover if the families were defrauding the county welfare department by not reporting income received from the experiment. The records were not protected by statute from subpoena and the researcher came close to going to jail for contempt of court.

⁵ Walt Simmons, "Issues Regarding Confidentiality of Data in the Cooperative Health Statistics System," Proceedings of the Workshop on Privacy and Confidentiality (Cooperative Health Statistics System, National Center for Health Statistics, March 1976), pp. 95-101.

In another instance, People v. Newman, a subpoena was issued for photographs of participants in a New York methadone maintenance program. After many months and the threat of contempt of court against the researcher, the clinic administering the program, on appeal, was allowed to withhold the photographs on the grounds that the identity of the patients was protected by the research privilege which was granted to the program under the Public Health Service Act.

The FOIA provides that persons may require disclosure of agency documents except that an agency may withhold documents which fall within certain exemptions. These exemptions include, among others, matters which are "specifically exempted from disclosure by statute"; "trade secrets and commercial or financial information obtained from a person which is privileged or confidential"; and "personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy." Recently, in interpreting the last exemption, courts have tended to consider whether the need for the data, as balanced against the extent of the invasion of personal privacy, warrants the particular disclosure. It is clear that a pledge of confidentiality made to a respondent is not sufficient, in and of itself, to protect data not expressly exempted from disclosure by another statute. The FOIA has not been used very often to force disclosure of statistical or research data in identifiable form, and those few disclosures have involved corporate data. Disclosure of identifiable data may prove to be a greater problem in the future because of the recent FOIA amendments, which will speed up agency processing of requests for information, and the Sunshine Act, which substantially limits the applicability of the statutory exemption from FOIA disclosure.

It is argued in this chapter that identifiable data used for statistical or research purposes should be specifically exempt from disclosure by statute to protect the identity of respondents and to assure accurate statistical and research results. The DHEW Secretary's Advisory Committee on Automated Personal Data Systems agreed in principle and cautioned that "the data to be protected should be limited to those used exclusively for statistical reporting or research. Thus, the protection would apply to statistical reporting and research data derived from administrative records, and kept apart from them, but not to the administrative records themselves."⁶

⁶ DHEW Secretary's Advisory Committee, op. cit., p. 103.

Prohibition of Voluntary Disclosure.-Protection of data from voluntary disclosure in identifiable form is an elementary aspect of any statistical or research activity in which identifiable data are used. Several steps may be appropriate to achieve this protection. Agency personnel should be required to take an oath to protect the identity of respondents. Criminal penalties should be assessed for knowing and willful disclosure. Civil remedies might be made available to any respondent who is harmed by the disclosure. Research data should be scrubbed of identifiers as soon as possible to prevent disclosure, and all agencies should be sensitive to the possibility of disclosing data through publications. Disclosure procedures, such as those in use by the Census Bureau, for example, should be developed so that data pertaining to a particular respondent cannot be linked together, even by comparison of several tables.

The integrity of the statistical and research activities of the Government might be seriously damaged by an inappropriate disclosure. The Federal Committee on Statistical Methodology chaired by the Office of Federal Statistical Policy and Standards is exploring disclosure avoidance techniques in publications and in the dissemination of public use microdata tapes, as well as statistical matching with and without the use of identifiable data. (For further discussion of disclosure policy and procedures, see Statistical Policy Working Paper 2, Reports on Statistical Disclosure and Disclosure-Avoidance Techniques, published by the Office of Federal Statistical Policy and Standards.) In the words of the DHEW Committee: "The protection should be limited to data identifiable with, or traceable to, specific individuals. When data are released in statistical form, reasonable precautions to protect against 'statistical disclosure' should be considered to fulfill the obligations not to disclose data that can be traced to specific individuals."⁷
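One common family of disclosure avoidance techniques for published tabulations is the suppression of cells that describe too few respondents. The following is a minimal sketch of that idea only, not a description of the procedures actually used by the Census Bureau or any other agency. The threshold of three, the function names, and the sample records are all hypothetical and chosen for illustration.

```python
from collections import Counter

# Hypothetical threshold: cells describing fewer respondents than this are withheld.
# The value is an assumption for illustration, not taken from any agency rule.
MIN_CELL_COUNT = 3


def tabulate(records, row_key, col_key):
    """Count respondents in each (row, column) cell of a cross-tabulation."""
    return Counter((r[row_key], r[col_key]) for r in records)


def suppress_small_cells(table, min_count=MIN_CELL_COUNT):
    """Replace counts below min_count with None so sparse cells are not published."""
    return {cell: (n if n >= min_count else None) for cell, n in table.items()}


# Invented sample data: nine establishments classified by county and industry.
records = [
    {"county": "A", "industry": "paper"},
    {"county": "A", "industry": "paper"},
    {"county": "A", "industry": "paper"},
    {"county": "A", "industry": "steel"},
    {"county": "B", "industry": "paper"},
    {"county": "B", "industry": "steel"},
    {"county": "B", "industry": "steel"},
    {"county": "B", "industry": "steel"},
    {"county": "B", "industry": "steel"},
]

published = suppress_small_cells(tabulate(records, "county", "industry"))

for cell in sorted(published):
    count = published[cell]
    print(cell, "withheld" if count is None else count)
```

In this sketch the two cells that each describe a single establishment are withheld. Suppressing small cells alone is not sufficient when row or column totals, or related tables, are also published, because a withheld value may then be recovered by subtraction. Guarding against that kind of comparison across tables requires additional (complementary) suppression, which this sketch does not attempt.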

Prohibition of Misuse.-The final element in a program to protect the confidentiality of statistical or research data is to prevent their use in identifiable form for making determinations which affect a particular respondent. Agencies often promise that data pertaining to a respondent will be used only for statistical or research purposes, but some agencies do not have a legal basis to carry out and enforce this pledge.

It is important to consider the organizational factor in protecting these data from misuse. Several agencies of the Federal Government perform only statistical or research functions, and their product takes the form of published aggregates, tabulations, research reports, and other documents which do not reveal identifiable statistical data or the identity of any research subject. Any agency which has this limited function can more easily assure the confidentiality of data collected or received from another agency under a specific law which protects all such data which it collects or maintains.

⁷ Ibid.

When statistical or research activities are undertaken by agencies which have other program responsibilities, it is often more difficult to keep the data from being used in the administration of a program. (Some agencies use the word "confidential" to include data used internally for regulatory purposes; such data should be described with another word to avoid confusion with the usage proposed in this chapter.) This may be especially difficult when the data are originally gathered in connection with regulatory and program administration functions. Often such data are collected on a mandatory basis to determine eligibility for a benefit or to see whether a particular person has complied with a law. As mentioned above, program-related information may also be useful for statistical, research, or program evaluation activities. These activities may involve the collection of additional statistical data, perhaps from a sample of program participants. In these cases, the statistical data derived from the program may be inextricably merged with other directly collected data in the records maintained by the researcher, evaluator, or statistician. A major threat to the confidentiality of the data is that data which are maintained for statistical or research purposes may be viewed as a prime source of information by a benefits program administrator or by a regulatory arm of an agency. The one-way flow of administrative data into archives, which is advocated in the Framework, would eliminate this potential risk.

It is clear that the integrity of statistical or research programs of the Federal Government would be widely questioned and their effectiveness would be damaged if data from those programs were used in identifiable form to make administrative or regulatory decisions about particular respondents. Hence, it is important to keep identifiable statistical or research data separate from program administration or regulatory data, even if the former are derived from the latter. One way to achieve organizational integrity for statistical or research activities is through the establishment of a functionally separate research and statistical agency or unit within the administrative agency which is legally insulated from requests for identifiable data from other operating units. Where statistical or research data are derived in whole or in part from administrative data they should be maintained apart from the original administrative data sets to eliminate the possibility of their use in the administrative process. In some cases the inability of a substantive program administrator to have access to identifiable statistical or research data may lead to a duplicate data collection effort. This is the price one may have to pay to achieve tight confidentiality for statistical or research data.

Sharing of Identifiable Data Under Controlled Conditions for Statistical or Research Purposes Only

One of the most difficult questions facing the Federal Government is whether there are any circumstances under which identifiable data in the hands of one agency should be disclosed to another person or agency, not connected with the original data collection process, for statistical or research purposes. There are strong arguments that this should be done only after securing the respondent's informed consent for each such release of identifiable data to a particular recipient for a particular stated purpose; consent may be obtained when the data are originally collected or at a later stage, but in any event it must be given before the disclosure is made. These arguments rest on the definitions of privacy cited above and seek to protect the data subject's autonomy. This is the approach taken by the HEW Secretary's Advisory Committee:

The agency should assure that no use of individually identifiable data is made that is not within the stated purposes of the system as reasonably understood by the individual, unless the informed consent of the individual has been explicitly obtained.⁸

While the Privacy Act of 1974 starts with this basic principle, the many exceptions to informed consent incorporated in that Act seriously erode the concept. The Census Bureau law, however, is more rigorous, since it does not permit the release of identifiable data without consent, except to Bureau employees or to other sworn agents.⁹

A different approach is recommended in this chapter. There are some situations in which the sharing of identifiable data for statistical or research purposes is good public policy and may be an integral factor in advancing the state of knowledge by contributing to the success of other statistical programs. Situations leading to sharing of identifiable data among agencies include the development of statistical samples which would only be available from other agencies' data and the matching of data from two or more sources to develop a richer data base for statistical cross-tabulations. These data may be used to check the accuracy of a data source, for retrospective studies, to analyze large bodies of data from another agency, and to ensure accuracy, timeliness and consistency of major statistical or research reports. In most cases the sharing of data results in a reduction of (1) burden on respondents and (2) cost to the taxpayers of performing a particular study. This is the basic philosophy embodied in the Federal Reports Act of 1942. In some cases a study literally could not be performed at all without access to another agency's identifiable data.

⁸ Ibid., p. 101.

⁹ Census can release identifiable data to an individual who supplied it or to their heirs, largely for the purpose of verifying age to establish eligibility for a benefit (13 U.S.C. 8).

The proposition that confidentiality can be protected by entirely prohibiting interagency transfers of identifiable data unless explicit consent is obtained would eliminate many valuable studies.¹⁰ Often the securing of informed consent prior to disclosure is impossible. In some cases, this is because of the large scale of the source program. In other cases, those who consent would be a self-selected and therefore biased sample of respondents. Finally, securing informed consent after the data collection is known to be a very difficult and expensive process.

It should be emphasized that the extent of necessary sharing of identifiable data for statistical or research purposes is not great. The proposed interagency use of the Census Bureau's Standard Statistical Establishment List (SSEL) of enterprises and establishments is one example of such use. Matching of Social Security records with Census records is another. Although some work has been done on development of techniques for the use of nonidentifiable data or statistical matching in lieu of transfer of identifiable data, no significant projects using these methods as an acceptable substitute for exact matching have been identified.

Most instances of transfers of identifiable records to other agencies or to private researchers or contractors involve program administration data which have been found useful for statistical or research purposes. Thus Census has put tax records on individuals and businesses to good use without harming the taxpayers. Drug abuse research has been conducted using military discharge records without a hint of trouble. Identifiable hospital records have been used for epidemiological research. While these source data may be sensitive or commercially valuable, they were not in the first instance statistical or research records.

¹⁰ For an excellent statement of this proposition, see DHEW, "Confidentiality of Alcohol and Drug Abuse Patient Records," Federal Register, Vol. 40, No. 127 (July 1, 1975, Part IV), pp. 20536-20537.

The challenge is to determine under what circumstances and conditions identifiable statistical or research records should be disclosed to agencies other than the collecting agency.

To answer this challenge one must be clear about what is to be protected and at what cost that protection is being secured. To the extent that laws and regulations are effective in this regard, one could protect the autonomy of a data subject by requiring advance informed consent for each such disclosure. The price of that approach, however, is great: biased data, increased public expenditure, and the failure or impossibility of some valuable statistical and research studies.

On the other hand, one can concentrate on protecting the data subject from risk of adverse exposure or harm by requiring that the recipient of identifiable statistical or research data be subject to the same legal and procedural constraints against disclosure or misuse of the data and afforded the same legal protection from compulsory disclosure pursuant to a subpoena or FOIA request as the agency which originally collected the data. It is this alternative approach which is taken here, in the interest of an effective Federal statistical and research system coupled with adequate protection of the interests of data subjects.

There is, however, a significant difference of opinion as to the best method of setting conditions and constraints on data sharing or disclosure. Any method must serve both purposes of providing effective protection of data subjects while, at the same time, assuring sufficient flexibility to permit the implementation of socially useful programs involving identifiable data. Before proposing a possible solution to this problem it is necessary to examine the present legal situation.

Current Laws Affecting Confidentiality: A Partial Summary

Many existing laws address the problem of data confidentiality. This section presents in summary form a catalog of some important examples. It is not intended to be exhaustive, but it is descriptive of the models which are presently in place. Some are general, applying to many or all agencies, some apply only to specific agencies, and a few apply to particular types of information wherever maintained. They are presented in that order.
