Several strains of legal, public policy, and ethical thought converge on these government-collected data, and this understandably leads to conflicts that have yet to be resolved adequately. These include: (1) the desire to protect individual and corporate privacy as defined above; (2) the desire to protect data subjects from the various risks associated with disclosure of identifiable information, once collected; (3) the need to facilitate the search for truth through research and evaluation; and (4) the need for the public to understand how government decisions are made, as evidenced by the Freedom of Information Act (FOIA) and the Government in the Sunshine Act (Sunshine Act).

Definition of Statistical and Research Data

The Privacy Protection Study Commission used the term "research" to refer to any systematic, objective process designed to obtain new knowledge, regardless of whether it is "pure" (aimed at deriving general principles) or "applied" (aimed at solving a specific problem or at determining policy). "Statistics" referred both to the data obtained through enumeration and measurement and to the use of mathematical methods for dealing with data so obtained. Statistical methods can be descriptive, that is, any treatment designed to summarize or describe important features of data, or inferential, that is, techniques for arriving at generalizations that go beyond the sample being analyzed.

For the purposes of this chapter, the scope of statistical and research data is defined by the use to which the information is put, rather than by the character of the information. These activities include gathering data for the purpose of estimating aggregates or providing cross-tabulations on economic or social characteristics, the generation of microdata sets on anonymous individual units for analysis, and the development of longitudinal reporting panels. Research activities include biomedical research conducted in clinics or hospitals, socioeconomic experimentation, and evaluation studies of the effect of different approaches to providing Government services. One common thread of these activities is the advancement of knowledge, whether it be of a general nature or related to a specific problem or Federal program.

Statistical and research activities are frequently carried out with data collected directly from respondents specifically for research purposes. However, in many cases, data collected originally in connection with the administration of Federal programs such as work training programs or the collection of social security taxes may be essential to a statistical or research program. Medical treatment records are often used. The quality which separates administrative activities from the statistical and research activities under consideration in this chapter is how the data are used. Administrative activities use information for determining eligibility for benefits or for investigation or regulation of particular individuals or legal entities. Such information is "microaction" in nature. In contrast, statistical and research activities are defined as those activities whose sole purpose is advancement of the state of our knowledge. They do not include the use of data in identifiable form to make determinations about particular individuals. This definition is similar to the definition of statistical research and reporting records in the Privacy Act.

Principles of Confidentiality

The protection of confidentiality for statistical and research data involves several basic ingredients:

(1) The prohibition of mandatory disclosure of identifiable data pursuant to a subpoena or other compulsory legal process or under the provisions of the Freedom of Information Act;

(2) The prohibition of other disclosures to the public of identifiable data, in publication or otherwise, by an agency; and

(3) The prohibition of use of identifiable data for any purpose other than one which is purely statistical or research in nature and, particularly, of uses which would adversely affect any particular respondent's rights, benefits, or privileges.

Prohibition of Mandatory Disclosure. Compulsory legal process has been invoked in only a few cases dealing with statistical and research data. One of the most famous of these was the 1961 order requiring the St. Regis Paper Company to deliver to the Federal Trade Commission a copy of a Census Bureau form which it had retained in its files. In a swift reaction, Congress amended the Census law to protect copies of Census documents retained in respondents' files from compulsory legal process.

In a case involving an income maintenance experiment in New Jersey, files of some families were sought by a county prosecutor to attempt to discover if the families were defrauding the county welfare department by not reporting income received from the experiment. The records were not protected by statute from subpoena and the researcher came close to going to jail for contempt of court.

In another instance, People v. Newman, a subpoena was issued for photographs of participants in a New York methadone maintenance program. After many months and the threat of contempt of court against the researcher, the clinic administering the program, on appeal, was allowed to withhold the photographs on the grounds that the identity of the patients was protected by the research privilege which was granted to the program under the Public Health Service Act.

The FOIA provides that persons may require disclosure of agency documents except that an agency may withhold documents which fall within certain exemptions. These exemptions include, among others, matters which are "specifically exempted from disclosure by statute"; "trade secrets and commercial or financial information obtained from a person which is privileged or confidential"; and "personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal privacy." Recently, in interpreting the last exemption, courts have tended to consider whether the need for the data, as balanced against the extent of the invasion of personal privacy, warrants the particular disclosure. It is clear that a pledge of confidentiality made to a respondent is not sufficient, in and of itself, to protect data not expressly exempted from disclosure by another statute. The FOIA has not been used very often to force disclosure of statistical or research data in identifiable form, and those few disclosures have involved corporate data. Disclosure of identifiable data may prove to be a greater problem in the future because of the recent FOIA amendments, which will speed up agency processing of requests for information, and the Sunshine Act, which substantially limits the applicability of the statutory exemption from FOIA disclosure.

It is argued in this chapter that identifiable data used for statistical or research purposes should be specifically exempt from disclosure by statute to protect the identity of respondents and to assure accurate statistical and research results. The DHEW Secretary's Advisory Committee on Automated Personal Data Systems agreed in principle and cautioned that "the data to be protected should be limited to those used exclusively for statistical reporting or research. Thus, the protection would apply to statistical reporting and research data derived from administrative records, and kept apart from them, but not to the administrative records themselves."

Prohibition of Voluntary Disclosure. Protection of data from voluntary disclosure in identifiable form is an elementary aspect of any statistical or research activity in which identifiable data are used. Several steps may be appropriate to achieve this protection. Agency personnel should be required to take an oath to protect the identity of respondents. Criminal penalties should be assessed for knowing and willful disclosure. Civil remedies might be made available to any respondent who is harmed by the disclosure. Research data should be scrubbed of identifiers as soon as possible to prevent disclosure, and all agencies should be sensitive to the possibility of disclosing data through publications. Disclosure procedures, such as those in use by the Census Bureau, should be developed so that data pertaining to a particular respondent cannot be linked together, even by comparison of several tables.

The integrity of the statistical and research activities of the Government might be seriously damaged by an inappropriate disclosure. The Federal Committee on Statistical Methodology, chaired by the Office of Federal Statistical Policy and Standards, is exploring disclosure avoidance techniques in publications and in the dissemination of public use microdata tapes, as well as statistical matching with and without the use of identifiable data. (For further discussion of disclosure policy and procedures, see Statistical Policy Working Paper 2, Reports on Statistical Disclosure and Disclosure-Avoidance Techniques, published by the Office of Federal Statistical Policy and Standards.) In the words of the DHEW Committee: "The protection should be limited to data identifiable with, or traceable to, specific individuals. When data are released in statistical form, reasonable precautions to protect against 'statistical disclosure' should be considered to fulfill the obligations not to disclose data that can be traced to specific individuals."

Prohibition of Misuse. The final element in a program to protect the confidentiality of statistical or research data is to prevent their use in identifiable form for making determinations which affect a particular respondent. Agencies often promise that data pertaining to a respondent will be used only for statistical or research purposes, but some agencies do not have a legal basis to carry out and enforce this pledge.

It is important to consider the organizational factor in protecting these data from misuse. Several agencies of the Federal Government perform only statistical or research functions, and their product takes the form of published aggregates, tabulations, research reports, and other documents which do not reveal identifiable statistical data or the identity of any research subject. Any agency which has this limited function can more easily assure the confidentiality of data collected or received from another agency under a specific law which protects all such data which it collects or maintains.

When statistical or research activities are undertaken by agencies which have other program responsibilities, it is often more difficult to keep the data from being used in the administration of a program. (Some agencies use the word "confidential" to include data used internally for regulatory purposes; such data should be described with another word to avoid confusion with the usage proposed in this chapter.) This may be especially difficult when the data are originally gathered in connection with regulatory and program administration functions. Often such data are collected on a mandatory basis to determine eligibility for a benefit or to see whether a particular person has complied with a law. As mentioned above, program-related information may also be useful for statistical, research, or program evaluation activities. These activities may involve the collection of additional statistical data, perhaps from a sample of program participants. In these cases, the statistical data derived from the program may be inextricably merged with other directly collected data in the records maintained by the researcher, evaluator, or statistician. A major threat to the confidentiality of the data is that data which are maintained for statistical or research purposes may be viewed as a prime source of information by a benefits program administrator or by a regulatory arm of an agency. The one-way flow of administrative data into archives which is advocated in the Framework would eliminate this potential risk.

It is clear that the integrity of statistical or research programs of the Federal Government would be widely questioned and their effectiveness would be damaged if data from those programs were used in identifiable form to make administrative or regulatory decisions about particular respondents. Hence, it is important to keep identifiable statistical or research data separate from program administration or regulatory data, even if the former are derived from the latter. One way to achieve organizational integrity for statistical or research activities is through the establishment of a functionally separate research and statistical agency or unit within the administrative agency which is legally insulated from requests for identifiable data from other operating units. Where statistical or research data are derived in whole or in part from administrative data they should be maintained apart from the original administrative data sets to eliminate the possibility of their use in the administrative process. In some cases the inability of a substantive program administrator to have access to identifiable statistical or research data may lead to a duplicate data collection effort. This is the price one may have to pay to achieve tight confidentiality for statistical or research data.
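The separation described here, in which statistical records are held apart from the administrative records they derive from and identifiers are removed as early as possible, can be illustrated with a small sketch. It is only an illustration under stated assumptions: the field names (name, ssn, address), the random study key, and the two-file layout are hypothetical and are not drawn from any agency procedure discussed in this chapter.

```python
import secrets

# Hypothetical identifier fields; the chapter does not specify a record layout.
IDENTIFIER_FIELDS = frozenset({"name", "ssn", "address"})


def split_record(record, identifier_fields=IDENTIFIER_FIELDS):
    """Split one administrative record into a linkage entry and a research entry.

    The research entry keeps only a random study key and the non-identifying
    fields. The linkage entry maps that key back to the identifiers and is
    assumed to be held by the statistical unit alone.
    """
    study_key = secrets.token_hex(8)  # random; not derived from any identifier
    linkage = {"study_key": study_key}
    research = {"study_key": study_key}
    for field, value in record.items():
        if field in identifier_fields:
            linkage[field] = value
        else:
            research[field] = value
    return linkage, research


def build_research_file(admin_records):
    """One-way flow: administrative records in, two separately held files out."""
    linkage_file, research_file = [], []
    for record in admin_records:
        linkage, research = split_record(record)
        linkage_file.append(linkage)
        research_file.append(research)
    return linkage_file, research_file


if __name__ == "__main__":
    # Made-up example records, for illustration only.
    admin = [
        {"name": "A. Smith", "ssn": "000-00-0001", "address": "1 Main St",
         "benefit": 120, "weeks": 6},
        {"name": "B. Jones", "ssn": "000-00-0002", "address": "2 Oak Ave",
         "benefit": 95, "weeks": 11},
    ]
    linkage_file, research_file = build_research_file(admin)
    for row in research_file:
        print(sorted(row))  # field names only; no identifiers appear
```

Removing direct identifiers in this way does not by itself prevent statistical disclosure through published tables or through combinations of the remaining fields, so a sketch like this would complement, not replace, the disclosure-avoidance procedures and legal protections discussed above.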

Sharing of Identifiable Data Under Controlled Conditions for Statistical or Research Purposes Only

One of the most difficult questions facing the Federal Government is whether there are any circumstances under which identifiable data in the hands of one agency should be disclosed to another person or agency, not connected with the original data collection process, for statistical or research purposes. There are strong arguments that this should be done only after securing the respondent's informed consent for each such release of identifiable data to a particular recipient for a particular stated purpose; consent may be obtained when the data are originally collected or at a later stage, but in any event it must be given before the disclosure is made. These arguments rest on the definitions of privacy cited above and seek to protect the data subject's autonomy. This is the approach taken by the HEW Secretary's Advisory Committee:

The agency should assure that no use of individually identifiable data is made that is not within the stated purposes of the system as reasonably understood by the individual, unless the informed consent of the individual has been explicitly obtained.

While the Privacy Act of 1974 starts with this basic principle, the many exceptions to informed consent incorporated in that Act seriously erode the concept. The Census Bureau law, however, is more rigorous, since it does not permit the release of identifiable data without consent, except to Bureau employees or to other sworn agents.

A different approach is recommended in this chapter. There are some situations in which the sharing of identifiable data for statistical or research purposes is a good public policy and may be an integral factor in advancing the state of knowledge by contributing to the success of other statistical programs. Situations leading to sharing of identifiable data among agencies include the development of statistical samples which would only be available from other agencies' data and the matching of data from two or more sources to develop a richer data base for statistical cross-tabulations. These data may be used to check the accuracy of a data source, for retrospective studies, to analyze large bodies of data from another agency, and to ensure accuracy, timeliness and consistency of major statistical or research reports. This is the basic philosophy embodied in the Federal Reports Act of 1942. In most cases the sharing of data results in a reduction of (1) burden on respondents and (2) cost to the taxpayers of performing a particular study. This philosophy is also embodied in the Federal Reports Act. In some cases a study literally could not be performed at all without access to another agency's identifiable data.

The proposition that confidentiality can be protected by entirely prohibiting interagency transfers of identifiable data unless explicit consent is obtained would eliminate many valuable studies. (For an excellent statement of this proposition, see DHEW, "Confidentiality of Alcohol and Drug Abuse Patient Records," Federal Register, Vol. 40, No. 127, July 1, 1975, Part IV, pp. 20536-20537.) Often the securing of informed consent prior to disclosure is impossible. In some cases, this is because of the large scale of the source program. In other cases, those who consent would be a self-selected and therefore biased sample of respondents. Finally, securing informed consent after the data collection is known to be a very difficult and expensive process.

It should be emphasized that the extent of necessary sharing of identifiable data for statistical or research purposes is not great. The proposed interagency use of the Census Bureau's Standard Statistical Establishment List (SSEL) of enterprises and establishments is one example of such use. Matching of Social Security records with Census records is another. Although some work has been done on development of techniques for the use of nonidentifiable data or statistical matching in lieu of transfer of identifiable data, no significant projects using these methods as an acceptable substitute for exact matching have been identified.

Most instances of transfers of identifiable records to other agencies or to private researchers or contractors involve program administration data which have been found useful for statistical or research purposes. Thus Census has put tax records on individuals and businesses to good use without harming the taxpayers. Drug abuse research has been conducted using military discharge records without a hint of trouble. Identifiable hospital records have been used for epidemiological research. While these source data may be sensitive or commercially valuable, they were not in the first instance statistical or research records.

The challenge is to determine under what circumstances and conditions identifiable statistical or research records should be disclosed to agencies other than the collecting agency.

To answer this challenge one must be clear about what is to be protected and at what cost that protection is being secured. To the extent that laws and regulations are effective in this regard, one could protect the autonomy of a data subject by requiring advance informed consent for each such disclosure. The price of that approach, however, is great, leading to biased data, increased public expenditure, and the failure or impossibility of some valuable statistical and research studies.

On the other hand, one can concentrate on protecting the data subject from risk of adverse exposure or harm by requiring that the recipient of identifiable statistical or research data be subject to the same legal and procedural constraints against disclosure or misuse of the data and afforded the same legal protection from compulsory disclosure pursuant to a subpoena or FOIA request as the agency which originally collected the data. It is this alternative approach which is taken here, in the interest of an effective Federal statistical and research system coupled with adequate protection of the interests of data subjects.

There is, however, a significant difference of opinion as to the best method of setting conditions and constraints on data sharing or disclosure. Any method must serve both purposes of providing effective protection of data subjects while, at the same time, assuring sufficient flexibility to permit the implementation of socially useful programs involving identifiable data. Before proposing a possible solution to this problem it is necessary to examine the present legal situation.

Current Laws Affecting Confidentiality: A Partial Summary

Many existing laws address the problem of data confidentiality. This section presents in summary form a catalog of some important examples. It is not intended to be exhaustive, but it is descriptive of the models which are presently in place. Some are ... They are presented in that order.

General Laws

General Rule on Disclosure of Confidential Information Applicable to All Agencies. One law (18 U.S.C. 1905) imposes penalties on, and removal from office of, any Federal official or employee who "publishes, divulges, discloses or makes known in any manner or to any extent not authorized by law ... confidential statistical data." It provides insufficient protection for statistical or research information, however. For example, many disclosures to others within an agency performing regulatory, investigative, or substantive program administration functions are authorized by law. In addition, information which is discoverable in civil suits under the Federal Rules of Civil Procedure may not be withheld under this provision. It does not prevent disclosure required or permitted under the Freedom of Information Act. This law does, however, apply to unauthorized disclosures of information, and forms a basic minimum standard to be met. The specific wording of the law is as follows:

Whoever, being an officer or employee of the United States or of any department or agency thereof, publishes, divulges, discloses, or makes known in any manner or to any extent not authorized by law any information coming to him in the course of his employment or official duties or by reason of any examination or investigation made by, or return, report or record made to or filed with, such department or agency or officer or employee thereof, which information concerns or relates to the trade secrets, processes, operations, style of work, or apparatus, or to the identity, confidential statistical data, amount or source of any income, profits, losses, or expenditures of any person, firm, partnership, corporation, or association; or permits any income return or copy thereof or any book containing any abstract or particulars thereof to be seen or examined by any person except as provided by law; shall be fined not more than $1,000, or imprisoned not more than one year, or both; and shall be removed from office or employment.

Federal Reports Act. The Federal Statistical System has long been sensitive to the importance of confidential treatment of statistical information. For example, the Federal Reports Act of 1942 (44 U.S.C. 3501-12) specifically addresses the importance of confidentiality in data sharing for statistical purposes.

The purpose of the Federal Reports Act is to reduce the Federal reporting burden on the public by the elimination of unnecessary duplication of Federal requests for information from the public and by encouraging the coordination of Federal data collection efforts wherever possible. In this connection, the sharing of data between Federal agencies has been viewed as a way to reduce the need for agencies to collect the same information more than once.

The circumstances specified in the act for sharing of data between agencies are limited, however, since information can only be released to another agency if (a) the information is released in nonidentifiable summary or tabular form; (b) the information has not, at the time of collection, been declared by the collecting agency or a superior authority as being confidential; (c) the respondent has consented to the release; or (d) the recipient agency has a mandatory authority, with criminal penalties for nonresponse, to collect the same data. Under the act the Director of the Office of Management and Budget is authorized to require any Federal agency, except the IRS or bank supervisory agencies, to transfer any data which it collects to another agency under these conditions. Some such transfers take place without OMB intervention. While the extent of interagency data sharing under this act is not known, it is probably minimal.

Freedom of Information Act (FOIA) and the Government in the Sunshine Act (Sunshine Act). Statistical agencies concerned with the protection of data must also consider the impact of the Freedom of Information Act (FOIA) (5 U.S.C. 552) and the Sunshine Act (5 U.S.C. 552b).

The purpose of the FOIA is basically to permit the public access to information upon request as a check on the process of Government. Even within this environment, however, Congress recognized the wisdom of maintaining some information confidentially. For our purposes, the relevant passages, which appear in subsection (b) as amended by the Sunshine Act, permit agencies to withhold matters which are:

(3) specifically exempted from disclosure by statute (other than section 552b of this title) provided that such statute (A) requires that the matters be withheld from the public in such a manner as to leave no discretion on the issue, or (B) establishes particular criteria for withholding or refers to particular types of matters to be withheld;

(4) trade secrets and commercial or financial information obtained from a person and privileged or confidential; ...

(6) personnel and medical files and similar files the disclosure of which would constitute a clearly unwarranted invasion of personal


