
Beginning this morning, we intend to analyze guidelines for safeguarding existing records; we will fully explore the role of the computer, with an emphasis on its future capabilities. We will attempt to draw a balance between individual privacy and computerized efficiency. In short, we will answer the request of the Kaysen Committee for procedures which "will protect confidentiality and insure the privacy of the individual."

But, as Dr. Kaysen correctly pointed out, this task is not for the Congress alone. Agencies and departments of the Federal Government must take stock of the various types of information contained in their own files before they can even consider consolidation into any data bank. Shortly after our last hearing, we began to realize that no one in Government really knew how much was stored on individual citizens. Accordingly, I sent a questionnaire to every Federal Department and agency asking them to list this information. The results of this survey have just been tabulated by the Census Bureau and they are to be commended for their excellent cooperation.

Let me briefly run down some of the immediate highlights of the subcommittee survey. First, the Government keeps files on just about every imaginable bit of information on an individual's life from the cradle to the grave. And the number of files is enormous. For example, Government reported that our names alone appeared in the files 2,800 million times. Our social security numbers are listed 1,500 million times. Other figures include: police records, 264,500,000; medical history, 342 million; and psychiatric history, 279 million. With those type figures, they give you some need for psychiatric care.

Of course, these figures are somewhat meaningless since we do not know how many individuals are involved; every time we fill out some Government form, these numbers are increased. But what is of concern to us, however, are the following discoveries: many agencies require individuals to divulge personal information and yet give no pledge and are under no requirements to keep this information confidential. Included in this category are: court actions or involvements, 19,253,000; security reports, 17,693,000; psychiatric history, 107,000. We have just received these statistics, and plan to study them in detail. But even from our preliminary analysis, it seems clear that many of our Government agencies must put their own house in order before rushing ahead with data bank plans.

The Chair is glad to note the distinguished Senator from South Carolina, a member of the committee, is present this morning.

Senator, would you have any statement or would you care to make any statement at the opening of the hearing?

Senator THURMOND. No, sir. I am very much interested in the subject and shall be pleased to cooperate with the distinguished Chairman.

Senator LONG. Thank you, Senator. You always have, and we certainly look forward to having you with us in these hearings.

Our first witness this morning is Dr. Carl Kaysen, director, Institute for Advanced Study, Princeton University. The doctor, I understand, is at the table.

For the record, will you state your name and your official position and, I believe, you have a prepared statement.

STATEMENT OF CARL KAYSEN, DIRECTOR, INSTITUTE FOR ADVANCED STUDY, PRINCETON UNIVERSITY, PRINCETON, N.J.

Dr. KAYSEN. I do, Senator, and if you like I will read it or should I just enter it into the record and summarize it?

Senator LONG. Go ahead and read it if you care to or handle it whatever way you think will best make your presentation to the subcommittee.

Dr. KAYSEN. Thank you. I will read it and see if I can skip a little. My name is Carl Kaysen. I live at 97 Olden Lane, Princeton, N.J. I am director of the Institute for Advanced Study at Princeton. If I may, sir, off the record, observe it is not part of Princeton University. By profession I am an economist, and it is in this capacity that I undertook the responsibility of being chairman of the task force on storage and access to Government statistics, that reported to the Director of the Budget. At the time I did so last year I was Littauer professor of political economy and associate dean of the graduate school of public administration at Harvard University.

The purpose of the task force was to examine a problem in Government organization and operation which the members of the committee thought was of importance to the Government and to the public, looking at the problem from a perspective which most of us on the committee shared as users of Government statistics. As economists we are aware that both the intellectual development of economics and its practical success have depended greatly on the large body of quantitative information on the whole range of economic activity that is publicly available in modern, democratic states. Much of this material is the byproduct of regulatory, administrative, and revenue-raising activities of government, and its public availability reflects our democratic ethos. In the United States there is a central core of demographic, economic, and social information that is collected, organized, and published by the Census Bureau in response to both governmental and public demands for information, rather than simply as the reflex of other governmental activities. Over time, and especially in the last three or four decades, there has been a continuing improvement in the coverage, consistency, and quality of these data that has in great part resulted from the continuing efforts of social scientists and statisticians both within and without the Government. Without these improvements in the stock of basic quantitative information, our recent success in the application of sophisticated economic analyses to problems of public policy would have been impossible. We were moved by professional concern for the quality and usability of the enormous body of Government data to take on what they thought to be a necessary, important, and totally unglamorous task. I think we turned out to be wrong about this last part.

The central problem which the task force addressed was the consequences of the trend toward increasing decentralization in the Federal statistical system at a time when the demand for more and more detailed quantitative information was growing rapidly. Currently, 21 agencies of Government have significant statistical programs. The largest four of these (the Census, the Bureau of Labor Statistics, the Statistical Reporting Service, and Economic Research Service of the Department of Agriculture) account for about 60 percent of a total Federal statistical budget which is currently on the order of $125 million a year. A decade ago, the largest four agencies accounted for 71 percent of a much smaller budget. By 1970 the total statistical budget of the Federal Government will probably exceed $200 million and, yet, unless we do something about it, decentralization will increase further. Yet, it already has been clear for some time that the Federal statistical system was too decentralized to function effectively and efficiently.

The committee proposed a National Data Center as a way to deal with the problem of effective use of available information. This proposal came under the scrutiny of the Subcommittee on the Invasion of Privacy of the Government Operations Committee of the House in a series of hearings in the summer of 1966. The hearings in turn generated a great deal of press comment. These hearings and the press comment together raised the question as to whether the proposed Center was a threat to personal privacy, and might even lead to a greatly increased intrusion of government into the life of the individual. There is no question that a large-scale centralized data system which had no inhibitions on the information which it collected and no restraint on what it made public or how it made information available to other parts of the Government might indeed constitute a serious threat to privacy and liberty.

The crucial questions, of course, are what information would be put into the data center, and how access to it would be controlled. In the words of the task force report, the

Center would assemble in a single facility all large-scale systematic bodies of demographic, economic and social data generated by the present data-collection or administrative processes of the Federal Government . . . integrate the data to the maximum feasible extent, and in such a way as to preserve as much as possible of the original information content of the whole body of records, and provide ready access to the information, within the laws governing disclosure, to all users in the Government, and where appropriate to qualified users outside the Government on suitably compensatory terms. (Report, pp. 17-18.)

The phrase "large-scale systematic bodies of demographic, economic, and social data" translates, in more concrete terms, into the existing bodies of data collected by Census, the Bureau of Labor Statistics, the Department of Agriculture, the National Center for Health Statistics, the Office of Education, and so on. It also includes the large bodies of data generated as a byproduct of the administration of the Federal income tax and social security systems. It does not include police dossiers from the FBI, personnel records of the Civil Service Commission or the individual Government agencies or personnel records of the armed services, and other dossier information, none of which fits what is meant by the phrase "large-scale, systematic bodies of social, economic, and demographic data."

For the data center to achieve its intended purposes, the material in it must identify individual respondents in some way, by social security number, for individuals, or an analogous code number, now used within the census for business enterprises, called the Alpha number. Without such identification, the center cannot meet its prime purpose of integrating the data collected by various agencies into a single consistent body. Whether these social security or alpha numbers need in turn be keyed to a list of respondents which identifies them by name and address within the data collecting agencies is a technical detail. That it must be done some place is perfectly clear as it now is within the several agencies that collect the information as it now is. On the other hand, it is not, in general, necessary that the files in the data center contain a complete replica of every file on every respondent who has provided information to any of the original collectors. In many cases (for example, the social security files) a properly designed sample would serve the same purposes more economically. To this extent then, the data center will not contain a file on every individual, every household, every business, et cetera, but a mixture of a collection of samples (some of them relatively large) and complete files of some groups of reporting units which are particularly interesting and important from an analytical point of view. But here again, the significance of the difference between reproducing for the data center a complete file which already exists in some other agency, and reproducing only a sample therefrom can easily be overemphasized.

The content of information now in the inventory of government agencies is controlled ultimately by the Congress, operating through the appropriations process; and more immediately by the separate bureaucratic hierarchies of each data collecting agency, subject to the overall review of the Director of the Budget. He has a specific statutory responsibility for reviewing all governmental questionnaires directed to the public, with a view to eliminating duplication and keeping the total burden on respondents at a reasonable level. If this process seems to be working ineffectively, in the sense of ignoring persistent complaints, then the Appropriation Subcommittees that deal with the budget requests of each data-collecting agency are readily able to exercise a further control. In practice, the existence of this restraint operates to reinforce powerfully the caution of the collecting agencies in expanding their requests.

A new data center would operate within the same framework of controls. Indeed, the Congress, in authorizing its creation should define the kind of information which it would assemble, and could follow the line of demarcation of large-scale systematic demographic, economic, and social statistics suggested above. The inclusion of dossier information could be specifically prohibited. A clear distinction between "a dossier" and "statistical data file" on an individual can be made in principle; namely, in a dossier, the specific identity of the individual is central to its purpose; while for a file of data it is merely a technical convenience for assembling in the same file the connected set of characteristics which are the object of information. The purpose of the one is the assembly of information about specific people; the purpose of the other the assembly of statistical frequency distributions of the many characteristics which groups of individuals (or households, business enterprises or other reporting units) share. In practice, of course, this distinction is not self-applying, and administrators and bureaucrats, checked and overseen by politicians, have to apply it. But so is it ever.

The present law and practice governing the use of census data offer a model which could well be applied to the new data center. The law provides that information contained in an individual census return may not be disclosed either to the general public or to other agencies of the Government, nor may such information be used for law-enforcement, regulatory, or tax-collection activity in respect to any individual respondent. This statutory restriction has been effectively enforced, and the Census Bureau has maintained for years the confidence of respondents in its will and ability to protect the information they give to it. The same statutory restraint could and should be extended to the data center, and the same results could be expected of it. The data center would supply to all users, inside and outside the Government, frequency distributions, summaries, analyses, but never data on individuals or other single reporting units. The technology of machine storage and processing would make it possible for these outputs to be tailored closely to the needs of individual users without great expense and without disclosure of individual data. This is just what is not possible under our decentralized system.

In my statement I talk about the question of cracking the system, penetrating it, and I think I will skip that, Senator, if I may, and say it is a technical question, and the technicians have told me that it can be handled.

Senator LONG. Doctor, we are somewhat concerned about the technical questions involved in this, and I would like to have your comment on that.

Dr. KAYSEN. I will be glad to comment. Perhaps, instead of reading the paragraph, I will answer a question.

Bearing all this in mind, I conclude that the risky potentials which might be inherent in a data center are so unlikely to materialize if faced beforehand, in the design and administration of the center, that they are outweighed, on balance, by the real improvement in understanding of our economic and social processes this enterprise would make possible, with all the concomitant gains in intelligent and effective public policy that such understanding could lead to. Thank you.

(The prepared statement of Dr. Kaysen follows:)

PREPARED STATEMENT OF DR. CARL KAYSEN, BEFORE THE SUBCOMMITTEE ON ADMINISTRATIVE PRACTICE AND PROCEDURE, OF THE U.S. SENATE COMMITTEE ON THE JUDICIARY, MARCH 14, 1967

My name is Carl Kaysen; I live at 97 Olden Lane, Princeton, New Jersey. I am Director of the Institute for Advanced Study in Princeton. By profession I am an economist, and it is in this capacity that I undertook the responsibility of being Chairman of the Task Force on Storage Of and Access To Government Statistics, that reported to the Director of the Budget. At the time I did so last year I was Littauer Professor of Political Economy and Associate Dean of the Graduate School of Public Administration at Harvard University.

The purpose of the Task Force was to examine a problem in government organization and operation which the members of the Committee thought was of importance to the government and to the public, looking at the problem from a perspective which most of us shared as users of government statistics. As economists we are aware that both the intellectual development of economics and its practical success have depended greatly on the large body of quantitative information on the whole range of economic activity that is publicly available in modern, democratic states. Much of this material is the by-product of regulatory, administrative, and revenue-raising activities of government, and its public availability reflects the democratic ethos. In the United States there is a central core of demographic, economic, and social information that is collected, organized and published by the Census Bureau in response to both
