
readable form of large bodies of information makes the rewards of successful abuse or "penetration" relatively large compared to what they would be in a more decentralized, less mechanized system. It is not at all clear, however, that the cost of successful misapplication or penetration cannot be increased even more sharply than the rewards. In detail this is a technical problem of great complexity, but it seems clear from experience with a variety of secrecy-preserving techniques that a well-designed system of record storage and use could make "penetration" highly costly and to a large extent self-announcing. It is not difficult, for example, so to organize and code the basic records that programs for retrieving information routinely record the user and the purpose for which it was used. Any continued improper use would thus leave a trail that would invite discovery. Or, to mention another aspect, identifying numbers could be specially coded, and the key to that code made available on a much more restricted basis than were other codes. While no security system can be made perfect, it is feasible to make the costs of breaking it sufficiently high to keep the problem within tolerable bounds. The same kinds of safeguards would guard against misuse of the data by those with legitimate access to it...

Bearing all this in mind, I conclude that the risky potentials which might be inherent in a data center are so unlikely to materialize if faced beforehand, in the design and administration of the center, that they are outweighed, on balance, by the real improvement in understanding of our economic and social processes this enterprise would make possible, with all the concomitant gains in intelligent and effective public policy that such understanding could lead to.

Senator LONG. Doctor, so far as your statement is concerned, of course, one of the things I am apprehensive about is that with all the information stored on you, say, from the cradle to the grave, that some overzealous agent could not push a button on a computer and everything in your life would be laid out bare on the table. Is that possible?

Dr. KAYSEN. There is no doubt that it is technically possible, and will be technically possible for somebody with legitimate access; that is, an employee of the Data Center, to do something like what you have described, Senator.

I think it is a little too simple minded to suggest that he can push a button. This process is a fairly complicated one, and he has to do more than push a button.

But, safeguards against illegitimate use of the data can be built into the center, and one safeguard that would be very important and be responsive to the problem that you raise can be described in roughly this way: every time anyone calls out a file or a group of files out of the Data Center, he has to make a record entry that says, "I am so and so, I have called out files number so and so, and so and so, and I have called them out pursuant to such and such a job order." Thus he leaves a trail, and the machine, and the programs which operate it, can be so organized that nobody can operate the machine without leaving the trail, unless he tries to eradicate both the trail and the data in the machine and thus shows to his supervisors that something has gone wrong.

Senator LONG. Wouldn't it all go back to whether or not an agent or the supervisor were overzealous? Personal elements are bound to be involved. It cannot be so mechanical or so computerized that there would not be a human element involved of someone getting that information.

Dr. KAYSEN. No question about that, sir.

Let me make the analogy of the situation now and the situation under a Data Center. Right now an overzealous or unscrupulous or dishonest employee of the Internal Revenue Service, to take an example, might take out an individual income tax file. He might, with good intention and high motives, but illegally, hand it to the counsel of some congressional committee because he thinks that the committee ought to be aware of it. There is, of course, a legal and formal procedure for a committee to call up such a return, but not by having an employee voluntarily come to the counsel.

Senator LONG. You should try, Mr. Witness, to get information from the Internal Revenue Service. [Laughter.]

Apparently other people have more success at it than we do at times.

Dr. KAYSEN. Now, the point I would make is that if an employee were to call forth a tax return from the data file, from the machine-stored file, he would have to leave a record that he had done so. He may now be able to open a file drawer, take it out, put it in a Xerox machine, put it back in the file drawer, and not leave a record.

Senator LONG. How would he leave that record?

Dr. KAYSEN. The program would be such that he simply could not get the machine to tell him anything unless his request included a statement that said, "My code number is so and so, the job number is so and so, and this is why I want it," in effect.
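[Editorial illustration, not part of the testimony.] The safeguard Dr. Kaysen describes, a retrieval program that releases nothing unless the request names the user, the job, and the reason, might be sketched roughly as follows. This is a minimal sketch under stated assumptions: the class `AuditedStore`, its field names, and the sample values are hypothetical and do not come from the hearing record.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AuditedStore:
    """Hypothetical record store that refuses any read lacking a logged justification."""

    records: dict                               # file number -> record contents
    trail: list = field(default_factory=list)   # access log, only ever appended to

    def fetch(self, file_no, user_code, job_no, reason):
        # Reject requests that omit the user code, the job number, or the reason.
        if not (user_code and job_no and reason):
            raise PermissionError(
                "request must state user code, job number, and reason"
            )
        # Write the trail entry before releasing the data, so that no read
        # (not even a failed lookup) can happen without leaving an entry.
        self.trail.append({
            "when": datetime.now(timezone.utc).isoformat(),
            "user": user_code,
            "job": job_no,
            "reason": reason,
            "file": file_no,
        })
        return self.records[file_no]


# Usage with made-up values:
store = AuditedStore(records={"F-1": {"income": 1000}})
record = store.fetch("F-1", user_code="U-7", job_no="J-42", reason="tabulation")
```

The sketch only records what the requester claims; it does not verify that the code number belongs to the person typing it, which is the weakness Senator Long raises in his next question.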

Senator LONG. You mean he could not, some overzealous person could not, get your job number and put it on there?

Dr. KAYSEN. Well, I am not trying to say that the system is totally safe against cracking. I do not think you can ever produce a system that is totally proof against cracking.

Senator LONG. Then there would be the possibility that they could get this information out, and there would be much more information than just an income tax return in that report.

Dr. KAYSEN. There would be much more information. But what I am suggesting is that the effort required, the skill required, and the number of people who would have to be suborned, corrupted, or included might be much greater with a machine-storage system than it is now when an employee with legitimate access can open a file drawer and copy something out of a file.

Senator LONG. But the information you would get would be all gathered together, and it would be much greater than you would get by going to a hundred different agencies now.

Dr. KAYSEN. That is certainly true.

I think the problem can be put very sharply as follows: A data center would make the returns from abuse or corruption, from a single act of abuse or corruption, greater, because a single act would produce more information, and that is the point that you have been making. However, the center could be so designed as to make the cost of that single act of corruption or abuse, the cost in terms of effort, skill, number of people involved, even greater, and the question we have to ask is will the balance of cost and return be altered in favor of greater temptation or will it be altered in favor of less temptation.

I want to make clear that I myself am not an expert on the design of computers and computer programs. I am told by people who are experts that it is not a difficult task to make a system secure against misuse and penetration. It may be one that takes labor and thought and effort, but it is not a difficult task to make the cost very high, to make the self-checking features go very deep into the whole enterprise so that it is hard to do what we have just been talking about without leaving a trail. But I certainly do not want to say it will be impossible to do it because I think that would be an inaccurate statement.

Senator LONG. Doctor, I do not want to be in a position of saying at this time that I am against the Data Center, but I am concerned about it, and I think it is something that we should have a great deal of discussion on. We should see and consider all facets of the problem. I am concerned now somewhat with what goes into this Data Center, and who is going to decide what goes in it; and who plans to tell Congress what needs to go into it if Congress is to make the decision. I want to ask you, too, where the burden should be of anyone who would propose that certain information goes in, and I judge the burden should be on that proposer as to what goes in it. Could you comment on that general field?

Dr. KAYSEN. Right.

As those of us who worked on the matter envision the proposal and, perhaps, I ought to utter another caution here -- Mr. Zwick from the Bureau of the Budget can speak for the Government, which I cannot do -- I am speaking simply for this Committee and myself and its members.

Senator LONG. He just speaks for the Bureau of the Budget.

Dr. KAYSEN. Well, the executive part of the Government. I am just speaking for the Committee.

Senator LONG. Maybe the man out on 1600 Pennsylvania Avenue would object to that.

Dr. KAYSEN. I am glad to leave that question to you and the Committee. But just speaking for myself and my colleagues of the Committee, what we thought of could be put something like this: in the first instance, we would not think of putting anything in the Data Center that is not now already in the Government's statistical system.

Senator LONG. Officially or

Dr. KAYSEN. Officially.

Senator LONG. But don't many of these agents collect a great deal of material unofficially?

Dr. KAYSEN. If they do, we are not thinking of that. In this phrase "large-scale social, economic, and demographic bodies of systematic data," what we are thinking

Senator LONG. I am thinking about the Internal Revenue Service. I understand in each district they make great collections out of the newspaper files and reports, and so on, about individuals that they maintain in a file.

Dr. KAYSEN. Let me try to say what I think would go in from the Internal Revenue, and this is meant to be illustrative and not definitive. First, let us take an individual taxpayer. I think that we would be dealing with a sample rather than all the individual taxpayers. I am not now prepared to say how big a sample it should be (1 percent, 5 percent, 10 percent), but we are talking about numbers like that. We would probably want to put in some identifying number, the social security number or whatever it is, so we know what file we are dealing with. Then we would want to put in the major items of the tax return, perhaps total income from earnings, number of different employers, total income from stocks and bonds, that kind of thing. Perhaps half the information in the income tax return would be used for statistical purposes, so that we could better answer questions such as: if a man increases his income by so and so how much can we expect he increases his consumption, or if we increase his taxes how much can we expect he will decrease his consumption? These are the kind of questions that economists inside and outside the Government are always trying to answer. It is answers to questions of this kind that Congress has to evaluate when it has a tax policy proposal before it.

We are proposing that the Data Center be able to utilize the information now in the tax returns to give better answers to these kinds of questions than are possible now. Why will it be a better answer? It will be a better answer because we can put tax-return information together with other kinds of information from census returns and social security files, and just have more knowledge on which to base our analysis of behavior than if we use each body of data separately.

Senator LONG. Why would you need the man's name?

Dr. KAYSEN. You would not need his name, Senator Long.

In principle, all we would need is some peg on which to hang the fact that the social security file which, let us say, shows how many days of the year he worked, and the income tax file, that shows both his labor and his nonlabor income, if any, refer to the same person. We want to be able to match them up, so we are making a conclusion that is based on the behavior of the same person. We would not want a result which matched up, let us say, my dividend income with your labor income. That would be bad for you, although it would be good for me. But if we had that, and we matched up Mr. A. and Mr. B., the numbers that were mismatched would not make any sense.

What we need is something, whether it is a name, or a social security number, or some other code number does not matter, but some peg on which to hang the data and be sure that we have a consistent body of data.
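[Editorial illustration, not part of the testimony.] The "peg" Dr. Kaysen asks for is, in present-day terms, a shared key on which two files are joined. A minimal sketch, assuming each file is a list of rows carrying a `peg` field; the function name `link_records`, the field names, and the figures are hypothetical.

```python
def link_records(work_file, tax_file):
    """Join two lists of dict rows on the shared 'peg' key.

    Returns the linked rows plus the pegs from work_file that found
    no partner, so that mismatches are reported rather than guessed at.
    """
    tax_by_peg = {row["peg"]: row for row in tax_file}
    linked, unmatched = [], []
    for row in work_file:
        match = tax_by_peg.get(row["peg"])
        if match is None:
            unmatched.append(row["peg"])
        else:
            # Both rows describe the same person, so their fields can be combined.
            linked.append({**row, **match})
    return linked, unmatched


# Usage with made-up rows:
work = [{"peg": "A", "days_worked": 240}, {"peg": "B", "days_worked": 100}]
tax = [{"peg": "A", "labor_income": 6000, "nonlabor_income": 300}]
linked, unmatched = link_records(work, tax)
```

Nothing in the join depends on what the peg is; a name, a social security number, or an arbitrary code would serve equally well, which is the point of the paragraph above.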

It would be perfectly possible, for example, to have the following kind of safeguards: let us say that every data collecting agency, the IRS, the Social Security, the Census, and so on

Senator LONG. Let us not leave out the Department of Agriculture. I will ask you about them later.

Dr. KAYSEN. Let us not leave out any of them -- give a matching number to the central data file. One number would be an arbitrary code number, one number would be a social security number, and then, let us say, that the data file had its data classified in terms of the arbitrary code number, and the register, the key, that matched the arbitrary code number with the social security number, was kept separate from all the rest of the data so that the routine operations of fetching data out of storage and putting it back in storage would be based only on this arbitrary code number, and the file which made it possible to go from the arbitrary code number to social security was kept separate and was accessible to far fewer people under much stricter security control. The file which identified social security numbers with names and addresses could be still another file.

Now, in order for the thing to function, we cannot make this segregation total. We have to know that we can put the right information in the right box. But we can certainly have a much higher degree of control on those files which identify names, addresses, individuals, and separate them from the routine operating files.
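[Editorial illustration, not part of the testimony.] The arrangement described here, with routine files keyed only by an arbitrary code number and the register that maps codes back to social security numbers held apart under tighter control, might be sketched as below. The names `KeyRegister` and `deposit`, the `authorized` flag, and the sample numbers are assumptions made for illustration; a real system would need an actual access-control mechanism in place of the flag.

```python
import secrets


class KeyRegister:
    """Hypothetical restricted register linking arbitrary codes to identifying numbers."""

    def __init__(self):
        self._code_by_id = {}
        self._id_by_code = {}

    def code_for(self, identifying_number):
        # Reuse an existing code so that data about one person arriving
        # from different agencies lands under the same code.
        code = self._code_by_id.get(identifying_number)
        if code is None:
            # The code is random, not derived from the identifying number.
            code = secrets.token_hex(8)
            while code in self._id_by_code:
                code = secrets.token_hex(8)
            self._code_by_id[identifying_number] = code
            self._id_by_code[code] = identifying_number
        return code

    def reidentify(self, code, authorized=False):
        # Stand-in for the "much stricter security control" on the key.
        if not authorized:
            raise PermissionError("re-identification requires special authorization")
        return self._id_by_code[code]


def deposit(operating_file, register, identifying_number, data):
    """Store data in the routine operating file under the arbitrary code only."""
    code = register.code_for(identifying_number)
    operating_file.setdefault(code, {}).update(data)
    return code


# Usage with made-up values:
register, operating_file = KeyRegister(), {}
code = deposit(operating_file, register, "000-00-0001", {"days_worked": 240})
deposit(operating_file, register, "000-00-0001", {"labor_income": 6000})
```

As the next paragraph of testimony concedes, the separation cannot be total: the depositing step must consult the register to put the right information in the right box, so the register remains the part of the system that most needs protecting.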

Senator LONG. Now, out on my farm in Missouri I raise hogs and cattle and sheep, as well as other agricultural products. But I just recall particularly the requests for information that I get, I believe, from the Department of Agriculture with a regularity that is rather annoying. They want to know how many cows I have and how old they are and how many calves they have, and how many weigh over 500 pounds and how many I have on feed, and how many I have on pasture and many more asinine questions of that kind, and when I expect to sell them, and so on.

The same thing about the sheep and the same thing about the hogs. Now, whether I keep a heifer calf in my herd or sell it, it strikes me at that time, as my business and not the Government's.

But I decided I would not answer that question here a while back, and then in about 2 weeks I got another one. I had not answered it. I decided I would not answer that one, and then a little later I got a rather firm letter from them saying that I was violating the Federal law, and that they would be after me if I did not answer it.

I hit on the idea that if I do answer it, then in a few weeks I get another letter asking me all about them...

I hit on the solution that if I just do not put anything, I just put zero, I do not hear anything after that.

But actually it strikes me that that is rather an invasion of privacy of any farmer, and this is a constant complaint of the farmers...

Is that type of information to be put in here and, if so, what value will it be to the public? Let me ask you another question. Don't you think that is an invasion of the average farmer's privacy?

Dr. KAYSEN. I think it is in some sense an invasion of his privacy. But I think there is a public interest which counterbalances that invasion.

Senator LONG. There is always, I have noticed, a public interest in the other fellow's business. I do not know whether that is an interest or a curiosity. I do not know.

Dr. KAYSEN. I think, sir, that the interest in this case is for the Department of Agriculture to be able to make an estimate of what milk production will be or how many cattle will be coming on the market, and this estimate of what future meat or milk production will be is an item on which public policy decisions are made. These estimates are presented to the Congress, and you are often asked to, and often initiate actions making decisions about supports or decisions about imports or decisions about many things on the basis of forecasts of what, let us say, meat production will be.

It is in the public interest, it seems to me, to have these forecasts as well based as forecasts can be.

Now, as I would see the balance, there is a public interest in getting an accurate forecast. There is no public interest in disclosing to any individual, except what is needed for the technical purpose of putting
