Mr. WATERS. And they become, I think, justifiably aggravated because the computer is so impersonal, and they write in and say, “Please give this to a human being.”

It might be amusing to think about it with respect to a charge account. It could, however, have an enormous impact on an individual adversely affected by Government action who may never even know, because of the impersonality of this type of arrangement, that he has either been passed over or selected for something as a consequence of a digit being misplaced on a punchcard or, perhaps, a card that has been folded.

Do you envision any particular type of prophylaxis in an endeavor to protect the public against errors which can become very important to the individuals affected?

Dr. KAYSEN. Well, the main prophylaxis I envision is, as I said in answer to a question of Senator Long's, that the Data Center would not be used for this kind of operation. It would not be a personnel file. It would not be a security file. It would not be a promotion file.

I do not say that income tax returns, social security returns, census returns are free from error. But if the returns are used only for statistical purposes, then the error is much less significant.

Mr. WATERS. The chairman made the point in connection with the number of cattle, and I think your response was that certainly it would not be available to the income tax people.

But suppose the Department of Agriculture were concerned about the amount of cattle on a farm under a particular program number or control number, depending on what it is, and they get this for this purpose. Would they be entitled to do that?

Dr. KAYSEN. No, in this particular case the Department of Agriculture would be the original collecting agency, and it has a right, with which I am not familiar in detail, some statutory right, to collect the information, and probably some set of constraints about what they do with it.

If the Department of Agriculture is now allowed to look at that return and send an agent out to Senator Long's farm and say, “Senator, why did you tell us so and so?” the creation of the Data Center would not change that right. If the Department of Agriculture does not now have that right, the creation of the Data Center would not give it to them, and, as the Data Center would operate, data on Senator Long, with some code number identification, would go into the Data Center from the Department of Agriculture. But the same data would never go out as an individual report, either to the Department of Agriculture or to anybody else. The Department of Agriculture would be able to look up its own files and say, “We got a report from so and so and we want to do such and such,” depending on the nature of this report.

Senator LONG. It would be there if they could get to it.

Dr. KAYSEN. Sir?

Senator LONG. That information would be in the data file if they could get to it.

Dr. KAYSEN. It would be there if they could get to it.

Senator LONG. Excuse me.

Mr. WATERS. That is the point I was trying to make, Dr. Kaysen, that once having achieved it for a legitimate departmental purpose, they would then be free without leaving the tracks that you have described in the computer, to disseminate that information to other agencies that are interested. They are now authorized to do it under existing law, are they not?

Dr. KAYSEN. Yes. If they were authorized to do it, they would be free to do it, although one of the recommendations that I think is important in our report is that in the process of creation of a Data Center, there should be a review and consolidation of disclosure and confidentiality laws on statistical information over the whole Government, so that the Congress says, "Here are the standards not only for the Data Center but for everybody."

Mr. WATERS. Thank you, Dr. Kaysen. Thank you, Mr. Chairman.

Senator LONG. Doctor, you were talking about this income tax a while ago. My daughter was married a couple of years ago and filed her income tax, of course, under her maiden name all this time, and the last year she filed it under her married name.

Well, about every month she has been getting a letter from the income tax people saying that she had better pay her tax. She did not find anything wrong and would write and tell them.

It was not until about a month ago when I finally wrote the Director a personal letter, asking them just as a favor to me, to prevent her from harassing me, to look at the computer and find out that she had paid.

We have not heard any more about it. But we do have that type of situation with this computer. There will be a lot of erroneous information that will go out. Any other questions?

I would like to place in the record at this time the Report of the Task Force on the Storage of and Access to Government Statistics, of which the Doctor here was the Chairman. Without objection, it will be placed in the record at this point.

(The document referred to follows:)





Carl Kaysen, Chairman, Institute for Advanced Study; Charles C. Holt, University of Wisconsin; Richard Holton, University of California, Berkeley; George Kozmetsky, University of Texas; H. Russell Morrison, Standard Statistics Co.; Richard Ruggles, Yale University

The Committee was originally charged with the task of considering "measures which should be taken to improve the storage of and access to U.S. Government Statistics." It is the best judgment of the Committee that it can answer this question only in a much broader context, namely, by looking at the question of how the Federal Statistical System can be organized and operated so as:

1. To be capable of development to meet the accelerating needs for statistical information, needs that are increasing in quantity, in variety, and in degree of detail with the developing character of American society, and the changing responsibilities in it of the Federal Government;

2. To develop safeguards which will preserve the right of the individual to privacy in relation to information he discloses to the government either voluntarily or under legal compulsion;

3. To make the best use of existing information and information generating methods and institutions at its disposal; and

4. To meet these needs for statistical information with a minimum burden of reporting on individuals, businesses, and other reporting units.

The focus of the committee's concern is the Federal statistical system. Although different government agencies may require information about specific individuals or businesses as part of their legal operating responsibilities, the committee was unanimous in its belief that Federal agencies or other users should not be able to draw on data which is available within the Federal statistical system in any way that would violate the right of the individual to privacy. Organizational and legal safeguards should be developed to prevent the use of data which is brought together for statistical purposes as a source of information concerning individual reporting units.

A body of data can provide useful statistical information only to the extent that it is live, in the sense of corresponding to a clearly defined and currently comprehensible system of identifying the sources of information, definitions of quantities being measured, classifications on which groupings of units are based, and the relations of all these categories to those for other information collected on similar units, or the same units at different times. Thus, no discussion of storage of and access to data can be usefully conducted without some consideration of the larger information system, from basic data collection to analysis, of which storage and access are a part.

At present, the Federal Statistical System is decentralized in respect to all its basic functions: collection, storage, analysis, tabulation, and publication. Twenty-one bureaus are shown in the Budget Bureau list of the "principal statistical programs" for FY 1967. Their total estimated budget, including the annual average over recent years of expenditures on periodic programs (mostly Census programs), was about $122 million, of which $96 million was for current programs, and the balance for periodic programs. The four largest agencies, with their shares of the total budget, were: Census, 24 per cent; Bureau of Labor Statistics, 16 per cent; Statistical Reporting Service, Department of Agriculture, 10 per cent; and Economic Research Service, Department of Agriculture, 10 per cent. Their total share was thus some 60 per cent, and the next four agencies, National Center for Health Statistics, Social Security Administration, Internal Revenue Service, and National Science Foundation, accounted for an additional 18 per cent, making a total share for the largest eight of 78 per cent. Decentralization has been increasing. A decade ago, the four largest statistical programs, those of Census, Agriculture (with the Statistical Reporting Service and the Economic Research Service operating as a single unified agency), Bureau of Labor Statistics, and Social Security Administration, accounted for 71 per cent of the total expenditures of the 11 Bureaus which had significant programs.

The increase in dispersion has occurred in a period of increasingly rapid growth in the total size of the System's activities. The total budget for 1956 for the 11 major agencies was $47 million, of which some $37 million was for current as opposed to periodic programs. In the period 1950–56, the (arithmetic) average annual rate of growth of expenditures for current programs was about 2.5 per cent; in the period 1957–60, nearly 7 per cent; for 1961-66, it has passed 15 per cent. Periodic programs are also increasing in scope and cost, and a projection of the order of $200 million for the 1970 level of expenditures for principal programs appears reasonable. Since many of the most rapidly growing programs have been those of new agencies, or agencies mounting major statistical programs for the first time, the process of further decentralization promises to continue, unless action is taken to change the trend. We do not mean to suggest that the opposite extreme of complete centralization of all data-gathering and analysis is desirable. As we explain below, even ignoring the difficulties of scrapping an existing structure and starting entirely afresh, a substantial amount of decentralization is inevitable and desirable, particularly in connection with the administrative, program planning, and program analysis function of the operating agencies.

The high degree of decentralization in all functions of the present statistical system has for some time been recognized as a major obstacle in the way of its effective functioning.

Nearly two decades ago, F. C. Mills and C. D. Long of the National Bureau of Economic Research made a study of The Statistical Agencies of the Federal Government (National Bureau of Economic Research, New York, 1949) for the Hoover Commission. They pointed to many problems arising from excessive decentralization and inadequate coordination. The major remedies they proposed included greater centralization in the Census Bureau, and the creation of an Office of Statistical Standards with great powers to coordinate and unify that which was not centralized. These recommendations were followed to some extent, but the growth of the problem has outstripped the strength of the remedies applied.

In March 1965, a committee of the Social Science Research Council, in a Report on the Preservation and Use of Economic Data, recommended the creation of a National Data Center, in order to remedy some of the most pressing problems arising out of the present statistical system. In a review of that report made for the Office of Statistical Standards (Bureau of the Budget) and completed in November 1965, Dr. Edgar S. Dunn, Jr., of Resources for the Future, endorsed the substance of these recommendations, and in some respects went beyond them. Dr. Dunn was assisted in this report by a group of experienced professionals drawn from various parts of the Federal Statistical System, as well as by experts in automatic data processing of the National Bureau of Standards.

As it is presently operated, the statistical system is both inadequate, in the sense of failing to do things that should and could be done, and inefficient, in the sense of not doing what it does at minimum cost, or getting less for what it spends than might be possible.

The inadequacy of the present statistical system has three major aspects. The first is the lag between the receipt of information and its availability in usable form. This is most striking in the case of the Statistics of Income for Corporation Income Tax Returns. There is a one-and-a-half year lag between filing of returns and preliminary summary publication, and a two-and-a-half year lag before final detailed publication. A large part of the problem arises from the variation in filing dates of corporations filing on a fiscal year basis: some may file as much as 10 months after the end of the calendar year under which their returns are compiled. But part of the problem does reflect questions of priority and availability of facilities, and though these reports provide a basic source of economic data of great importance, their reporting function cannot be given first place in the administration of the Internal Revenue Service.

A second and deeper source of inadequacy in the present system is its widespread suppression of micro-information, and its orientation toward publication of necessarily aggregated and tabulated information as its major goal. These are of course intimately related: restrictions on disclosure to the general public or unauthorized persons within the government of information on individual reporting units is a necessary and desirable legal constraint on any official agency collecting information under the sanction of law. So long as publication is thought of as the basic process that makes information available for use, aggregation and the suppression and ultimate permanent loss of micro-information cannot be avoided. The consequence, however, is the necessity of substituting worse for better information, and cruder for more refined analyses, by those who use the data for research and policy purposes. In particular, much ingenuity and effort is spent in the construction of rough estimates of magnitudes and relations that could be measured with much greater accuracy if the micro-information that present statistical records originally contained was preserved in usable and accessible form. Present technology makes it possible to do this economically and consistently with desirable limits on disclosure.

The growing decentralization of statistical programs has led to another major inadequacy. At the present time different agencies view the problem of the right to privacy very differently. In some agencies the policy of protecting the privacy of the information reported by individuals and businesses is formally stated and protected by law; in such instances the enforcement of such policies has also been found to be very good. In other instances, formal policies regarding disclosure have not been set up, and in many of these cases the protection depends on the judgment of those who are in charge of the different programs involved. Understandably, the growing decentralization of statistical programs has thus led to considerable unevenness in the nature and enforcement of disclosure rules. It is quite possible that without some overall policy which can be responsibly supervised, major violations of individual privacy may take place. It should be the function of some group within the Federal Statistical System to ensure that data gathered for statistical purposes or obtained as a by-product of the administrative process is not used against an individual or enterprise. Thus at the present time information about individual persons or businesses collected by the Census Bureau cannot be used by the Internal Revenue Service or Department of Justice against individuals or enterprises in the investiga[tion ...]. [This] type of protection must be preserved in order both to protect the rights of individuals involved and to avoid falsification of information which might develop if individuals were not given assurance against disclosure.

The major elements of inefficiency to which decentralization has led are of three kinds. The first is duplication in the collection of information. Although the Office of Statistical Standards controls duplication, it is not always successful in eliminating it entirely. Avoiding duplication is especially important because duplication needlessly spends not only money but the ever scarcer resource of cooperation by the public (households, business firms, and other respondents) in answering enquiries. While duplication within single agencies is not serious, the great degree of decentralization leads to overlaps between programs of different agencies. The problem is less the collection of exactly the same information by two agencies, and more the collection in two surveys or reports of data that could be collected in one. Failure to make the maximum use of each occasion for collecting information may well lead to a burden on respondents which becomes intolerable with growing needs for data. An example of the problem is provided by current practice in connection with sample data on retailing. The Bureau of the Census collects data on retail sales from one sample of retail stores and the Bureau of Labor Statistics collects data on employment, wages, and hours from another. As a result, there are doubts about the comparability of these input and output data at various levels of publication detail. These doubts arise not so much from the differences in the two samples as from differences in the two Bureaus' methods of assigning industry codes and definitions of reporting units. If both input and output data were collected on the same report form and processed by the same agency, these differences in comparability would be eliminated.

This situation applies not only to retail sales but also to manufacturing data, where the Bureau of the Census collects monthly figures on sales, orders, and inventories, while the Bureau of Labor Statistics surveys manufacturing employment, man-hours, and wages each month. There is little doubt that a single consolidated reporting system, using one sample, would be less burdensome and less costly, and would yield better information.

The second source of inefficiency is failure to use as a statistical resource all the information potentially available in the data collected. This, in turn, has a number of sources. (1) Collection of the data on the same reporting units by different collecting agencies operating with different classification systems, unit definitions, and the like, results in inability to match all the relevant available information on a responding unit for analytical purposes. Information on groups of respondents of different, and to some extent imperfectly known, composition cannot properly be compared and correlated. Census, IRS, SEC, and FTC data on business enterprises exemplify this problem. These incompatibilities in definition often reflect the different purposes of the several agencies that collect the data; yet effort directed to resolving these problems can be fruitful and is worthwhile. (2) After separate collecting and processing, agencies assemble data in summary form; the original individual reports are all but unavailable for further use, or available only at prohibitive costs. This effectively prevents different summaries and analyses of the data for other purposes by the same agency or by different agencies. In particular, the efficient use of data for intertemporal comparisons over any but a short time period becomes difficult, as the classifications change over time, and thus much information is irretrievably lost. (3) Confidentiality restrictions as interpreted by different agencies often act as a barrier to the full use of data for statistical purposes inside the government and within the legal boundaries of use.
