more important than prospective savings in money are prospective savings in the effort of information collection and the corresponding burdens on individuals, business, and other organizations in filling out forms and responding to questionnaires. As the demand for information grows, the need to minimize the costs in respondents' time and effort becomes more important. The present statistical system is only moderately well-adapted to this objective; a data center would make possible a much better performance on this score.

WHAT IT IS AND ISN'T

So much for the purpose of a data center; how would it function? First, it is important to point out that a data center is not the equivalent of a single centralized statistical agency which takes over responsibility for the entire information-gathering, record-keeping, and analytical activity of the Federal government. Rather, it deals with only one of the three basic functions of the statistical system (integration and storage of information in accessible form) and leaves the other two (collection of information; and tabulation, analysis, and publication) in their present decentralized state. To be sure, if the Data Center is as effective and efficient as some of its proponents expect, some redistribution of the last set of tasks between the agencies presently doing them and the Center would probably occur. This, however, would be the result of choice on the part of the using agencies, if they saw an opportunity to do a better and less costly job through the Center than they could do for themselves.

The crucial questions, of course, are (a) what information would be put into the data center, and (b) how would access to it be controlled? In the words of the Task Force Report, the "Center would assemble in a single facility all large-scale systematic bodies of demographic, economic and social data generated by the present data-collection or administrative processes of the Federal Government, *** integrate the data to the maximum feasible extent, and in such a way as to preserve as much as possible of the original information content of the whole body of records, and provide ready access to the information, within the laws governing disclosure, to all users in the Government, and where appropriate to qualified users outside the Government on suitably compensatory terms."

The phrase "large-scale systematic bodies of demographic, economic and social data" translates, in more concrete terms, into the existing bodies of data collected by Census, the Bureau of Labor Statistics, the Department of Agriculture, the National Center for Health Statistics, the Office of Education, and so on. It also includes the large bodies of data generated as a by-product of the administration of the Federal income tax and the Social Security system. It does not include police dossiers from the FBI, personnel records of the Civil Service Commission or the individual government agencies, or personnel records of the armed services, and other dossier information, none of which fits what is meant by the phrase "large scale, systematic bodies of social, economic, and demographic data."

For the data center to achieve its intended purposes, the material in it must identify individual respondents in some way: by Social Security number for individuals, or, for business enterprises, by an analogous code number now used within the Census called the Alpha number. Without such identification, the Center cannot meet its prime purpose of integrating the data collected by various agencies into a single consistent body. Whether these Social Security or Alpha numbers need in turn to be keyed to a list of respondents which identifies them by name and address within the data center itself, or whether that need be done only within the actual data collecting agencies, is a technical detail. That it must be done someplace is perfectly clear, as it now is done within the several agencies that collect the information.
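The role of the identifying number in integration can be illustrated with a minimal sketch. Everything in it is hypothetical: the agency labels, the field names, the sample values, and the `integrate` function are assumptions made for illustration, not a description of any actual agency file.

```python
from collections import defaultdict


def integrate(*sources):
    """Combine records from several agencies that share an identifying number.

    Each source is a pair (agency_label, list_of_records); every record is a
    dict with an "id" entry (standing in for a Social Security or Alpha number).
    """
    merged = defaultdict(dict)
    for agency, records in sources:
        for record in records:
            ident = record["id"]
            for field, value in record.items():
                if field != "id":
                    # Prefix each field with its agency so the origin stays visible.
                    merged[ident][f"{agency}.{field}"] = value
    return dict(merged)


# Hypothetical sample records from two collecting agencies.
census = [{"id": "A1", "household_size": 4}, {"id": "B2", "household_size": 2}]
tax = [{"id": "A1", "income": 9000}]

linked = integrate(("census", census), ("tax", tax))
```

Without the shared `id` there would be nothing to join on, which is the point the paragraph above makes about identification being indispensable to integration.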

On the other hand, it is not in general necessary that the central files in the data center contain a complete replica of every file on every respondent who has provided information to the original collectors. In many cases (e.g., the Social Security files) a properly designed sample would serve the same purposes more economically. To this extent, then, the data center will not contain a file on every individual, every household, every business, etc., but a mixture of a collection of samples, some of them relatively large, and complete files of some groups of reporting units which are particularly interesting and important from an analytical point of view. But here again, the significance of the difference between reproducing for the data center a complete file which already exists in some other agency, and reproducing only a sample therefrom, can easily be over-emphasized.

ANXIETIES

It is neither intemperate nor inappropriate to observe that the merits of the proposed data center have hardly been discussed in the tones that ordinarily mark consideration of a small change in government organization in behalf of greater effectiveness and efficiency. The anxieties stimulated by or crystallizing around the proposal can be divided into six groups: (1) the center will contain information that should not be in it; (2) the information can be improperly used by those within the government who have access to it; (3) the "bank" will be subject to cracking, so to speak, and data on individuals will be used to their detriment in any way from blackmail to gossip; (4) an enterprise of this sort is inherently expanding in nature, and no matter how modestly it begins, it will grow to include more and more, and eventually too much; (5) it both represents and encourages meddling and paternalistic government, trying to do too much in controlling the lives of its citizens; and (6) at a deeper level, it stands for a notion of an omniscient government, which is in some fundamental way inconsistent with our individualistic and democratic values. These categories are overlapping in part and hardly all on the same logical level of discourse, but they seem to contain broadly all the criticisms that have been made.

To what extent are these problems real and new; to what extent are they simply translations into a new technical mode of familiar and persistent problems in the relation of citizens and government? And, if the latter, how well can variants of familiar mechanisms be adapted to deal with them? In what follows, I argue that while the fears raised by critics have real content, the problems are neither entirely novel, nor beyond the range of control by adaptations of present governmental mechanisms.

The first two questions go to the fundamental problem of government: quis custodiet ipsos custodes? The content of information now in the inventory of government agencies is controlled ultimately by the Congress, operating through the appropriations process; and more immediately by the separate bureaucratic hierarchies of each data collecting agency, subject to the overall review of the Director of the Budget. He has a specific statutory responsibility for reviewing all governmental questionnaires directed to the public, with a view to eliminating duplication and keeping the total burden on respondents at a reasonable level. If this process seems to be working ineffectively, in the sense of ignoring persistent complaints, then the Appropriations Subcommittees that deal with the budget requests of each data-collecting agency are readily able to exercise a further control. In practice, the existence of this restraint operates to reinforce powerfully the caution of the collecting agencies in expanding their requests.

A new data center would operate within the same framework of controls. Indeed, the Congress, in authorizing its creation, should define the kind of information which it would assemble, and could follow the line of demarcation of large-scale systematic demographic, economic, and social statistics suggested above. The inclusion of dossier information could be specifically prohibited. A clear distinction between "a dossier" and "statistical data file" on an individual can be made in principle; namely, for a dossier, the specific identity of the individual is central to its purpose; while for a file of data it is merely a technical convenience for assembling in the same file the connected set of characteristics which are the object of information. The purpose of the one is the assembly of information about specific people; the purpose of the other, the assembly of statistical frequency distributions of the many characteristics which groups of individuals (or households, business enterprises, or other reporting units) share. In practice, of course, this distinction is not self-applying, and administrators and bureaucrats, checked and overseen by politicians, have to apply it. But so is it ever.

A similar set of observations is relevant to the question of the control over the use of data in the center. The present law and practice governing the Census Bureau offer a model for this purpose. The law provides that information contained in an individual Census return may not be disclosed either to the general public or to other agencies of the government, nor may such information be used for law-enforcement, regulatory, or tax-collection activity in respect to any individual respondent. This statutory restriction has been effectively enforced, and the Census Bureau has maintained for years the confidence of respondents in its will and ability to protect the information they give to it. The same statutory restraint could and should be extended to the data center, and the same results could be expected of it. The data center would supply to all users, inside and outside the government, frequency distributions, summaries, analyses, but never data on individuals or other single reporting units. The technology of machine storage and processing would make it possible for these outputs to be tailored closely to the needs of individual users without great expense and without disclosure of individual data. This is just what is not possible under our present system.
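A rough sketch may make the idea of supplying distributions, but never individual records, more concrete. The records, field names, and `frequency_distribution` function are hypothetical. The `min_cell` threshold, which withholds any count small enough to point at a single reporting unit, is an assumption added here for illustration and is not drawn from the text.

```python
from collections import Counter


def frequency_distribution(records, characteristic, min_cell=2):
    """Tabulate one characteristic across records, returning counts only.

    Identifying numbers never appear in the output. Cells with fewer than
    `min_cell` units are withheld (an assumed safeguard, not from the text).
    """
    counts = Counter(record[characteristic] for record in records)
    return {value: n for value, n in counts.items() if n >= min_cell}


# Hypothetical stored records; "id" stands in for an identifying number.
records = [
    {"id": "A1", "region": "North"},
    {"id": "B2", "region": "North"},
    {"id": "C3", "region": "South"},
]

table = frequency_distribution(records, "region")
full_table = frequency_distribution(records, "region", min_cell=1)
```

The user receives only the tabulation; the underlying rows, and the identifiers that made the tabulation possible, stay inside the center.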

TEMPTATIONS

However, it may be argued that the greater richness of the data file on any single reporting unit in a new data center as compared with those presently existing in the Census, the Social Security Administration, the Internal Revenue Service, and elsewhere would greatly increase the temptation for those with legitimate access to the data to use it improperly. The same argument goes to the third point listed above: the "cracking" of the center by outsiders ranging from corrupt politicians to greedy businessmen and organized criminals. It is clearly the case that centralized storage in machine-readable form of large bodies of information makes the rewards of successful abuse or "penetration" relatively large compared to what they would be in a more decentralized, less mechanized system. It is not at all clear, however, that the cost of successful misapplication or penetration cannot be increased even more sharply than the rewards. In detail this is a technical problem of great complexity, but it seems clear from experience with a variety of secrecy-preserving techniques that a well-designed system of record storage and use could make "penetration" highly costly and to a large extent self-announcing. It is not difficult, for example, so to organize and code the basic records that programs for retrieving information routinely record the user and the purpose for which it was used. Any continued improper use would thus leave a trail that would invite discovery. Or, to mention another aspect, identifying numbers could be specially coded, and the key to that code made available on a much more restricted basis than were other codes. While no security system can be made perfect, it is feasible to make the costs of breaking it sufficiently high so as to keep the problem within tolerable bounds. The same kinds of safeguards would prevent misuse of the data by those with legitimate access to it.
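The two safeguards just mentioned, a retrieval routine that records user and purpose, and identifying numbers coded under a closely held key, can be sketched as follows. This is only an illustration under stated assumptions: the `AuditedStore` class, the `code_identifier` function, the key, and the sample values are all hypothetical, and a keyed hash (HMAC-SHA256 from the Python standard library) is swapped in as one modern way to "specially code" an identifier.

```python
import hashlib
import hmac


def code_identifier(ident: str, key: bytes) -> str:
    """Code an identifying number with a keyed hash; only holders of `key`
    can reproduce the mapping from the raw number to its coded form."""
    return hmac.new(key, ident.encode("utf-8"), hashlib.sha256).hexdigest()


class AuditedStore:
    """Holds records under coded identifiers and logs every retrieval."""

    def __init__(self, coded_records):
        self._records = dict(coded_records)
        self.audit_log = []  # one (user, purpose, coded_id) entry per retrieval

    def retrieve(self, user, purpose, coded_id):
        # The log entry is written before any data is returned, so every
        # access attempt, successful or not, leaves a trail.
        self.audit_log.append((user, purpose, coded_id))
        return self._records.get(coded_id)


# Hypothetical key, assumed to be held on a more restricted basis than the data.
KEY = b"held-by-a-restricted-office"

coded = code_identifier("000-00-0000", KEY)
store = AuditedStore({coded: {"income": 9000}})
result = store.retrieve("analyst_7", "income study", coded)
```

In this arrangement a user who holds the records but not the key sees only coded identifiers, and each call to `retrieve` adds an entry to `audit_log` that a reviewer could later inspect.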

The last three kinds of objections are similar in that they reflect a certain stance toward the government, and toward the evolution of its role in the larger society, and are not tied to any specific concrete problems. Indeed, the concrete problems underlying these broader concerns are those already examined. How will the contents of the data bank be controlled? Who will determine to what uses it may be put? How can we prevent the stock of information from being abused, misused, or simply misappropriated? But there appears beyond these specifics an attitude hard to discuss because of its intangible nature.

On the broadest level, one can simply reject the notion that there is an ineluctable ever-expanding process of governmental "intrusion" which must be resisted at every turn, yet inevitably overcomes whatever resistance the public offers. After all, this is the stuff of right-wing ideology. Opposed to this is the pragmatic liberal view that the public calls in the government, with more or less deliberation, when there are social problems to be solved which require governmentally-organized efforts and legally-enforceable obligations for their solution. Indeed, many proponents of this view see the restraints on government action built into our political system as too high, rather than too low, and action as typically too little and too late. On this view we have suffered more, at least in matters of domestic policy, from the feebleness of our government than from its overweening strength.

Without decisively choosing one over the other of these ideological stances, and with full recognition that a government too feeble for the welfare of its citizens in some matters may be too strong for their comfort or even their liberty in others, it is possible to believe, as I do, that the present balance of forces in our political machinery tends to the side of healthy restraint in matters such as these. After all, the very course of discussion on these problems, since the Center was first considered, supports this view. Accordingly, I conclude that the risky potentials which might be inherent in a data center are sufficiently unlikely to materialize so that they are outweighed, on balance, by the real improvement in understanding of our economic and social processes this enterprise would make possible, with all the concomitant gains in intelligent and effective public policy that such understanding could lead to.
