An IBM plant in Endicott, N.Y., that is run almost entirely by a computer demonstrates how automation can cut costs and raise productivity. The plant makes electronic circuit cards that perform the logical and arithmetical operations in computers; so, in effect, the computer helps make other computers. The IBM plant produces many types and sizes of circuit cards. The computer sees that conveyors get each card to the right machine at the right time and that the machine performs the right operation on it. It controls the drilling machines that make holes in the cards, the testing machines that ensure the holes are in the proper place and the insertion machines that place small components in the holes.

IBM says the automated assembly lines reduce scrappage and improve quality. The company estimates that computer control of the Endicott plant permits production of circuit cards at half what they would cost with conventional hand operations. Moreover, says IBM, the automated system enables it to respond to market demand for different types of circuits two to three times faster than would be possible otherwise; with computer control there is no need to shut down production and shift personnel about when changing the product mix.

IMPACT ON EMPLOYMENT

The automated circuit card plant employs only a fraction of the production workers that would be needed without automation. The sight of production lines without people is another of the considerations that sometimes give rise to ambivalent feelings about computers. Some observers, particularly in union circles, fear widespread unemployment will result inevitably from increased use of computers in industry.

But others say there can be no clear-cut answer at this point. Logically, it would seem that if industry comes to rely almost totally on computers to guide production operations, there simply would not be enough jobs to go around, unless the work week were drastically reduced.

Up till now, however, workers displaced by automation have generally been absorbed by the expanding economy, and some economists think this will continue to be the case for the foreseeable future. Computer makers themselves note that their industry has created some 250,000 new jobs and that the total will grow.

Even in some fields where controversy over the introduction of computers would seem unlikely, there are those who doubt the amazing machines will be an unmixed blessing. Some doctors and hospital officials, for example, indicate they might not be willing to hand over patient records to computers to which others would have access; making such information freely available, they fear, might lead to a rise in malpractice suits.

Some observers maintain that computer networks set up by banks or by timesharing data-processing centers also have their alarming aspects. What would happen, they ask, if a computer linking thousands of users were programmed incorrectly? Most likely, a monumental snarl would ensue. Bills would be deducted from the wrong bank accounts. The boss's paycheck would be credited to the office boy. The solution to a stress problem posed by an engineer would clack out on the doctor's teleprinter.

SAN FRANCISCO STATE COLLEGE,
SCHOOL OF BEHAVIORAL AND SOCIAL SCIENCES,
DEPARTMENT OF SOCIOLOGY,
San Francisco, Calif., August 7, 1966.

THE EDITOR,
THE AMERICAN SOCIOLOGIST.

TO THE EDITOR: It has come to my attention that the U.S. Bureau of the Budget proposes the establishment of a huge centralized computer into which all the data on any American now collected by some 20 separate federal agencies would be fed. "It could mean an instant check on any man's birth, school grades, military or criminal record, employment, income, credit rating and even personality traits." (San Francisco Examiner, July 31, 1966) I have no way to know if this report is entirely accurate, but anyone who has ever held a "secret" or higher security clearance has some idea of the range of data collected by only one agency. The data which could be centralized by combining the records from a number of agencies is truly fantastic.

I have talked with a number of people who favor this proposal. They argue that it would eliminate duplication, that it would make police and security work more effective, that it would make running away from obligations such as alimony payments more difficult. These arguments are quite true, as anyone who has witnessed the speed and accuracy of California's stolen automobile computer system, or the ten-second personal warrant check available in some areas, can testify.

The first step in the process of setting up a National Identity File has already taken place with the establishment of the National Computer Center in Martinsburg, West Virginia. There, taxpayers' records, centralized on the basis of their Social Security numbers (which will probably be our new National Identity Numbers), are to be compared with the informational filings of their employers and their banks.

Some sociologists are probably intrigued by the possibility of new kinds of demographic studies, mobility studies, by the possibility of really scientific sampling, or by the sheer amount of raw knowledge obtainable from such a file. If it were available it would be a powerful research tool. I think it is a mistake to be swayed by such considerations.

To argue in favor of a National Identity and Data File requires the assumption that all future governments of this country in all political situations (including war hysteria and witch hunts), all federal agencies both public and secret, and all individuals who could gain access under the cloak of authority or by ruse, will be benevolently motivated. I do not think this assumption can be made by a reasonable man. The potential for evil, for official and unofficial blackmail, for the harassment of political minorities is virtually unlimited. One must realize that whatever safeguards may be proposed in the initial justification could later be removed by a powerful president or a stampeded Congress. Also, the safeguards probably would be circumvented, on or off the record, by our undercover agencies.

I see no reason to assume that the government will be any more resistant to the pressures of the moment in the future than it has been in the past. Sending Japanese-American citizens to concentration camps would have been immensely speeded by having a National Identity and Data File, and McCarthy could have destroyed many more careers if he had had computer records of security investigations. Protestors of current Viet Nam policy could easily be marked "politically unreliable" for shipment off to the Tulelake Relocation Center after we bomb China.

On a sociological level, an ex-convict would carry the stigma throughout life. He could have a hard time starting anew if, when he is stopped for a traffic offense, the police learn that he is an ex-convict, possibly tell his employer, and from that time on consider him a "suspect" in every crime committed. It happens. An ex-mental patient, who, as Szasz argues, may have been hospitalized for a bad reason in the first place, may find this status coming back to haunt his career and the credibility of his assertions years later. A Bad Conduct Discharge, a record of a homosexual contact, of unwed motherhood, an affair recorded in a security check, all would be available essentially forever to ruin lives, deny jobs, and make the individual an object of pernicious official attention. It is important to realize that there is no system of safeguards which will assure that the possibilities I have listed will not happen, and there is no safeguard which cannot be removed. I think that the American Sociological Association ought to discuss the issue of a National Identity and Data File at the earliest possible time and take an official stand opposing its establishment.


[From The Public Interest, Spring 1967]

DATA BANKS AND DOSSIERS

(By Carl Kaysen)

Last year, a government committee headed by Carl Kaysen proposed the creation of a "national data center." The intention is to improve the usefulness of available statistics for policy planning purposes by funneling such statistics into a central "information bank." But the proposal evoked considerable criticism as representing a possible threat to privacy and an undue concentration of power, in the form of knowledge, in governmental agencies. In this article, Mr. Kaysen, who is Director of the Institute for Advanced Study, in Princeton, presents the case for a "data bank." We expect to continue the discussion of this matter in future issues.-Ed.

Both the intellectual development of economics and its practical success have depended greatly on the large body of statistical information, covering the whole range of economic activity, that is publicly available in modern, democratic states. Much of this material is the by-product of regulatory, administrative, and revenue-raising activities of government, and its public availability reflects the democratic ethos. In the United States there is also a central core of demographic, economic, and social information that is collected, organized, and published by the Census Bureau in response to both governmental and public demands for information, rather than simply as the reflex of other governmental activities. Over time, and especially in the last three or four decades, there has been a continuing improvement in the coverage, consistency, and quality of these data. Such improvements have in great part resulted from the continuing efforts of social scientists and statisticians both within and outside the government. Without these improvements in the stock of basic quantitative information, our recent success in the application of sophisticated economic analyses to problems of public policy would have been impossible. Thus, the formation last year of a consulting committee composed largely of economists to report to the Director of the Budget (himself an economist of distinction) on "Storage of and Access to Federal Statistical Data" was simply another natural step in a continuing process. The participants were moved by professional concern for the quality and usability of the enormous body of government data to take on what they thought to be a necessary, important, and totally unglamorous task. They certainly did not expect it to be controversial.

The central problem to which the group addressed itself was the consequences of the trend toward increasing decentralization in the Federal statistical system at a time when the demand for more and more detailed quantitative information was growing rapidly. Currently, twenty-one agencies of government have significant statistical programs. The largest four of these (the Census, the Bureau of Labor Statistics, the Statistical Reporting Service, and the Economic Research Service of the Department of Agriculture) account for about 60 percent of a total Federal statistical budget of nearly $125 million. A decade ago, the largest four agencies accounted for 71 percent of a much smaller budget. By 1970, the total statistical budget of the Federal Government will probably exceed $200 million and, in the absence of deliberate countervailing effort, decentralization will have further increased. Yet it has already been clear for some time that the Federal statistical system is too decentralized to function effectively and efficiently.

THE DRAMA BEGINS

Such is the background of the report which recommended the creation of a National Data Center. Here, Congressman Cornelius Gallagher (D., 13th District, N.J.) entered the scene, with a different set of concerns and objectives. He was Chairman of a Special Subcommittee on Invasion of Privacy, of the Government Operations Committee of the House, which held hearings on the proposed data center and related topics in the summer of 1966. To some extent the hearings themselves, and to a much greater extent their refraction in the press, pictured the proposed Data Center as at least a grave threat to personal privacy and at worst a precursor to a computer-managed totalitarian state. Congressman Gallagher himself saw the proposal as one more dreary instance of a group of technocrats ignoring human values in their pursuit of efficiency.

The full title of the Report, dated October 1966, is: Report of the Task Force on the Storage of and Access to Government Statistics, and it is available from the Bureau of the Budget. The Committee which produced it was: Carl Kaysen, Chairman, Institute for Advanced Study; Charles C. Holt, University of Wisconsin; Richard Holton, University of California, Berkeley; George Kozmetsky, University of Texas; H. Russell Morrison, Standard Statistics Co.; Richard Ruggles, Yale University.

It now appears as if the public outcry which the Committee hearings stimulated and amplified has raised great difficulties in the way of the proposed National Data Center. To what extent are they genuine? To what extent are they unavoidable? Are they of such a magnitude as to outweigh the probable benefits of the Center?

In answering these questions, it appears simplest to begin with a further examination of the proposal itself. The inadequacies arising from our overdecentralized statistical system were recognized two decades ago; since then they have increased. The present system corresponds to an obsolete technology, under which publication was the only practical means of making information available for use. Publication, in turn, involved summarization, and what was published was almost always a summary of the more basic information available to the fact-gathering agency. In part, this reflected necessary and appropriate legal and customary restrictions on the Federal Government's publication of data on individuals or on single business enterprises. In part, it reflected the more fundamental fact that it was difficult or impossible to make use of a vast body of information unless it was presented in some summary form.

Any summarization or tabulation, however, loses some of the detail of the underlying data, and once a summary is published, retabulation of the original data becomes difficult and expensive. Because of the high degree of decentralization of the statistical system, it is frequently the case that information on related aspects of the same unit is collected by different agencies, tabulated and summarized on bases that are different and inconsistent, with a resultant loss of information originally available, and a serious degradation of the quality of analyses using the information. The split, on the one hand, between information on balance sheets and income statements, as collected by the Internal Revenue Service, and, on the other hand, the information on value of economic inputs and outputs as collected by the Census, is one example of this situation.

The result of all this is the substitution of worse for better information, less for more refined analysis, and the expenditure of much ingenuity and labor on the construction of rough estimates of magnitudes that could be precisely determined if all the information underlying summary tabulations were available for use. This, in turn, limits both the precision of the policy process and our ability to understand, criticize, and modify it.

These effects of the inability of the present system to use fully the micro-information fed into it are growing more and more important. The differentiation of the Federal policy process is increasing, and almost certainly will continue to do so. Simple policy measures whose effectiveness could be judged in terms of some overall aggregate or average response for the nation are increasingly giving way to more subtle ones, in which the effects on particular geographic areas, income groups, or social groups become of major interest. The present decentralized system is simply incapable of meeting these needs.

It is becoming increasingly difficult to make informed and intelligent policy decisions on such questions in the area of poverty as welfare payments, family allowances, and the like, simply because we lack sufficient "dis-aggregated" information (breakdowns by the many relevant social and economic variables) that is both wide in coverage and readily usable. The information the Government does have is scattered among a dozen agencies, collected on a variety of not necessarily consistent bases, and not really accessible to any single group of policy-makers or research analysts. A test of the proposition, for example, that poor performance in school and poor prospects of social mobility are directly related to family size would require data combining information on at least family size and composition, family income, regional location, city size, school performance, and postschool occupational history over a period of years in a way that is simply not now possible, even though the separate items of information were all fed into some part of the Federal statistical system at some time.

A secondary, but not unimportant, gain from the creation of the data center is in simple efficiency. At present, some of the individual data-collecting agencies operate at too small a scale to make full use of the resources of modern information-handling techniques. The use of a central storage and processing agency (while maintaining decentralized collection, analysis, and publication to whatever extent was desirable) would permit significant economies. As the Federal statistical budget climbs toward $200 million annually, this is not a trivial point. Even more important than prospective savings in money are prospective savings in the effort of information collection and the corresponding burdens on individuals, business, and other organizations in filling out forms and responding to questionnaires. As the demand for information grows, the need to minimize the costs in respondents' time and effort becomes more important. The present statistical system is only moderately well-adapted to this objective; a data center would make possible a much better performance on this score.

WHAT IT IS AND ISN'T

So much for the purpose of a data center; how would it function? First, it is important to point out that a data center is not the equivalent of a single centralized statistical agency which takes over responsibility for the entire information-gathering, record-keeping, and analytical activity of the Federal government. Rather, it deals with only one of the three basic functions of the statistical system-integration and storage of information in accessible form-and leaves the other two-collection of information, and tabulation, analysis, and publication-in their present decentralized state. To be sure, if the Data Center is as effective and efficient as some of its proponents expect, some redistribution of the last set of tasks between the agencies presently doing them and the Center would probably occur. This, however, would be the result of choice on the part of the using agencies, if they saw an opportunity to do a better and less costly job through the Center than they could do for themselves.

The crucial questions, of course, are (a) what information would be put into the data center, and (b) how would access to it be controlled? In the words of the Task Force Report, the "Center would assemble in a single facility all large-scale systematic bodies of demographic, economic and social data generated by the present data-collection or administrative processes of the Federal Government, *** integrate the data to the maximum feasible extent, and in such a way as to preserve as much as possible of the original information content of the whole body of records, and provide ready access to the information, within the laws governing disclosure, to all users in the Government, and where appropriate to qualified users outside the Government on suitably compensatory terms."

The phrase "large-scale systematic bodies of demographic, economic and social data" translates, in more concrete terms, into the existing bodies of data collected by Census, the Bureau of Labor Statistics, the Department of Agriculture, the National Center for Health Statistics, the Office of Education, and so on. It also includes the large bodies of data generated as a by-product of the administration of the Federal income tax and the Social Security system. It does not include police dossiers from the FBI, personnel records of the Civil Service Commission or the individual government agencies, or personnel records of the armed services, and other dossier information, none of which fits what is meant by the phrase "large-scale, systematic bodies of social, economic, and demographic data."

For the data center to achieve its intended purposes, the material in it must identify individual respondents in some way, by Social Security number for individuals, or an analogous code number now used within the Census for business enterprises called the Alpha number. Without such identification, the Center cannot meet its prime purpose of integrating the data collected by various agencies into a single consistent body. Whether these Social Security or Alpha numbers need in turn to be keyed to a list of respondents which identifies them by name and address within the data center itself, or whether that need be done only within the actual data collecting agencies, is a technical detail. That it must be done someplace is perfectly clear, as it now is done within the several agencies that collect the information.
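The integration step described above is, in present-day terms, a join of per-agency records on a shared identifier. A minimal sketch, in which the agency files, the field names, and the helper function are all invented for illustration (the respondent number shown is a well-known dummy Social Security number, not a real one):

```python
# Hypothetical sketch: merge records from several collecting agencies
# into one consistent record per respondent, keyed by a shared ID.

def integrate(*agency_files):
    """Combine per-agency record dicts (ID -> fields) into one dict."""
    merged = {}
    for records in agency_files:
        for respondent_id, fields in records.items():
            # Each agency contributes its own attributes for the same ID.
            merged.setdefault(respondent_id, {}).update(fields)
    return merged

# Two invented agency files reporting different attributes of one respondent.
tax_records = {"078-05-1120": {"income": 9200}}
census_records = {"078-05-1120": {"household_size": 4}}

combined = integrate(tax_records, census_records)
# combined["078-05-1120"] now carries both income and household size.
```

Without the shared key, the two files could not be reconciled at all, which is the point Kaysen makes about identification having to be done someplace.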

On the other hand, it is not in general necessary that the central files in the data center contain a complete replica of every file on every respondent who has provided information to the original collectors. In many cases (e.g., the Social Security files) a properly designed sample would serve the same purposes more economically. To this extent, then, the data center will not contain a file on every individual, every household, every business, etc., but a mixture of a collection of samples-some of them relatively large-and complete files of some groups of reporting units which are particularly interesting and important from an analytical point of view. But here again, the significance of the difference between reproducing for the data center a complete file which already exists in some other agency, and reproducing only a sample therefrom, can easily be overemphasized.
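The sampling point can be illustrated with a toy calculation: a modest random sample of a large "complete file" recovers an aggregate such as the mean almost exactly, at a fraction of the storage cost. The population figures below are synthetic, chosen purely for illustration:

```python
# Illustrative sketch: a sample versus the complete file.
import random

random.seed(1966)
# A synthetic "complete file" of 100,000 respondent values.
population = [random.gauss(100, 15) for _ in range(100_000)]
# A properly drawn random sample of 2,000 kept in the data center.
sample = random.sample(population, 2_000)

full_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)
# The sample mean tracks the full-file mean closely.
```

The standard error of a 2,000-case sample here is well under one unit, which is why, for many analytical purposes, the difference between storing the complete file and storing a sample is smaller than it first appears.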
