best, develop new educational programs, or measure precisely the effect of incentives on contractors, teachers, or students.

Evaluation was based on standardized tests administered to experimental and control groups at the beginning and end of the year. Payments to contractors were based partly on one of three different standardized tests, given to experimental groups only (75 percent of the payment), and partly on criterion-referenced tests, given five times a year to experimental groups only (25 percent of the payment).

Six private firms representing a range of educational approaches operated in 18 sites representing a range of educational situations. At each site there were 100 children in each of grades 1-3 and 7-9 in performance contracting classes, and another 100 children in each of those grades in traditional (control) classrooms.

The following is taken directly from the OEO summary and conclusions:

The results of the experiment clearly indicate that the firms operating under performance contracts did not perform significantly better than the more traditional school systems. Indeed, both control and experimental students did equally poorly in terms of achievement gains, and this result was remarkably consistent across sites and among children with different degrees of initial capability. On the basis of these findings it is clear that there is no evidence to support a massive move to utilize performance contracting for remedial education in the nation's schools. School districts should be skeptical of extravagant claims for the concept.

At the same time, the results should not be interpreted as a blanket finding that educational services and materials should not be purchased under performance-based contracts or that private firms cannot provide valuable educational services. Surely performance-based contracts are in some cases a better way to purchase some educational services than the methods currently being used. Surely private firms should continue to play an important role in developing and marketing new educational materials. The results simply say that an uncritical rush to embrace these concepts is unwarranted at this time.

Some of the benefits of this experiment will not be known for some time, and indeed cannot be precisely pinpointed. The experiment has provided or added to useful debates on the current use of standardized tests for measuring student performance, on means of introducing change into the educational system, and in general on the subject of accountability. It has raised the possibility that other performers besides schools may sometimes be appropriate providers of education. And, hopefully, it will lead to a heightened awareness of the importance of specifying educational goals and measuring progress toward those goals, a process that all too frequently has not been undertaken by school districts.

But surely the clearest conclusion drawn from the experiment is that we still have no solutions to the specific problem of teaching disadvantaged youngsters basic math and reading skills . . .

Q. No pussyfooting there. What do the critics say?

A. To date there have been a number of challenges, and doubtless more will come. The conclusions were questioned by a Knowledge Industry Report on February 15, 1972. The New York Times, in a March 21 editorial, spoke of the OEO's main conclusion as "an oddly quick and sweeping judgment after only one year's experimentation," adding that it "has the earmarks of a subjective, if not downright political judgment rather than a scientific assessment." Said the Times: "At so early a stage of the experiment, it would have been far more useful to weed out those contractors whose methods seemed either ineffective or suspect." The Times also noted that the Rand Corporation studies, commissioned by the Department of Health, Education, and Welfare, appear to be at variance with the OEO's pessimism.

One close student of performance contracting, James Mecklenburger,1 who has written a dozen articles on the subject and a book soon to be published by the National Society for the Study of Education, pointed out 13 errors, oversights, and inconsistencies in the initial OEO reports. Among them were the following:

1. In terms of its important accountability feature, the OEO experiment was a smashing success: Contractors who performed poorly were poorly paid.

2. The OEO was interested in cost-effectiveness: Could private contractors provide equivalent education at less cost? It was so interested, in fact, that it paid Charles Blaschke of Educational Turnkey Systems many thousands of dollars to investigate the question. Blaschke demonstrated that with even modest achievement scores, some OEO contractors' instructional programs were more cost-effective than conventional instruction.2

In a Phi Delta Kappa publication titled Performance Contracting: Who Profits Most?, Blaschke says one-third of the OEO contractors' programs cost less than the control programs in math and reading. Hence significant grade level gains were made in many of the 18 sites at less cost than through traditional means.

3. The OEO once promised to detail whether performance contracting had beneficial effects other than student achievement gains. Beyond suggesting that a useful debate was provoked (see above), the press release and preliminary report say nothing about these effects, but the Rand Corporation study has a great deal to say about them. For example: "There was no evidence of dehumanization [one of the charges leveled by teacher organizations]; there was some evidence of the reverse." (More on the Rand report later.) A reporter at the January 31, 1972, OEO press conference at which the conclusions were released asked whether any of the 18 districts had adopted any of the contractors' programs after the experiment was concluded. He was told that only one city (Grand Rapids) continued with a performance contract. He was not told that several cities (Blaschke says at least five) have used their own funds to incorporate, and sometimes expand, aspects of the contractors' programs, nor was he told that several cities asked the OEO to help them experiment for one more year (a request that was, of course, refused).

4. Although the OEO stressed the rigor of the experiment, the agency began in whirlwind fashion. Contracting companies were forced to create and staff programs, etc., during July and August 1970. Although the OEO's intent was to provide comparative test data on performance-contracted instruction versus conventional instruction, conventional instruction began in September, while contractors at many sites were unable to provide their best instruction until mid-fall or later.

Similarly, Battelle was expected to administer tests nationwide. But Battelle was hired in mid-August and had to create an 18-city testing program in two weeks' time. The Battelle reports admit variations in quality and reliability from city to city.

5. There were a variety of statistical problems. Battelle, as the "impartial outside evaluator," explicitly rejects the use of grade equivalent scores for evaluative purposes because they "possess psychometric distortions which might affect the results of statistical analyses." Many of the nation's testing experts concur. If one accepts this, the OEO's reported results (see Table I below) have questionable meaning. Despite Battelle, the OEO reported only grade equivalent scores to the press. Battelle also rejects comparison of pretest and posttest mean scores of experimental and control groups because they do "not provide a quantitative adjustment in mean post-test differences due to mean pre-test differences" (a sketch of such an adjustment appears at the end of this point). Nevertheless, the OEO used this method of reporting exclusively. Some idea of how biasing it may have been can be gained from the fact that "in 17 of the 18 sites of the experiment the average pretest level of the control group was significantly higher than that of the experimental group." As even the most unsophisticated teacher knows, the better the student, the more rapid his achievement is likely to be.

One of the most interesting analyses of the statistical failures of the OEO experiment was made by Gary Saretsky in the May 1972 issue of the Phi Delta Kappan. Saretsky points out that the "John Henry effect" was completely overlooked (as was the Hawthorne effect, for that matter). John Henry was a legendary railroad steel driver who swung his hammer in competition with a steam drill, which had been introduced experimentally to replace human steel drivers. (After he outperformed the steam drill, John Henry died from overexertion.) The John Henry effect occurs when a "control group" is placed in competition with an experimental group. There is no question about its presence at many OEO experimental sites. Saretsky quotes project directors and OEO personnel: "When you entered the control school you knew the race was on" and "The teachers were out to show that they could do a better job than those outsiders [performance contractors]."
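To make Battelle's objection concrete, here is a minimal sketch of the kind of covariance adjustment it implies; the notation and formula are ours, not Battelle's or the OEO's. An adjusted posttest mean corrects each group's average for where that group started:

$$
\bar{Y}_{g}^{\text{adj}} = \bar{Y}_{g}^{\text{post}} - b\left(\bar{X}_{g}^{\text{pre}} - \bar{X}^{\text{pre}}\right)
$$

Here $g$ indexes the experimental or control group, $\bar{X}^{\text{pre}}$ is the overall pretest mean, and $b$ is the pooled within-group regression slope of posttest scores on pretest scores. An unadjusted comparison of posttest means or of raw gains simply omits the correction term. If, as the preceding point argues, higher-scoring children also tend to gain faster, the unadjusted comparison favors whichever group started higher, and in 17 of the 18 sites that was the control group.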

Mecklenburger's final point is this:

In seeking a generalization about performance contracting, the OEO intentionally neglects whether any sites did very well or very poorly. In fact, some did each, as Battelle's report reveals. If OEO research had asked, "Among the 18 sites, was there any evidence of successful teaching which would reveal new knowledge about teaching 'underachieving students'?" the OEO might have found that some performance contracts revealed some success. Instead, the OEO swept both success and failure beneath a statistical rug.


Here is the statistical rug. It comes from the 30-page OEO press release mentioned above.


Table I
Mean Gains of Experimental and Control Students Across All Sites

[The table's figures are not reproduced here.]

NA: A readiness test, rather than an achievement test, was used as the first-grade pretest. There is no grade equivalent for the readiness test.


Q. Although OEO planners specifically renounced any intent to provide a "consumer's rating" of various contractors, is it possible from the data now reported to do so?

A. Not very reliably. In the first place, note that of the 31 technology firms responding to the OEO's request for proposals, only six were selected. The bases for selection were their corporate experience and interest in performance contracting, the types of achievement they thought they could guarantee, the qualifications of their staffs, and the variety they represented in terms of their instructional approaches (i.e., emphasis on hardware, incentives, or curricular software and teacher training methods). See the chart below, which compares particular aspects of the experimental program. Many of the firms used the same software; the Sullivan reading materials, for example.


Actually, location of the site may have been as influential as any other factor in determining outcomes. Charles Blaschke points out that "medium-sized Southern sites produced five significant successes for every one failure. These schools, administratively more flexible and less unionized than Northeastern and Western schools, provide a clue to the settings where performance contracting is most likely to succeed and where resistance occurred in the project." Dallas Assistant Superintendent for Research Donald Waldrip notes that "Dallas is the only school district in the nation without a single negative comparison in the OEO report. Quality Education Development held the Dallas contract." Waldrip pointed out that in six cases the performance of the experimental group in Dallas was "significantly better" than the control. In ten cases the evaluation "favored" the experimental group. And in two cases the evaluation showed the experimental group's progress was "no different" from the traditional group's.

1. In 1971-72, Jim Mecklenburger was a research assistant at Phi Delta Kappa, Bloomington, Ind. He is now director of research for the National School Boards Association. See his Performance Contracting in American Education, 1969-71. Chicago, Ill.: National Society for the Study of Education. (In press)

2. See Blaschke's "Performance Contracting Costs, Management Reform, and John Q. Citizen," Phi Delta Kappan, December 1971.

3. Charles Blaschke. Performance Contracting: Who Profits Most? Bloomington, Ind.: Phi Delta Kappa, 1972.
