
include one-half of the individual measures; that is, it is a value such that the number of deviations that exceed it (in either direction from M) is the same as the number of deviations that fall short of it." 1

Pearl, speaking of it in a different application, says:

"Suppose that we read that the mean length of the thorax of a thousand fiddler crabs is 30.14 ± .02 mm. Just what does this actually mean? Accepting the figures at their face value, or, put another way, assuming that the mathematical theory on which the probable error was calculated was the correct one, the figures mean something like this: If one were to take, quite at random, successive samples of 1000 each from the total population of fiddler crabs and determine the mean thoracic length from each sample, these means would all be different from each other by varying amounts. In other words, no single sample would give us the absolutely true value of the mean thoracic length of the fiddler crab population. The true value is in an absolute sense unknowable, because, for one reason, always we must come at the finding of it by way of random sampling, and sampling means variation. Now it is an observed fact of experience that the variations due to random sampling distribute themselves according to a definite law of mathematical probability. Knowing this law, it is clearly possible to state the mathematical probability for (or against) any particular deviation or variation occurring as the result of random sampling. Exactly this is what the probable error does. It says, in the particular case here considered, that it is an even chance, that a deviation or variation in the value of the mean as great or greater than .02 mm. above or below will occur as a result of random sampling. Or, put in another way, if we took successive samples of 1000 each from this crab population, it is an even bet that the value of the mean from any sample would fall between 30.14 + .02 = 30.16, and 30.14 − .02 = 30.12." 2

1 Whipple, Guy M., Manual of Mental and Physical Tests, Part 1, p. 23.

2 Pearl, Raymond, Modes of Research in Genetics, pp. 96-97.

The probable error, therefore, is a means of testing the reliability of samples provided that data approach the normal probability distribution. The probable error of a given deviation is then indicated by one half of the distance between the upper and the lower quartiles, i.e. the quartile measure of deviation furnishes a measure of the likelihood that a deviation will fall within one half of that distance above or below the median.1 Referring again to the distribution in Table M, Chapter VIII, the semi-quartile range was found to be $1.11, and the standard deviation $1.75.2 Applying the formula, P. E. = 0.6745 σ, in this case the P. E. should have been $1.18 rather than $1.11. The computed value, therefore, is 94.1 per cent of the theoretical probable error.
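The arithmetic of this comparison may be checked with a short sketch. The function and variable names are illustrative only and are not taken from the text; the theoretical value is rounded to the nearest cent before the ratio is taken, which appears to be how the 94.1 per cent figure was reached.

```python
# Ratio of the probable error to the standard deviation for a normal distribution.
PE_FACTOR = 0.6745


def theoretical_probable_error(std_dev: float) -> float:
    """Probable error implied by the normal law: P. E. = 0.6745 * sigma."""
    return PE_FACTOR * std_dev


# Figures quoted in the text for the distribution in Table M.
semi_quartile_range = 1.11  # dollars, computed from the quartiles
std_dev = 1.75              # dollars

# Round to cents, as the text does, before comparing.
theoretical_pe = round(theoretical_probable_error(std_dev), 2)
ratio_per_cent = round(100 * semi_quartile_range / theoretical_pe, 1)

print(f"theoretical P. E.: ${theoretical_pe:.2f}")
print(f"computed as a percentage of theoretical: {ratio_per_cent}")
```

A ratio close to 100 per cent suggests that the distribution lies near the normal form; a ratio far from it would make the probable error a doubtful guide.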

The probable error may likewise be computed for the arithmetic mean of a number of measurements, the means of which vary. Suppose it is desired to measure the length of time in which a certain manufacturing process is completed, or in which a given task is done, as a basis for task setting. If a large number of trials are made for homogeneous groups of operators and averages of the periods taken for each group, these will vary. The standard deviation of the averages and its probable error may be taken in the same way that they are computed for single variations. The formula for the probable error of the mean is

P. E. of the mean = 0.6745 σ / √n,

where σ is the standard deviation of the measurements and n is their number.

1 For a normal distribution arithmetic mean and median coincide.

2 Supra, p. 406.

3 See the interesting account of the results of a series of experiments involving the accuracy with which estimation is made by trained employees. Harris, J. Arthur: "Experimental Data on Errors of Judgment in the Estimation of the Number of Objects in Moderately Large Samples, with Special Reference to the Personal Equation." The Psychological Review,

The meaning of such a figure is indicated above in the quotation from Professor Pearl.
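Professor Pearl's "even bet" may be illustrated by a small sampling experiment. The sketch below assumes the conventional formula for the probable error of the mean, 0.6745 σ / √n, and draws repeated samples from a hypothetical normal population. The population standard deviation, sample size, and number of trials are assumptions chosen for illustration; none of them comes from Pearl's data.

```python
import math
import random

PE_FACTOR = 0.6745  # probable error as a fraction of sigma under the normal law


def probable_error_of_mean(std_dev: float, n: int) -> float:
    """Conventional probable error of an arithmetic mean of n measurements."""
    return PE_FACTOR * std_dev / math.sqrt(n)


def share_of_means_within_pe(true_mean, std_dev, n, trials, seed=0):
    """Fraction of sample means lying within one probable error of the true mean."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    pe = probable_error_of_mean(std_dev, n)
    hits = 0
    for _ in range(trials):
        sample_mean = sum(rng.gauss(true_mean, std_dev) for _ in range(n)) / n
        if abs(sample_mean - true_mean) <= pe:
            hits += 1
    return hits / trials


# Hypothetical population: mean 30.14 mm., standard deviation 0.94 mm.
share = share_of_means_within_pe(true_mean=30.14, std_dev=0.94, n=100, trials=2000)
print(f"share of sample means within one P. E.: {share:.3f}")
```

If the normal law holds, the share printed should lie near one half, which is what an even chance amounts to.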

A few instances where the probable error may be applied in economic studies may be cited. Breeders of animals and plants find constant need of using it in studies of variation from type and in correlation. Moreover, in the selection of men according to psychological and other tests,2 in the grading of cotton and grains, in the setting of tasks, and the establishment of piece-rates of compensation on the basis of the "average" operator's performance, some measure of the reliability of the samples must be employed. Again, according to some the only scientific method of establishing the pure premium for industrial accident insurance is to compare homogeneous conditions of risk exposure and to test the homogeneity by measures of dispersion. Conformity to the normal law is proof that conditions are homogeneous. Most comparisons, it is held, involve non-homogeneous conditions. The proper unit is not the "establishment," but similar risk conditions in many establishments or industries.

In studies of correlation the probable error always accompanies the coefficient as a test of reliability. This phase of the problem is discussed later.4

It must be remembered that the probable error is to be used only when distributions approach the normal probability form and where samples are relatively numerous.

Vol. XXII, No. 6, November, 1915, pp. 490-511. In this series of experiments there is a clear tendency for the estimates to be too high.

1 Davenport, Eugene, The Principles of Breeding, passim, New York, 1907.

2 Whipple, Guy M., Manual of Mental and Physical Tests, Baltimore, 1914.

3 Cf. Fisher, Arne, Proceedings of the Casualty, Actuarial, and Statistical Society of America, Vol. II, Part III, No. 6, May, 1916.

4 See Chapter XII, infra.

The standard deviation, however, as a measure of divergence from the norm is of general application. As Yule says, "In the case of small samples, the use of the probable error is consequently of doubtful value while the standard error (deviation) retains its significance as a measure of dispersion."1 However, "On the whole, the use of the 'probable error' is of little advantage compared with the standard...."2

III. SKEWNESS


1. Meaning of Skewness

Measures and coefficients of dispersion, both in historical and frequency series, indicate absolutely or relatively the differences of the separate measures from a single one taken as a standard. They represent deviations from type, varying emphasis being given to the differences depending upon the particular measure used. The average deviation gives all differences their normal weight; the standard deviation accentuates those far removed from type, but still averages them. The quartile measure includes only those lying within the boundaries of the first and third quartile. As such, none of them reveal the distributions of the deviations. Differences from the type are not localized. The degree to which they cluster above or below the type is not shown. What measures of skewness do is to localize the degree to which distributions are pulled, distorted, or skewed from normality, i.e. from the symmetrical form which they take when mode, median, and arithmetic mean coincide. The differences between these in themselves indicate asymmetry, that is, a piling up or scattering of frequencies on one or the other side of the type. These may be expressed relatively, so as to admit of comparison, by being reduced to coefficients. Measures of dispersion which characterize the distribution on both sides of the type must be used as divisors, since what is desired is a relative expression of the localization of asymmetry. To divide by the units in which the measures are expressed would be simply to reduce the deviations to a relative basis.

1 Yule, G. U., Introduction to the Theory of Statistics, p. 307.

2 Ibid.

Distributions generally are skewed to some degree. Rarely if ever, even among natural phenomena, is complete symmetry found. This may be due to the unrepresentativeness of the samples, to imperfect measurements, or to other causes. Distributions may be scattered widely or closely grouped, but rarely are they uniformly grouped or distributed about a norm. Measures and coefficients of skewness localize deviations from symmetry; measures and coefficients of dispersion only reveal the amount of scatteration or cluster.

2. Measures and Coefficients of Skewness

The chief and currently used measure of skewness is the difference between the arithmetic mean and the mode. If the mean exceeds the mode, that is, is drawn away from the typical instance by the presence of extreme items, skewness is said to be positive. If it is less than the mode, that is, is drawn away from the typical instance because of extreme items, skewness is said to be negative. The mode is unaffected by extremes, either small or large, except at or near the center of a distribution; while the arithmetic mean is not only affected by the size of the items but also by the distance away from the center of gravity. The dif-

1 Cf. Tolley, Howard R., "Frequency Curves of Climatic Phenomena," in Monthly Weather Review, United States Department of Agriculture, Vol. 44, November, 1916, pp. 634-642, 636.
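The measure of skewness described above, the arithmetic mean less the mode, may be sketched as follows. The coefficient shown divides that difference by the standard deviation; the choice of the standard deviation as divisor is an assumption here, made because it is a measure of dispersion covering both sides of the type. The data are hypothetical and the function names are illustrative only.

```python
import statistics


def skewness_measure(values):
    """Absolute measure of skewness: arithmetic mean minus mode."""
    return statistics.mean(values) - statistics.mode(values)


def skewness_coefficient(values):
    """Relative measure: (mean - mode) divided by the standard deviation.

    The divisor is an assumed choice of dispersion measure.
    """
    return skewness_measure(values) / statistics.pstdev(values)


# Hypothetical wage data with a few large extreme items.
wages = [2, 2, 2, 3, 3, 4, 5, 9]
# The same data reflected, so that the extremes lie below the mode.
mirrored = [-w for w in wages]

print(skewness_measure(wages))        # positive: mean drawn above the mode
print(skewness_measure(mirrored))     # negative: mean drawn below the mode
print(skewness_coefficient(wages))    # relative form, free of the unit of measure
```

Since the coefficient is a pure number, it admits of comparison between distributions expressed in different units.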
