annual salary (race and sex, years of education,

The hyphenated regional and organizational variables in the equation show how the regions and A&O's were grouped in the collapsing process. As discussed in the "variables" section above, the educational dummies used were degree, advanced degree, two-year certificates, and no high school diploma; the reference group was employees with a high school diploma.

This basic model was applied in individual computer runs on the different racial-sexual groups. The only variants were the choices of race and sex dummies. Using the basic model, then, salary determination models were developed for various groupings as follows, and with the form of race-sex dummies as indicated.

The correlation was performed on the National Institutes of Health Computer System using a regression program made available by the Bureau of Labor Statistics. The program, called Step Supra, provides for stepwise multiple linear regression analysis. Program options were selected to yield the following output: a listing of the input data; a table of means, standard deviations, and sums of squares; a correlation matrix; a table of beta coefficients and elasticities at the means; and a plot of the standardized residuals by observation. In addition, the program automatically prints the following:

For each step in the regression:
(1)
(2) for each variable included in the regression
(a) regression coefficient
(3) for each variable not included in the regression

After the stepwise procedure is completed:
(1) a summary table containing, for each variable included in the regression equation, its

e. Limitations. When interpreting the results of an analytical study, the limitations of the study must be borne carefully in mind. In this correlation analysis, the first and most obvious limitation is the relatively small number of observations (817) compared with the number of DOL employees (10,700).
The number of observations, as mentioned earlier, was determined by the size of the education sample. While a larger number of employees might have given better regression results, it should be noted that the sample was carefully chosen to be statistically valid with respect to all measurable aspects (e.g., salary, age, and experience by race and sex). Still, the large number of variables used in the regression could result in cells with too few observations to allow any valid conclusions regarding those cells.

A second limitation derives from other factors that influence salary but were not included in this analysis. These excluded factors are: (1) ability measures, (2) quality measures for education and experience, (3) non-government experience, (4) job classification series, (5) personal mannerisms (e.g., appearance and personality), and (6) number and quality of personal contacts. Inclusion of some of these would have improved the regression and reduced some of the unexplained variation in salary. Also, the addition of time series data might have been significant.

The last set of limitations represents the nemesis of all regression and correlation analyses: the many ways in which the practical problem fails to satisfy all of the theoretical assumptions underlying correlation. First, the relationship between many of the independent variables and salary is not really linear. The age and experience factors have already been discussed in this light. Second, there was relatively high correlation among some of the independent variables, again particularly among age, age-squared, and years of service. And third, quite a large number of dummy variables had to be used relative to the number of continuous variables.

f. The results. As listed in the table in Section C above, salary models were developed for twelve different employee groupings.
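As a rough illustration of what one such model involves, the sketch below fits a least-squares salary equation with one continuous variable (years of service) and one educational dummy (degree, with high school diploma as the reference group), and then computes the overall F-statistic for the equation. It is not the Step Supra program and performs no stepwise selection. The function names (solve, fit_ols, overall_f) are hypothetical, and the eight observations are invented for the example; they are not data from the study.

```python
# Illustrative sketch only: invented data, hypothetical function names.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so each row operation is applied to both sides.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients via the normal equations; an intercept is prepended."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def overall_f(rows, y, beta):
    """Overall F-statistic: (SSR / k) / (SSE / (n - k - 1)), k = number of regressors."""
    n, k = len(y), len(beta) - 1
    fitted = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return ((sst - sse) / k) / (sse / (n - k - 1))


# Each row: (years of service, degree dummy). Dummy = 0 marks the reference
# group (high school diploma). Salaries are in thousands of dollars. All invented.
rows = [(2, 0), (5, 0), (9, 0), (14, 0), (3, 1), (6, 1), (10, 1), (15, 1)]
salary = [8.1, 9.0, 10.4, 11.6, 10.2, 11.5, 12.3, 14.1]

beta = fit_ols(rows, salary)
print("intercept, service, degree:", [round(b, 3) for b in beta])
print("overall F:", round(overall_f(rows, salary, beta), 2))
```

The coefficient on the degree dummy is read as the estimated salary difference relative to the reference group, holding years of service constant. The computed F would be compared with the tabulated critical value for k and n - k - 1 degrees of freedom at the chosen alpha-level.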
The regression equation was statistically significant at a one percent alpha-level in each of the twelve models. In fact, the F-statistics were |