factors was analyzed individually to determine its effect upon salary within the various racial-sexual groups and to identify differences in that effect from one group to another. Each factor was examined to determine the extent to which differences in its values across racial-sexual groups could be said to explain the salary differences across groups. That is, did minorities and women have, on average, lower levels of education, age, or experience than all employees, and, if so, did such differences account for their lower salaries? The analysis of the earlier sections demonstrates conclusively that age, education, and length of service, when considered individually, do not account for the major portion of the salary differences existing in the Department. The question remains, however, whether these factors, considered simultaneously, might explain considerably more of the differences in salaries. Clearly, this is an important issue.

a. Use of Correlation Analysis

To approach this question, a correlation analysis was used. Correlation analysis is a powerful research tool for investigating the relationship between a given dependent variable and certain independent variables believed to be significant in explaining variation in the dependent variable. In this analysis, actual salary is the dependent variable, and the independent variables thought to explain salary variations are race, sex, age, education, years of government service, geographic location, and organizational unit. These are postulated as determinants of salary in the Labor Department. The correlation analysis yields an equation for salary as a function of the independent variables identified above. This equation can be used as a predictor model for salaries in the Department: specific values for race, sex, age, education, and the other independent variables are plugged in, and the corresponding salary is calculated.

b. The Dependent and Independent Variables

The dependent variable in all of the models developed was actual salary as extracted from the current file of the Department's computerized personnel records. The independent variables varied with different models, and, except for the education data, they were also derived from the current computer file. Because the Department's computerized records do not presently contain data on educational attainment, such data must be obtained from official personnel records. The enormity of that task dictated that a sample be chosen for which the data could be obtained. The specifics of the sampling process are described in Appendix A-2. The number of employees for whom education data were obtained determined the number of observations available for the correlation analysis. Thus, the correlation was performed using data representing 817 of the Department's 10,700 employees, a 7.6 percent sample.27/ The breakdown of the data by race, sex, and professional/nonprofessional status is shown in the table on the following page.

27/ See Appendix A-2 for a discussion of the validity of the sample (that is, the degree to which it is representative of DOL employees).

28/ Dummy variables are often used to study the effects of variables that do not vary continuously over a range of values. For example, such variables can represent yes/no membership in classification groupings such as white, black, or nonblack minority; male or female; and so on. Their values are always measured relative to a specified base group. In this case, employees with a high school diploma comprise the base group. The various dummy variables show the average increase or decrease in annual salary due to other levels of educational degree attainment, as compared with those holding high school diplomas. A value of $2,000 (Continued on page 77.)
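What the report calls correlation analysis is, in modern terms, a multiple linear regression (the text itself speaks of one "regression fit" being better than another). The following is a minimal sketch of that idea under stated assumptions, not the Department's actual model. Salary is regressed on years of education plus 0/1 dummy variables for degree, with the high school diploma as the base group as in footnote 28. The employees, the $4,000 intercept, and the $3,000 master's increment are hypothetical; the $500-per-year and $2,000 bachelor's figures only echo the illustration the report uses in its education discussion, not its findings. The helper names (`design_row`, `fit`, `predict`, `r_squared`) are illustrative. The least-squares fit is solved through the normal equations in plain Python, and R² measures the share of salary variation explained, which is the sense in which one fit is "better" than another.

```python
"""Minimal sketch of a salary regression with education dummy variables.

Every number below is hypothetical and was chosen only to illustrate the model
form; none of it is Department of Labor data.
"""

from typing import List, Sequence, Tuple

# Hypothetical degree categories. "high_school" is the base group, so it has
# no dummy column; each dummy coefficient is the salary difference from it.
DEGREES = ("high_school", "bachelors", "masters")
BASE = "high_school"


def design_row(years: float, degree: str) -> List[float]:
    """Intercept, years of education, then a 0/1 dummy per non-base degree."""
    dummies = [1.0 if degree == d else 0.0 for d in DEGREES if d != BASE]
    return [1.0, float(years)] + dummies


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the square system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        p = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[p] = m[p], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows: Sequence[List[float]], y: Sequence[float]) -> List[float]:
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(coef: Sequence[float], years: float, degree: str) -> float:
    """Plug specific values into the fitted equation to estimate a salary."""
    return sum(c * v for c, v in zip(coef, design_row(years, degree)))


def r_squared(coef: Sequence[float], rows: Sequence[List[float]],
              y: Sequence[float]) -> float:
    """Fraction of the variation in y explained by the fitted equation."""
    mean = sum(y) / len(y)
    ss_res = sum((v - sum(c * x for c, x in zip(coef, r))) ** 2
                 for r, v in zip(rows, y))
    ss_tot = sum((v - mean) ** 2 for v in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical employees: (years of education, highest degree).
SAMPLE: List[Tuple[int, str]] = [
    (12, "high_school"), (13, "high_school"), (14, "high_school"),
    (15, "bachelors"), (16, "bachelors"), (17, "bachelors"),
    (17, "masters"), (18, "masters"), (19, "masters"),
]

# Hypothetical salary rule: $4,000 base, $500 per year of education, plus
# $2,000 for a bachelor's or $3,000 for a master's (no random noise).
BONUS = {"high_school": 0, "bachelors": 2000, "masters": 3000}
SALARIES = [4000 + 500 * yrs + BONUS[deg] for yrs, deg in SAMPLE]

ROWS = [design_row(yrs, deg) for yrs, deg in SAMPLE]
COEF = fit(ROWS, SALARIES)

if __name__ == "__main__":
    print("intercept, per-year, bachelors, masters:",
          [round(c, 2) for c in COEF])
    print("R^2:", round(r_squared(COEF, ROWS, SALARIES), 4))
    print("predicted salary, 16 years + bachelor's:",
          round(predict(COEF, 16, "bachelors"), 2))
```

Because the hypothetical salaries follow the rule exactly, the fit recovers the per-year and degree coefficients and R² is essentially 1; real personnel data would leave unexplained variation. The report's full model also includes race, sex, age, years of government service, geographic location, and organizational unit, each of which could enter `design_row` as a further continuous column or a further set of dummies measured against its own base group.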
Distribution of the Input Data for the Correlation Analysis by Race, Sex, and Professional/Nonprofessional Status

Various forms of the independent variables were used in order to find the forms giving the best fit. One regression fit is better than another if the choice and form of the independent variables result in more of the variation of the dependent variable being explained. Of course, the criterion of reasonableness must always be applied: variables cannot be manipulated just to obtain the best fit. They must be chosen because they correspond to a particular theory or model that has been hypothesized to describe reality. The correlation is, then, a test of the various models. A brief discussion of the various forms of the independent variables used in this analysis follows:

Education. It was hypothesized that salary increased linearly with the number of years of completed education. It was also believed that extra salary benefits might result from the attainment of various degrees. In other words, if each additional year of education were found to be worth, on average, another $500 annually in salary, it might also be true that, for example, the salary gain from 15 to 16 years of education, when a bachelor's degree is normally earned, is really $500 plus an additional $2,000 for the