USING POLYTOMOUS PREDICTORS IN REGRESSION

David P. Nichols

From SPSS Keywords, Number 57, 1995

Here we extend our discussion begun last issue on categorical predictors in linear regression to the case of a three level predictor variable. One way to do this in our example is to distinguish between states with a death penalty statute but no executions in a relevant time period and those where one or more executions took place during that time. Here we will use a new variable called STATUS89, which takes on a value of 0 for those states with no death penalty, 1 for the 28 states with death penalty statutes in force but no executions taking place in 1989, and 2 for the eight states where executions occurred during 1989. The no death penalty states are the same 14 states from last time, with a mean 1990 murder rate of about 4.97. The 28 states with statutes but no 1989 executions had a rate of about 6.98, and the eight executing states had a rate of about 10.96.

How should we represent our new three level predictor? The dummy variable approach is again probably the most popular one among applied researchers. Since there are three groups, we have two degrees of freedom for making comparisons. Thus we need two dummy variables to represent STATUS89. If we make STATUS1 a dummy variable with a value of 1 for the states with a value of 1 for STATUS89 and 0 otherwise, and make STATUS2 a dummy variable with a value of 1 for the states with a value of 2 for STATUS89 and 0 otherwise, then the STATUS1 coefficient will compare states with a value of 1 on STATUS89 with those having a value of 0, and the STATUS2 coefficient will compare states with a value of 2 on STATUS89 with those having a 0 value. One useful way to look at what we are doing is to lay out the design matrix we are using at the cell level; that is, to list out the values of the new predictor variables once for each different level of the original predictor. This listing is given in Figure 1:

Figure 1
---------------------------------------------------------------------------
STATUS89   CONSTANT   STATUS1   STATUS2
    0          1         0         0
    1          1         1         0
    2          1         0         1
---------------------------------------------------------------------------
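The dummy coding laid out in Figure 1 can be sketched in a few lines of Python. This is only an illustration of the coding rule, not SPSS syntax; the function name dummy_code is hypothetical and does not appear in the original analysis.

```python
def dummy_code(status89):
    """Return (CONSTANT, STATUS1, STATUS2) for one STATUS89 value.

    Hypothetical helper: level 0 (no death penalty) is the reference
    group, so it receives 0 on both dummy variables.
    """
    if status89 not in (0, 1, 2):
        raise ValueError("STATUS89 must be 0, 1, or 2")
    status1 = int(status89 == 1)  # statute in force, no 1989 executions
    status2 = int(status89 == 2)  # one or more executions in 1989
    return (1, status1, status2)


# Cell-level design matrix: one row per level of the original predictor.
design = {level: dummy_code(level) for level in (0, 1, 2)}

print("STATUS89 CONSTANT STATUS1 STATUS2")
for level, (constant, status1, status2) in design.items():
    print(f"{level:8d} {constant:8d} {status1:7d} {status2:7d}")
```

Each row differs from the level 0 row on exactly one dummy variable, which is why each coefficient turns out to be a comparison with the reference group.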

One question that may come to mind here is how STATUS1 compares level 1 of STATUS89 to level 0 and not to both levels 0 and 2. Similarly, how does STATUS2 represent a comparison between level 2 of STATUS89 and level 0 and not between level 2 and the other two levels? To see why this is the case, note that cases at level 1 of STATUS89 differ from those at level 0 only on STATUS1, while cases at level 2 differ from those at level 0 only on STATUS2. From this coding layout, we can see that the states at level 0 of STATUS89 are predicted by the CONSTANT alone, while states at level 1 are predicted by the sum of the CONSTANT and STATUS1 coefficients, and the states at level 2 of STATUS89 are predicted by the sum of the CONSTANT and STATUS2 coefficients. Thus our regression coefficients should yield the mean for the no death penalty states as the CONSTANT, the difference in means between the non-executing death penalty states and the no death penalty states as the STATUS1 coefficient, and the difference in means between the executing states and the no death penalty states as the STATUS2 coefficient. As the output from REGRESSION in Figure 2 shows, this is indeed the case:

Figure 2
---------------------------------------------------------------------------
Multiple R            .49443
R Square              .24446
Adjusted R Square     .21231
Standard Error       3.46979

Analysis of Variance
                 DF    Sum of Squares    Mean Square
Regression        2         183.08974       91.54487
Residual         47         565.85446       12.03946

F =    7.60374        Signif F =  .0014

------------------ Variables in the Equation ------------------
Variable              B        SE B        Beta        T    Sig T
STATUS1        2.007143    1.135756     .257430    1.767    .0837
STATUS2        5.991071    1.537821     .567498    3.896    .0003
(Constant)     4.971429     .927341                5.361    .0000
---------------------------------------------------------------------------
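As a check on the reading of Figure 2, the short Python sketch below combines the printed coefficients with the cell-level coding from Figure 1 to recover the three group means, and divides each coefficient by its standard error to recover the T column. The numbers are copied from the output; the variable names are illustrative only.

```python
# (B, SE B) pairs copied from the REGRESSION output in Figure 2.
coefficients = {
    "CONSTANT": (4.971429, 0.927341),
    "STATUS1": (2.007143, 1.135756),
    "STATUS2": (5.991071, 1.537821),
}

# Cell-level design matrix from Figure 1:
# STATUS89 level -> (CONSTANT, STATUS1, STATUS2) codes.
design = {0: (1, 0, 0), 1: (1, 1, 0), 2: (1, 0, 1)}
names = ("CONSTANT", "STATUS1", "STATUS2")

# Predicted value for a group = sum of code * coefficient across columns.
predicted = {
    level: sum(code * coefficients[name][0] for name, code in zip(names, row))
    for level, row in design.items()
}

# Each t statistic is the coefficient divided by its standard error.
t_ratios = {name: b / se for name, (b, se) in coefficients.items()}

for level, value in predicted.items():
    print(f"STATUS89 = {level}: predicted MURDER90 = {value:.2f}")
for name, t in t_ratios.items():
    print(f"{name}: t = {t:.3f}")
```

The predicted values agree, to two decimal places, with the group rates of about 4.97, 6.98, and 10.96 quoted at the start of the article.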

As noted earlier, most of the identities between various numbers on the output disappear when multiple predictors are involved. The t-tests for the individual STATUS coefficients indicate that the hypothetical population means for the no death penalty group and the death penalty but no executions group may not be different, while the hypothetical population mean for the executing states would seem to be higher than that for the no death penalty states. The F-test in the analysis of variance table tests the null hypothesis that both STATUS coefficients are 0 in the population, which means logically that all three population means are equal. Note that despite the logical relationships between the hypotheses tested by the two types of tests (one or more nonzero coefficients implies rejection of the omnibus null hypothesis, and vice versa), when making inferences about population values based on sample data, it is possible to obtain logical contradictions. That is, one or more individual t-tests may be "significant" when the overall F-test is not, or the overall F-test may be "significant" when no individual t-tests are. If the overall F-test is "significant" then a parameterization can be found that will produce at least one "significant" t-test, but in some cases this parameterization may not provide any useful interpretation in terms of differences among group means.
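The omnibus F statistic and the summary lines at the top of Figure 2 follow from the sums of squares and degrees of freedom in the analysis of variance table. The Python sketch below redoes that arithmetic using only the printed values; it does not compute the significance level, which would require an F distribution routine.

```python
import math

# Sums of squares and degrees of freedom copied from Figure 2.
ss_regression, df_regression = 183.08974, 2
ss_residual, df_residual = 565.85446, 47

# Mean squares and the omnibus F ratio.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_ratio = ms_regression / ms_residual

# Summary statistics derived from the same quantities.
ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)
adjusted_r_square = 1 - (1 - r_square) * df_total / df_residual
standard_error = math.sqrt(ms_residual)

print(f"F = {f_ratio:.5f}")
print(f"R Square = {r_square:.5f}, Multiple R = {multiple_r:.5f}")
print(f"Adjusted R Square = {adjusted_r_square:.5f}")
print(f"Standard Error = {standard_error:.5f}")
```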

Figure 3 shows what the default MANOVA parameterization produces for this analysis. As can be seen in the ANOVA table, the overall F-value for STATUS89 is identical to that given for the overall regression in REGRESSION. This is true because even though the different parameterizations produce different coefficients, we are still using two predictors to differentiate among three groups. Any two nonredundant predictors accurately representing the differences among our three groups would produce the same results. This is formally stated by saying that the overall test is invariant under different parameterizations. That is, we are testing the same omnibus null hypothesis of equality among the three population means, regardless of the specific contrast codings we use to compare groups. As with the earlier dichotomous predictor, the CONSTANT coefficient in this analysis represents the simple unweighted mean of the sample group means.

Figure 3
---------------------------------------------------------------------------
Tests of Significance for MURDER90 using UNIQUE sums of squares
Source of Variation        SS      DF        MS         F   Sig of F
WITHIN+RESIDUAL        565.85      47     12.04
STATUS89               183.09       2     91.54      7.60       .001
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Estimates for MURDER90
--- Individual univariate .9500 confidence intervals

CONSTANT
Parameter       Coeff.   Std. Err.    t-Value    Sig. t   Lower -95%   CL- Upper
    1       7.63750000      .55726   13.70539    .00000      6.51643     8.75857

STATUS89
Parameter       Coeff.   Std. Err.    t-Value    Sig. t   Lower -95%   CL- Upper
    2       -2.6660714      .77278   -3.44996    .00119     -4.22071    -1.11143
    3       -.65892857      .67370    -.97808    .33304     -2.01423      .69638
---------------------------------------------------------------------------

The two deviation coefficients for STATUS89 give the means of each of the first two levels minus the unweighted grand mean of all three levels. This type of parameterization is often referred to as "effect" coding, because the parameter for a level of a factor variable is interpreted as the effect of being at that level of the predictor as opposed to being at the overall average. This terminology is probably more familiar to users whose linear models work has been handled in an analysis of variance framework, as opposed to a multiple regression approach. What is important to note here is that notwithstanding certain terminological preferences, both approaches are doing exactly the same thing.
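The deviation ("effect") coefficients can be reproduced from the three group means alone. The Python sketch below takes the group means implied by the REGRESSION coefficients in Figure 2 and the group sizes given at the start of the article (14, 28, and 8 states), and computes the unweighted grand mean and the deviations from it. The weighted mean is included only for contrast; it is not part of the MANOVA output.

```python
# Group means reconstructed from the Figure 2 coefficients:
# CONSTANT, CONSTANT + STATUS1, CONSTANT + STATUS2.
group_means = {
    0: 4.971429,
    1: 4.971429 + 2.007143,
    2: 4.971429 + 5.991071,
}
group_sizes = {0: 14, 1: 28, 2: 8}  # number of states at each level

# The MANOVA CONSTANT is the simple average of the three group means,
# ignoring how many states fall in each group.
unweighted_grand_mean = sum(group_means.values()) / len(group_means)

# For contrast only: the mean weighted by group size (the overall sample mean).
weighted_mean = (
    sum(group_means[g] * group_sizes[g] for g in group_means)
    / sum(group_sizes.values())
)

# Deviation coefficients: each group mean minus the unweighted grand mean.
deviations = {g: m - unweighted_grand_mean for g, m in group_means.items()}

print(f"Unweighted grand mean: {unweighted_grand_mean:.4f}")
print(f"Weighted mean:         {weighted_mean:.4f}")
for g, d in deviations.items():
    print(f"Level {g} deviation: {d:.4f}")
```

The deviations for levels 0 and 1 match parameters 2 and 3 in Figure 3. The deviation for level 2 is not printed by MANOVA because the three deviations must sum to zero, so it is the negative of the sum of the other two.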

Since we're fitting the same overall model in MANOVA that we did in REGRESSION, we must be able to predict the same values for each state that were predicted earlier. In order to see exactly how this is done, we need to look at the design or basis matrix that MANOVA used. As you can see by looking at the coefficients in the matrix in Figure 4, the predicted value for the no death penalty group is derived by summing the first and second parameter estimates, the value for the death penalty but no executions group is obtained by summing the first and third estimates, and the value for the executing group is obtained by subtracting the second and third estimates from the first. By doing the arithmetic here yourself, you can verify the identical predictions resulting from the two different parameterizations.

Figure 4
---------------------------------------------------------------------------
STATUS89   CONSTANT   STATUS89   STATUS90
              (1)        (2)        (3)
    0          1          1          0
    1          1          0          1
    2          1         -1         -1
---------------------------------------------------------------------------
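The arithmetic suggested above can be sketched in Python: apply the Figure 4 basis matrix to the MANOVA estimates from Figure 3, apply the Figure 1 dummy coding to the REGRESSION estimates from Figure 2, and compare the predicted group means. All numbers are copied from the figures; the helper name predict is hypothetical.

```python
def predict(basis, estimates):
    """Multiply each cell-level basis row by the parameter estimates."""
    return {
        level: sum(code * est for code, est in zip(row, estimates))
        for level, row in basis.items()
    }


# MANOVA deviation parameterization (Figures 3 and 4): parameters 1, 2, 3.
manova_basis = {0: (1, 1, 0), 1: (1, 0, 1), 2: (1, -1, -1)}
manova_estimates = (7.63750000, -2.6660714, -0.65892857)

# REGRESSION dummy parameterization (Figures 1 and 2):
# CONSTANT, STATUS1, STATUS2.
dummy_basis = {0: (1, 0, 0), 1: (1, 1, 0), 2: (1, 0, 1)}
dummy_estimates = (4.971429, 2.007143, 5.991071)

manova_predictions = predict(manova_basis, manova_estimates)
dummy_predictions = predict(dummy_basis, dummy_estimates)

for level in (0, 1, 2):
    print(
        f"STATUS89 = {level}: "
        f"MANOVA {manova_predictions[level]:.4f}, "
        f"REGRESSION {dummy_predictions[level]:.4f}"
    )
```

Up to rounding in the printed estimates, both parameterizations return the same three group means.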