FURTHER INTERACTIONS WITH CATEGORICAL VARIABLES IN REGRESSION

David P. Nichols

From SPSS Keywords, Number 62, 1996

In the last issue, we talked about interpretation of the parameters of a regression model that included one dichotomous and one continuous predictor, plus their interaction. As has been the case throughout this series, the point was to illustrate the dependence of parameter interpretations on the way the predictor variables were coded; that is, on how the model was parameterized. We saw how the interpretation of the "main effect" of the continuous income variable was conditional upon the way the dichotomous culture variable was coded. We also saw that in the two parameterizations we used (reversing the 0-1 dummy coding for culture), the culture "main effect" parameter had the same absolute value, producing the same t-statistic and significance level. At the end of that article, we alluded to the fact that this would not necessarily be the case under alternative parameterizations, and promised to show why.

Figure 1 gives the same numbers as Figure 1 from the previous issue. Recall that the culture dummy variable was coded 0 for no and 1 for yes, and that the income variable was expressed as a percentage deviation from the national mean. Though it was not specified at the time, the national mean was an unweighted mean of individuals (equivalently, a population-weighted mean of states), so that states with higher populations contributed more heavily. When we use such a variable in an analysis with each state treated as a single unit, the mean of the income variable across states is not 0; in this case, it's -5.08 (states with larger populations tend to have higher relative incomes).

Figure 1: Original REGRESSION results (from previous issue)

-------------------------------------------------------------------------------
Variable            B        SE B       Beta        T    Sig T
CONSTANT     4.530989     .620501               7.302    .0000
CULTURE      4.377789     .909829    .563756    4.812    .0000
INCOME        .082682     .040147    .318512    2.059    .0451
CUL_INC      -.130178     .057808   -.366026   -2.252    .0291
-------------------------------------------------------------------------------

Suppose we now recompute the income variable by centering it; that is, we make it so that it has a mean of 0 in our sample (we do this by adding 5.08 to the earlier income variable). What happens now when we recompute the interaction product variable for culture and income and run a regression using the same dummy coding for culture? As you can see from Figure 2, we now get different results for the constant term and for the culture "main effect." The constant is of course the predicted value when all predictors are set to 0. In both cases, this means culture=0. However, in Figure 1, it means that income is set to 100% of the national mean, while in Figure 2, income is set to 94.92% of the national mean. Thus the constant for Figure 2 is equal to the original constant minus 5.08 times the income coefficient:

4.110964 = 4.530989 - 5.08 * .082682.

Figure 2: Centered REGRESSION results

-------------------------------------------------------------------------------
Variable            B        SE B       Beta        T    Sig T
CONSTANT     4.110964     .635702               6.467    .0000
CULTURE      5.039091     .864147    .648916    5.831    .0000
INCOME        .082682     .040147    .318512    2.059    .0451
CUL_INC      -.130178     .057808   -.343087   -2.252    .0291
-------------------------------------------------------------------------------

Notice also that the culture coefficient has changed, from 4.377789 to 5.039091, as have its t-value and significance. This is because we are now estimating something different: the difference in predicted value for a state with culture=1 compared with a state with culture=0, but this time at income equal to -5.08 on the old scale (which is 0 on the new scale). The culture coefficient in Figure 2 is (to within rounding error) the original culture coefficient minus 5.08 times the interaction coefficient:

5.039091 = 4.377789 - 5.08 * (-.130178).

The primary implication of course is that the interpretation of the "main effect" of the culture variable, like that of the income variable, depends on how the model has been parameterized. There is no single interpretation of this effect available. Another implication is that we can make this coefficient estimate the predicted difference between groups at any fixed value of the income variable we choose, simply by subtracting that value from the original variable. A common usage of this property is the centering of variables so that comparisons are made at the mean of a variable rather than at the 0 point of the original continuous predictor, which often isn't of interest.
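The point that the culture coefficient is a predicted group difference at one chosen income value can be checked directly from the Figure 1 equation. The following is a minimal sketch in Python; the function names are illustrative only, the coefficients are those printed in Figure 1, and the income value of 10 is an arbitrary choice added for comparison.

```python
# Coefficients as printed in Figure 1 (culture coded 0/1, income uncentered).
B_CONST, B_CULTURE, B_INCOME, B_CUL_INC = 4.530989, 4.377789, 0.082682, -0.130178


def predict(culture, income):
    """Predicted value from the Figure 1 equation (illustrative helper)."""
    return (B_CONST + B_CULTURE * culture + B_INCOME * income
            + B_CUL_INC * culture * income)


def culture_gap(income):
    """Predicted difference, culture=1 minus culture=0, at a given income."""
    return predict(1, income) - predict(0, income)


# Income is on the original scale (percentage deviation from the national mean).
# 0 and -5.08 are the two reference points discussed in the text; 10 is arbitrary.
for income in (0.0, -5.08, 10.0):
    print(f"income={income:7.2f}  constant={predict(0, income):.6f}  "
          f"culture gap={culture_gap(income):.6f}")
```

At income 0 the gap is the Figure 1 culture coefficient, and at income -5.08 the constant and the gap agree with Figure 2 to within rounding. Any other reference value works the same way, which is all that subtracting a constant from the income variable accomplishes.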

Finally, you may have noticed that the interaction coefficient remained the same in absolute value throughout our variable transformations. It is possible to change this value by rescaling the predictors. If we change the distance between the two groups on the dichotomous culture predictor, while keeping the continuous income predictor the same, the result is to multiply the interaction coefficient by the reciprocal of the change in distance (e.g., doubling the distance between the two group codes, to say -1 and 1 rather than 0 and 1, results in a halving of the interaction coefficient). Similarly, keeping the unit distance between the culture codings and multiplying the continuous income predictor by a constant produces a reciprocal change in the interaction coefficient (e.g., multiplying the income variable by two produces an interaction coefficient half the size of the original one). Note in each case that the interaction product variable must be recomputed from the transformed original variables.
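The rescaling rules just described follow from substituting the recoded variables into the fitted equation. As a rough sketch, the hypothetical helper below (not an SPSS facility) applies that algebra to the Figure 1 coefficients, assuming each predictor is recoded as new = a + k*old with k nonzero and that the product variable is recomputed from the recoded predictors.

```python
def reparameterize(b0, b_c, b_x, b_cx, a_c=0.0, k_c=1.0, a_x=0.0, k_x=1.0):
    """Coefficients of the same fitted model after recoding
    culture' = a_c + k_c * culture and income' = a_x + k_x * income.
    Returns (constant, culture, income, interaction) on the new coding."""
    new_cx = b_cx / (k_c * k_x)           # interaction: divided by both scale factors
    new_c = b_c / k_c - a_x * new_cx      # culture effect where income' = 0
    new_x = b_x / k_x - a_c * new_cx      # income slope where culture' = 0
    new_0 = b0 - a_c * b_c / k_c - a_x * b_x / k_x + a_c * a_x * new_cx
    return new_0, new_c, new_x, new_cx


fig1 = (4.530989, 4.377789, 0.082682, -0.130178)   # coefficients from Figure 1

effect_coded = reparameterize(*fig1, a_c=-1.0, k_c=2.0)  # culture coded -1/1
doubled = reparameterize(*fig1, k_x=2.0)                 # income multiplied by 2
centered = reparameterize(*fig1, a_x=5.08)               # income + 5.08 (Figure 2)

for label, coefs in (("culture -1/1", effect_coded),
                     ("income * 2", doubled),
                     ("income + 5.08", centered)):
    print(label, [round(v, 6) for v in coefs])
```

The first two recodings halve the interaction coefficient, while centering leaves it untouched and changes only the constant and the culture coefficient.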

While linear transformations of the original variables (multiplication by a nonzero constant and addition of another constant, or new=a+b*old) will rescale the interaction coefficient, its standard error is rescaled by the same factor, resulting in the same t-statistic (apart from a possible change of sign) and the same significance level. The interaction term in this model is the highest order term in a hierarchical model, and its test is thus _invariant_ under such transformations. It is the only term in this model for which this is true. All lower order terms are "contained within" or "marginal to" this interaction effect, and are thus dependent upon the specific model parameterization for their meaning. In a nutshell, this is the lesson of this series.
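The invariance of the interaction test can be illustrated numerically. The sketch below uses synthetic data generated in the script itself (not the state data behind Figures 1 and 2) and a small hand-written least squares routine in plain Python rather than SPSS REGRESSION; all names and the particular recodings are assumptions made for illustration.

```python
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols(culture, income, y):
    """Regress y on constant, culture, income, and their product.
    Returns (coefficients, t_statistics), each in that term order."""
    X = [[1.0, c, x, c * x] for c, x in zip(culture, income)]  # product recomputed here
    k = 4
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    b = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(bi * ri for bi, ri in zip(b, r)) for r, yi in zip(X, y)]
    s2 = sum(e * e for e in resid) / (len(y) - k)       # residual variance
    t = [b[i] / (s2 * inv[i][i]) ** 0.5 for i in range(k)]
    return b, t


# Synthetic data: 50 made-up cases, NOT the data analyzed in the article.
random.seed(0)
n = 50
culture = [float(i % 2) for i in range(n)]
income = [random.uniform(-30.0, 20.0) for _ in range(n)]
y = [4.5 + 4.4 * c + 0.08 * x - 0.13 * c * x + random.gauss(0.0, 2.0)
     for c, x in zip(culture, income)]

b_orig, t_orig = ols(culture, income, y)
# Linear recodings of both predictors: culture to -1/1, income to 2*income + 5.
b_new, t_new = ols([2.0 * c - 1.0 for c in culture],
                   [2.0 * x + 5.0 for x in income], y)

for name, bo, to, bn, tn in zip(("CONSTANT", "CULTURE", "INCOME", "CUL_INC"),
                                b_orig, t_orig, b_new, t_new):
    print(f"{name:9s} B={bo:9.4f} t={to:7.3f}  ->  B={bn:9.4f} t={tn:7.3f}")
```

With these recodings the interaction coefficient is divided by four (two for each predictor) while its t-statistic is unchanged; the lower order terms are estimating different quantities under the new coding, so their coefficients and tests are free to move.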