CONTINUOUS BY CATEGORICAL INTERACTIONS IN REGRESSION

David P. Nichols

From SPSS Keywords, Number 61, 1996

Continuing the topic of using categorical variables in linear regression, in this issue we will briefly demonstrate some of the issues involved in modeling interactions between categorical and continuous predictors. As in previous issues, we will be modeling 1990 murder rates in the 50 states of the U.S. Our predictors will be the previously used 0-1 culture dummy variable, along with a new variable: each state's 1990 per capita income, expressed as a percentage deviation from the national average (e.g., a value of 10 indicates that a state's per capita income was 10% above the national mean).
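As a minimal sketch of the income transformation, assume the national average is a single reference value for per capita income (the article does not say exactly how that value was computed). The percentage deviation is then 100 times the state's difference from that value, divided by that value. The function name `pct_deviation` and the dollar figures below are hypothetical, chosen only for illustration; they are not the 1990 data.

```python
def pct_deviation(state_income: float, national_income: float) -> float:
    """Per capita income as a percentage deviation from a national
    reference value (assumed here to be one national average figure)."""
    return 100.0 * (state_income - national_income) / national_income


# Hypothetical figures, not the actual 1990 data: a state at 22,000
# against a national value of 20,000 is 10% above the national average.
print(pct_deviation(22_000, 20_000))  # 10.0
```

On this scale, 0 corresponds exactly to the national average, which is what gives the constant and the culture coefficient their "at average income" interpretation below.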

For most people, the parameterization of choice in this situation is to code the culture dummy variable so that 0 means no and 1 means yes (the state is deemed to be affected by the cultural factor of interest). The interaction variable is created by multiplying the dummy variable by the income variable. Results of this REGRESSION are given in Figure 1. As always, the constant is the predicted value of the dependent variable when all predictors are 0. Here, it is the predicted 1990 murder rate for a state without the cultural characteristic of interest and with 1990 per capita income equal to the national average. The culture parameter gives the difference in predicted value between the affected and the unaffected states when income is 0 (that is, at the national average). Affected states have a much larger predicted rate (almost twice as high: about 8.91 versus 4.53). The income parameter gives the predicted slope for the unaffected states; for these states, higher per capita income is associated with higher predicted murder rates. The interaction parameter estimates the difference in slope between the affected and the unaffected states; adding its value of -.130 to the income coefficient of .083 gives the predicted slope for the affected states, about -.047. In other words, for these states, higher income is associated with lower predicted murder rates.
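The following Python sketch shows how the interaction term and the two group-specific regression lines follow from the Figure 1 coefficients. The dictionary `FIG1` and the functions `predict` and `group_lines` are illustrative names, not part of SPSS; only the coefficient values are taken from Figure 1.

```python
# Coefficients (B) copied from Figure 1: culture coded 1 = affected, 0 = not.
FIG1 = {"CONSTANT": 4.530989, "CULTURE": 4.377789,
        "INCOME": 0.082682, "CUL_INC": -0.130178}


def predict(b: dict, culture: int, income: float) -> float:
    """Predicted 1990 murder rate from the dummy-by-income model.

    The interaction predictor is simply the product culture * income.
    """
    cul_inc = culture * income
    return (b["CONSTANT"] + b["CULTURE"] * culture
            + b["INCOME"] * income + b["CUL_INC"] * cul_inc)


def group_lines(b: dict) -> dict:
    """(intercept, slope) for the group coded 0 and the group coded 1.

    Intercepts are predictions at income = 0, the national average.
    """
    return {
        0: (b["CONSTANT"], b["INCOME"]),
        1: (b["CONSTANT"] + b["CULTURE"], b["INCOME"] + b["CUL_INC"]),
    }


for code, (intercept, slope) in group_lines(FIG1).items():
    print(f"culture={code}: intercept {intercept:.6f}, slope {slope:+.6f}")
```

Run as written, it should report an intercept of about 4.53 and a slope of about .083 for the unaffected states (coded 0), and an intercept of about 8.91 and a slope of about -.047 for the affected states (coded 1).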

Figure 1: REGRESSION results with dummy coding (1=YES)
----------------------------------------------------------------------------
Variable             B        SE B        Beta        T   Sig T
CONSTANT      4.530989     .620501                7.302   .0000
CULTURE       4.377789     .909829     .563756    4.812   .0000
INCOME         .082682     .040147     .318512    2.059   .0451
CUL_INC       -.130178     .057808    -.366026   -2.252   .0291
----------------------------------------------------------------------------

Note that, because the model contains an interaction, the interpretation of each "main effect" is conditional upon the level of the other variable. An alternative parameterization is presented in Figure 2. Here the culture dummy variable has been reverse coded: 0 for affected states, 1 for unaffected. The constant now gives the affected group's predicted value at income 0 (the national average). The culture parameter again compares the two groups at average income, but this time it subtracts affected from unaffected and thus has a negative sign. The income parameter gives the -.047 value calculated above: the predicted slope for the affected states. The interaction coefficient is the same as before but with the opposite sign. Again, adding it to the income coefficient produces the predicted slope for the group coded 1, which is now the unaffected states (-.047 + .130 = .083).
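The correspondence between the two sets of coefficients is pure algebra: substituting d = 1 - d' for the dummy d in the first model yields the second. A minimal Python sketch, assuming only the B values printed in Figures 1 and 2 (the helper name `reverse_dummy` is hypothetical), applies this substitution to the Figure 1 coefficients and compares the result with the Figure 2 values.

```python
# B values copied from Figure 1 (culture: 1 = affected)
# and Figure 2 (culture: 1 = not affected).
FIG1 = {"CONSTANT": 4.530989, "CULTURE": 4.377789,
        "INCOME": 0.082682, "CUL_INC": -0.130178}
FIG2 = {"CONSTANT": 8.908778, "CULTURE": -4.377789,
        "INCOME": -0.047495, "CUL_INC": 0.130178}


def reverse_dummy(b: dict) -> dict:
    """Coefficients implied by recoding the dummy d as d' = 1 - d.

    Substituting d = 1 - d' into  b0 + b1*d + b2*x + b3*d*x  gives
    (b0 + b1) - b1*d' + (b2 + b3)*x - b3*d'*x.
    """
    return {
        "CONSTANT": b["CONSTANT"] + b["CULTURE"],
        "CULTURE": -b["CULTURE"],
        "INCOME": b["INCOME"] + b["CUL_INC"],
        "CUL_INC": -b["CUL_INC"],
    }


implied = reverse_dummy(FIG1)
for name, value in implied.items():
    # Any gap reflects six-decimal rounding in the printed output.
    print(f"{name:<8} implied {value: .6f}   printed {FIG2[name]: .6f}")
```

The only discrepancy is in the sixth decimal place of the income coefficient (-.047496 implied versus -.047495 printed), which is consistent with rounding in the printout.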

Figure 2: REGRESSION results with dummy coding (1=NO)
----------------------------------------------------------------------------
Variable             B        SE B        Beta        T   Sig T
CONSTANT      8.908778     .665407               13.388   .0000
CULTURE      -4.377789     .909829    -.563756   -4.812   .0000
INCOME        -.047495     .041593    -.182964   -1.142   .2594
CUL_INC        .130178     .057808     .351942    2.252   .0291
----------------------------------------------------------------------------

Note that the estimates and t-statistics for the constant and income parameters differ between the two tables, while the B and T values for culture and the interaction differ only in sign. This is because the constant and income parameters estimate different quantities under the two parameterizations. To verify that both parameterizations estimate the same overall model, you can check that they reproduce exactly the same predicted value for any given combination of culture and income. For example, for an affected state with an income value of 10, parameterization 1 would predict

4.530989 + 4.377789 * 1 + .082682 * 10 - .130178 * 1 * 10 = 8.4338

while parameterization 2 would predict

8.908778 - 4.377789 * 0 - .047495 * 10 + .130178 * 0 * 10 = 8.4338.
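The single check above can be extended to a grid of values. This Python sketch (function names hypothetical, income values chosen only for illustration) evaluates both sets of printed coefficients at every combination of culture and income in the grid and reports the largest discrepancy between the two parameterizations.

```python
# B values copied from Figure 1 (culture: 1 = affected)
# and Figure 2 (culture: 1 = not affected).
FIG1 = {"CONSTANT": 4.530989, "CULTURE": 4.377789,
        "INCOME": 0.082682, "CUL_INC": -0.130178}
FIG2 = {"CONSTANT": 8.908778, "CULTURE": -4.377789,
        "INCOME": -0.047495, "CUL_INC": 0.130178}


def predict(b: dict, culture: int, income: float) -> float:
    """Constant + culture term + income term + culture*income term."""
    return (b["CONSTANT"] + b["CULTURE"] * culture
            + b["INCOME"] * income + b["CUL_INC"] * culture * income)


def max_disagreement(incomes) -> float:
    """Largest absolute gap between the two parameterizations' predictions."""
    worst = 0.0
    for affected in (0, 1):
        for x in incomes:
            p1 = predict(FIG1, affected, x)      # coding 1: 1 = affected
            p2 = predict(FIG2, 1 - affected, x)  # coding 2: 1 = not affected
            worst = max(worst, abs(p1 - p2))
    return worst


# Illustrative income deviations (percent), not actual state values.
incomes = [-20, -10, 0, 10, 25]
print(round(predict(FIG1, 1, 10), 4), round(predict(FIG2, 0, 10), 4))
print(f"largest gap: {max_disagreement(incomes):.1e}")
```

The largest gap stays well below .0001, which is what we would expect if the two tables describe the same fitted model and differ only through rounding of the printed coefficients.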

While the intercept and income terms changed with the parameterization of the model, the culture term changed only in sign. There are ways of parameterizing the model that would produce different results for culture. Can you think of what some of these might be? We'll talk about this in the next issue.