Introduction to Multiple Linear Regression
Gerard E. Dallal, Ph.D.
If you are familiar with simple linear regression, then you know the
very basics of multiple linear regression. Once again, the goal is to
obtain the least squares equation (that is, the equation for which the
sum of squared residuals is a minimum) to predict some response. With
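simple linear regression there was one predictor. The fitted equation
was of the form

$$\hat{Y} = b_0 + b_1 X.$$

With multiple linear regression there is more than one predictor, and
the fitted equation is of the form

$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + \cdots + b_p X_p,$$

where $p$ is the number of predictors.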
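To make the least squares criterion concrete, here is a minimal sketch in Python with NumPy. The language is our choice for illustration; the output in these notes comes from SAS. It fits an equation with two predictors to synthetic data invented purely for this example, then checks that nudging the coefficients away from the least squares solution increases the sum of squared residuals. The helper names `fit_least_squares` and `sum_sq_resid` are hypothetical.

```python
import numpy as np


def fit_least_squares(X, y):
    """Return (b0, b1, ..., bp) minimizing the sum of squared residuals.

    X is an (n, p) array of predictor values. A column of ones is prepended
    so that the first coefficient is the intercept b0.
    """
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def sum_sq_resid(coef, X, y):
    """Sum of squared residuals for a given coefficient vector."""
    fitted = coef[0] + X @ coef[1:]
    return float(np.sum((y - fitted) ** 2))


# Synthetic data, made up for illustration only: n = 50, p = 2 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.3, size=50)

b = fit_least_squares(X, y)
print("coefficients b0, b1, b2:", b)

# Moving away from the least squares solution makes the criterion worse.
print(sum_sq_resid(b, X, y) < sum_sq_resid(b + 0.01, X, y))
```

`np.linalg.lstsq` solves the minimization with an SVD-based LAPACK routine, which is numerically safer than inverting the cross-product matrix by hand; the answer is the same solution given by the normal equations.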
The output from a multiple linear regression analysis will look familiar. Here is an example of cross-sectional data in which the log of HDL cholesterol (the so-called good cholesterol) in women is predicted from their age, body mass index, blood vitamin C, systolic and diastolic blood pressures, skinfold thickness, the log of total cholesterol, and an eighth predictor that appears in the output as GLUM.
                        The REG Procedure
                          Model: MODEL1
                    Dependent Variable: LHCHOL

                      Analysis of Variance

                               Sum of         Mean
Source              DF        Squares       Square    F Value    Pr > F
Model                8        0.54377      0.06797       6.16    <.0001
Error              147        1.62276      0.01104
Corrected Total    155        2.16652

Root MSE            0.10507    R-Square    0.2510
Dependent Mean      1.71090    Adj R-Sq    0.2102
Coeff Var           6.14105

                      Parameter Estimates

                   Parameter       Standard
Variable    DF      Estimate          Error    t Value    Pr > |t|
Intercept    1       1.16448        0.28804       4.04      <.0001
AGE          1   -0.00092863        0.00125      -0.74      0.4602
BMI          1      -0.01205        0.00295      -4.08      <.0001
BLC          1       0.05055        0.02215       2.28      0.0239
PRSSY        1   -0.00041910     0.00044109      -0.95      0.3436
DIAST        1       0.00255        0.00103       2.47      0.0147
GLUM         1   -0.00046737     0.00018697      -2.50      0.0135
SKINF        1       0.00147        0.00183       0.81      0.4221
LCHOL        1       0.31109        0.10936       2.84      0.0051
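Every number in the Analysis of Variance section follows from the sums of squares and degrees of freedom, and each t value is a parameter estimate divided by its standard error. The Python sketch below (Python is our choice; the output itself comes from SAS PROC REG) recomputes these quantities from the printed values. Because the printed estimates and standard errors are rounded, the recomputed statistics agree with the output only up to rounding, and a recomputed t value may differ from the printed one in the second decimal place.

```python
import math

# Values copied from the "Analysis of Variance" table.
ss_model, df_model = 0.54377, 8
ss_error, df_error = 1.62276, 147
ss_total, df_total = 2.16652, 155
dependent_mean = 1.71090

ms_model = ss_model / df_model                   # Mean Square for Model
ms_error = ss_error / df_error                   # Mean Square for Error (estimated residual variance)
f_value = ms_model / ms_error                    # tests whether all eight slopes are zero
r_square = ss_model / ss_total                   # share of total variability explained by the model
adj_r_sq = 1 - ms_error / (ss_total / df_total)  # R-Square penalized for the number of predictors
root_mse = math.sqrt(ms_error)                   # estimated residual standard deviation
coeff_var = 100 * root_mse / dependent_mean      # Root MSE as a percentage of the mean response

# (estimate, standard error, printed t value) from the "Parameter Estimates" table.
params = {
    "Intercept": (1.16448, 0.28804, 4.04),
    "AGE": (-0.00092863, 0.00125, -0.74),
    "BMI": (-0.01205, 0.00295, -4.08),
    "BLC": (0.05055, 0.02215, 2.28),
    "PRSSY": (-0.00041910, 0.00044109, -0.95),
    "DIAST": (0.00255, 0.00103, 2.47),
    "GLUM": (-0.00046737, 0.00018697, -2.50),
    "SKINF": (0.00147, 0.00183, 0.81),
    "LCHOL": (0.31109, 0.10936, 2.84),
}
# Each t value is the estimate divided by its standard error.
t_values = {name: est / se for name, (est, se, _) in params.items()}

print(f"MS Model {ms_model:.5f}  MS Error {ms_error:.5f}  F {f_value:.2f}")
print(f"R-Square {r_square:.4f}  Adj R-Sq {adj_r_sq:.4f}")
print(f"Root MSE {root_mse:.5f}  Coeff Var {coeff_var:.5f}")
for name, (_, _, printed_t) in params.items():
    print(f"{name:<10} t = {t_values[name]:6.2f}  (printed {printed_t})")
```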
To predict someone's logged HDL cholesterol, take the values of her predictors, multiply each by its coefficient, add up the products, and add the intercept. Some coefficients are statistically significant at the 0.05 level (BMI, BLC, DIAST, GLUM, LCHOL); others (AGE, PRSSY, SKINF) are not. What we make of this, or do about it, depends on the particular research question.
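The prediction rule can be written as a short function. This Python sketch hard-codes the intercept and coefficients from the Parameter Estimates table and lists which predictors have two-sided p-values below 0.05. The function name `predict_lhchol` and the predictor values used to exercise it are hypothetical placeholders, not study data. The prediction stays on the logged scale of the response; we do not back-transform, since the output does not state the base of the logarithm.

```python
# Intercept and slopes copied from the "Parameter Estimates" table.
INTERCEPT = 1.16448
COEFFICIENTS = {
    "AGE": -0.00092863,
    "BMI": -0.01205,
    "BLC": 0.05055,
    "PRSSY": -0.00041910,
    "DIAST": 0.00255,
    "GLUM": -0.00046737,
    "SKINF": 0.00147,
    "LCHOL": 0.31109,
}
# Two-sided p-values (Pr > |t|); "<.0001" is recorded here as 0.0001.
P_VALUES = {
    "AGE": 0.4602, "BMI": 0.0001, "BLC": 0.0239, "PRSSY": 0.3436,
    "DIAST": 0.0147, "GLUM": 0.0135, "SKINF": 0.4221, "LCHOL": 0.0051,
}


def predict_lhchol(values):
    """Hypothetical helper: predicted LHCHOL = intercept + sum of coefficient * value."""
    missing = set(COEFFICIENTS) - set(values)
    if missing:
        raise ValueError(f"missing predictors: {sorted(missing)}")
    return INTERCEPT + sum(coef * values[name] for name, coef in COEFFICIENTS.items())


# Predictors whose coefficients are statistically significant at the 0.05 level.
significant = sorted(name for name, p in P_VALUES.items() if p < 0.05)
print(significant)  # ['BLC', 'BMI', 'DIAST', 'GLUM', 'LCHOL']

# Two hypothetical women (placeholder values, not study data) who differ only in BMI.
woman_a = {name: 1.0 for name in COEFFICIENTS}
woman_b = dict(woman_a, BMI=woman_a["BMI"] + 1.0)
# The difference in predictions equals the BMI coefficient, about -0.01205.
print(predict_lhchol(woman_b) - predict_lhchol(woman_a))
```

The last two lines show how a single coefficient is read: holding the other predictors fixed, a one-unit increase in BMI changes the predicted log HDL cholesterol by the BMI coefficient.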
It is reasonable to think that statistical methods appearing in a wide variety of textbooks have the imprimatur of the statistical community and are meant to be used. However, multiple regression includes many methods that were investigated for the elegance of their mathematics. Some of these methods, such as stepwise regression and principal component regression, should not be used to analyze data. We will discuss them in future notes.
From the start, the analyst should keep in mind that multiple regression techniques must never be studied in isolation from data. What we do, and how we do it, can be addressed only in the context of a specific research question.