Exercise 3.1
The t-statistics in Table 3.4 are computed individually for each coefficient, so each one tests a single predictor in the presence of the others. Accordingly, there are 4 null hypotheses being tested:
- H_0 for "TV": in the presence of Radio and Newspaper ads (and in addition to the intercept), there is no relationship between TV and Sales;
- H_0 for "Radio": in the presence of TV and Newspaper ads (and in addition to the intercept), there is no relationship between Radio and Sales;
- H_0 for "Newspaper": in the presence of TV and Radio ads (and in addition to the intercept), there is no relationship between Newspaper and Sales;
- H_0 for the intercept: in the absence of TV, Radio and Newspaper ads, Sales are zero;
versus the 4 corresponding alternative hypotheses:
H_a: there is some relationship between TV/Radio/Newspaper and Sales, and, for the intercept, Sales are non-zero in the absence of all three types of advertising.
Mathematically, this can be written as
H_0: \beta_i = 0, for i = 0, 1, 2, 3,
versus the 4 corresponding alternative hypotheses
H_a: \beta_i \neq 0, for i = 0, 1, 2, 3.
As can be seen in Table 3.4 (and in the Python output below), the p-values are practically zero for all coefficients except Newspaper, whose p-value is very high, namely 0.86, much larger than the typical significance levels 0.05, 0.01 and 0.001. Given the t-statistics and p-values, we can reject the null hypothesis for the intercept, TV and Radio, but not for Newspaper.
This means we can conclude that there is a relationship between TV and Sales, and between Radio and Sales. Rejecting \beta_0 = 0 also allows us to conclude that in the absence of TV, Radio and Newspaper advertising, Sales are non-zero. Failing to reject \beta_{Newspaper} = 0 suggests that, in the presence of TV and Radio, there is no evidence of a relationship between Newspaper and Sales (though, strictly speaking, a non-rejection does not prove the coefficient is zero).
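As a sanity check, the t-statistics and p-values can be reproduced from the coefficients and standard errors reported in the table. The sketch below assumes the residual degrees of freedom from the summary output (196) and uses scipy, which is not otherwise used in this solution; small discrepancies with the table are due to the rounding of the coefficients and standard errors.
from scipy import stats

# Recompute t-statistics and two-sided p-values from the rounded values in
# Table 3.4; df = n - p - 1 = 200 - 3 - 1 = 196 residual degrees of freedom.
df_resid = 196
for name, coef, se in [("Intercept", 2.9389, 0.312), ("Newspaper", -0.0010, 0.006)]:
    t_stat = coef / se
    p_value = 2 * stats.t.sf(abs(t_stat), df_resid)  # two-sided p-value
    print(f"{name}: t = {t_stat:.3f}, p = {p_value:.3f}")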
Additional comment
Testing each coefficient at the 5% significance level, there would be roughly an 18.5% chance of at least one of the 4 coefficients appearing significant even if none of them had a true relationship (assuming independent tests):
1 - 0.95^4 ≈ 0.185.
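The same back-of-the-envelope calculation in Python (independence of the four tests is an assumption made here for simplicity):
# Chance that at least one of the 4 coefficients looks significant at the
# 5% level when all true coefficients are zero, assuming independent tests.
print(1 - 0.95 ** 4)  # ≈ 0.185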
Auxiliary calculations
import pandas as pd
from statsmodels.formula.api import ols

# Load the Advertising data set
df = pd.read_csv('../data/Advertising.csv')

# Fit the multiple linear regression Sales ~ TV + Radio + Newspaper
model = ols("Sales ~ TV + Radio + Newspaper", df).fit()
print(model.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.897
Model:                            OLS   Adj. R-squared:                  0.896
Method:                 Least Squares   F-statistic:                     570.3
Date:                Tue, 24 Oct 2017   Prob (F-statistic):           1.58e-96
Time:                        10:19:37   Log-Likelihood:                -386.18
No. Observations:                 200   AIC:                             780.4
Df Residuals:                     196   BIC:                             793.6
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.9389      0.312      9.422      0.000       2.324       3.554
TV             0.0458      0.001     32.809      0.000       0.043       0.049
Radio          0.1885      0.009     21.893      0.000       0.172       0.206
Newspaper     -0.0010      0.006     -0.177      0.860      -0.013       0.011
==============================================================================
Omnibus:                       60.414   Durbin-Watson:                   2.084
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              151.241
Skew:                          -1.327   Prob(JB):                     1.44e-33
Kurtosis:                       6.332   Cond. No.                         454.
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
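Rather than reading the printed summary, the individual t-statistics and p-values can also be pulled directly from the fitted results object via statsmodels' tvalues and pvalues attributes, for example:
# t-statistics and p-values as pandas Series, indexed by coefficient name
print(model.tvalues)
print(model.pvalues)

# Which null hypotheses are rejected at the 5% significance level?
print(model.pvalues < 0.05)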
Further reading
ISL:
- Pages 67-68
- Footnote on page 68
Multiple regression:
- https://www.datarobot.com/blog/multiple-regression-using-statsmodels/
- https://www.coursera.org/learn/regression-modeling-practice/lecture/xQRab/python-lesson-1-multiple-regression
- http://www.scipy-lectures.org/packages/statistics/index.html#multiple-regression-including-multiple-factors
- http://stackoverflow.com/questions/11479064/multiple-linear-regression-in-python