CHAPTER 7: ANSWER KEY CASE EXERCISES 1. Show me even more money! Stata gives us the following coefficients from the regression: . regress Salary Batting_average-errors Source |
SS
df
MS
Number of obs =
-------------+------------------------------
F( 12,
337
324) =
30.95
Model |
275909639
12
22992469.9
Prob > F
=
0.0000
Residual |
240735051
324
743009.418
R-squared
=
0.5340
Adj R-squared =
0.5168
Root MSE
861.98
-------------+-----------------------------Total |
516644690
336
1537633.01
=
-----------------------------------------------------------------------------Salary |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------Batting_av~e1|
-2322.666
3329.87
-0.70
0.486
-8873.562
4228.23
On_base_pe~e2|
-841.4261
2936.294
-0.29
0.775
-6618.034
4935.182
runs |
4.301694
6.967091
0.62
0.537
-9.404753
18.00814
hits |
7.70872
3.993498
1.93
0.054
-.1477392
15.56518
doubles |
-6.640772
10.58735
-0.63
0.531
-27.46941
14.18787
triples |
-24.85974
26.80728
-0.93
0.354
-77.59805
27.87857
homeruns |
40.01745
15.38914
2.60
0.010
9.742202
70.2927
RBI |
13.66865
6.253432
2.19
0.030
1.366188
25.9711
walks |
7.894205
5.58035
1.41
0.158
-3.084089
18.8725
strikeouts |
-14.09367
2.629301
-5.36
0.000
-19.26632
-8.921011
stolenbases |
10.90001
5.821304
1.87
0.062
-.5523188
22.35233
errors |
-18.10502
9.087294
-1.99
0.047
-35.98257
-.2274732
_cons |
962.0107
405.7485
2.37
0.018
163.7764
1760.245
------------------------------------------------------------------------------
Some of these don’t make any sense: Batting Average, On-base percentage, Doubles, and Triples are all positive things that ought to increase a player’s salary, but each has a negative coefficient. Of course, all of these have high p-values which could explain the results. 1 2
Batting_average. On_base_percentage.
A threshold of α = 0.05 leaves us with 4 variables in the regression below: . regress Salary homeruns RBI strikeouts errors Source |
SS
df
MS
Number of obs =
-------------+------------------------------
F(
4,
337
332) =
73.97
Model |
243461396
4
60865348.9
Prob > F
=
0.0000
Residual |
273183294
332
822841.248
R-squared
=
0.4712
-------------+-----------------------------Total |
516644690
336
1537633.01
Adj R-squared =
0.4649
Root MSE
907.11
=
-----------------------------------------------------------------------------Salary |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------homeruns |
10.94588
12.11604
0.90
0.367
-12.88802
34.77977
RBI |
32.44475
3.782402
8.58
0.000
25.00426
39.88525
strikeouts |
-7.928094
2.351847
-3.37
0.001
-12.5545
-3.301692
errors |
-11.21707
9.179658
-1.22
0.223
-29.2747
6.840556
_cons |
246.2292
110.9972
2.22
0.027
27.8828
464.5756
------------------------------------------------------------------------------
Furthermore, we can use Stata’s testparm command to test if the variables that we removed are jointly significant. Run the regression of Salary on all the variables again and click User>Core Statistics>Test Hypothesis, using most recent regression>Joint significance (testparm) or type db testparm. Select the variables that were removed (i.e., Batting_average, On_base_percentage, runs, hits, doubles, triples, walks, and stolenbases) in the “Test coefficients…” field and click OK. Stata reports a p-value of 0.0000, which tells us that the variables that we removed were jointly significant. That is, at least one of them has an effect on salary.
3. B-School Costs . regress Estimatedtotalcost rank enrolled BasesalaryMean BasesalaryMedian Source | SS df MS -------------+-----------------------------Model | 949387589 4 237346897 Residual | 1.1370e+09 25 45480168.6 -------------+-----------------------------Total | 2.0864e+09 29 71944544.9
Number of obs F( 4, 25) Prob > F R-squared Adj R-squared Root MSE
= = = = = =
30 5.22 0.0034 0.4550 0.3678 6743.9
-----------------------------------------------------------------------------Estimatedtotalcost | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------rank | -204.466 310.9075 -0.66 0.517 -844.792 435.8601 enrolled | -.3200936 2.017585 -0.16 0.875 -4.475388 3.835201 BasesalaryMean | 1.372897 .7017824 1.96 0.062 -.0724512 2.818245 BasesalaryMedian| -1.098467 .9058732 -1.21 0.237 -2.964147 .7672141 _cons | 40611.48 45162.06 0.90 0.377 -52401.53 133624.5 ------------------------------------------------------------------------------
No, it does not make sense that an increase in the median base salary should lower the estimated total cost of the program. Successful programs with high salaries for graduates ought to cost more, not less.
While the low p-value tells us that the true coefficient is not significantly different from zero, there is a more likely cause of the negative coefficient which is the presence of mean base salary in this regression. The two are surely highly correlated and including both will cause multicollinearity problems. The high VIFs confirm this theory: . vif Variable |
VIF
1/VIF
-------------+---------------------Basesala~ean |
13.81
0.072435
Basesala~ian |
12.42
0.080519
rank |
4.78
0.209344
enrolled |
1.55
0.644913
-------------+---------------------Mean VIF |
8.14
4. Video Libraries a. The variables population and DVD Library both have small p-values which indicate their significance in this regression. . regress Sales Population Advertising DVDLibrary VHSLibrary
Source |
SS
df
MS
Number of obs =
-------------+------------------------------
F(
4,
29
24) =
10.06
Model |
7525987.87
4
1881496.97
Prob > F
=
0.0001
Residual |
4489017.17
24
187042.382
R-squared
=
0.6264
-------------+-----------------------------Total |
12015005
28
429107.323
Adj R-squared =
0.5641
Root MSE
432.48
=
-----------------------------------------------------------------------------Sales |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------Population |
282.3071
48.23321
5.85
0.000
182.7587
381.8556
Advertising |
.4838488
.4385733
1.10
0.281
-.421322
1.38902
DVDLibrary |
1.34909
.5271474
2.56
0.017
.2611108
2.437069
VHSLibrary |
-.4082301
.2903114
-1.41
0.172
-1.007403
.1909433
_cons |
-61.42029
325.6842
-0.19
0.852
-733.5995
610.7589
------------------------------------------------------------------------------
b & c. based on the regression above, each DVD added to the library increases sales by 1.35 per month. A 95% confidence interval for this variable is: 1.349 ± 2.0639 ∙ 0.527 or [0.26, 2.44]. (Stata automatically reports this confidence interval as [0.2611108, 2.437069].)
d. According to this regression, the estimated increase in sales is indeed greater than $1 per month although there is some variability in this estimate that might make cautious decision makers hesitant. A bigger problem is the choice of this regression for making that decision. The regression suffers from a major flaw as far as this coefficient is concerned: multicollinearity. Stata’s report on the variance inflation factors gives us: . vif Variable |
VIF
1/VIF
-------------+---------------------DVDLibrary |
16.29
0.061386
VHSLibrary |
14.78
0.067674
Advertising |
1.28
0.781362
Population |
1.22
0.817545
-------------+---------------------Mean VIF |
8.39
These high numbers (greater than ten) show that the two library variables are being distorted (probably by their high correlation of 0.96.) Removing the VHS library variable from the regression results in a different coefficient for the DVD variable which is lower than $1 our cutoff for a profitable choice. This new coefficient shows that as the DVD Library increases by one AND the VHS Library increases as is has on average in the past, this combined affect increases sales by only 0.6387. Since a bigger VHS Library tends to increase sales, the affect of the DVD Library variable must be LESS than 0.6387 and thus less than 1. . regress Sales Population Advertising DVDLibrary Source |
SS
df
MS
Number of obs =
-------------+------------------------------
F(
25) =
12.27
Model |
7156141.17
3
2385380.39
Prob > F
=
0.0000
Residual |
4858863.87
25
194354.555
R-squared
=
0.5956
Adj R-squared =
0.5471
Root MSE
440.86
-------------+-----------------------------Total |
12015005
28
429107.323
3,
29
=
-----------------------------------------------------------------------------Sales |
Coef.
Std. Err.
t
P>|t|
[95% Conf. Interval]
-------------+---------------------------------------------------------------Population |
275.531
48.92099
5.63
0.000
174.7763
376.2856
Advertising |
.2908498
.4246079
0.68
0.500
-.5836466
1.165346
DVDLibrary |
.638733
.1535481
4.16
0.000
.3224948
.9549712
_cons |
20.59682
326.622
0.06
0.950
-652.0938
693.2874
------------------------------------------------------------------------------
The stores should NOT update their collections.