CHAPTER 7: ANSWER KEY

CHAPTER 7: ANSWER KEY. CASE EXERCISES. 1. Show me even more money! Stata gives us the following coefficients from the regression: . regress Salary Bat...

31 downloads 1112 Views 135KB Size
CHAPTER 7: ANSWER KEY CASE EXERCISES 1. Show me even more money! Stata gives us the following coefficients from the regression: . regress Salary Batting_average-errors Source |

SS

df

MS

Number of obs =

-------------+------------------------------

F( 12,

337

324) =

30.95

Model |

275909639

12

22992469.9

Prob > F

=

0.0000

Residual |

240735051

324

743009.418

R-squared

=

0.5340

Adj R-squared =

0.5168

Root MSE

861.98

-------------+-----------------------------Total |

516644690

336

1537633.01

=

-----------------------------------------------------------------------------Salary |

Coef.

Std. Err.

t

P>|t|

[95% Conf. Interval]

-------------+---------------------------------------------------------------Batting_av~e1|

-2322.666

3329.87

-0.70

0.486

-8873.562

4228.23

On_base_pe~e2|

-841.4261

2936.294

-0.29

0.775

-6618.034

4935.182

runs |

4.301694

6.967091

0.62

0.537

-9.404753

18.00814

hits |

7.70872

3.993498

1.93

0.054

-.1477392

15.56518

doubles |

-6.640772

10.58735

-0.63

0.531

-27.46941

14.18787

triples |

-24.85974

26.80728

-0.93

0.354

-77.59805

27.87857

homeruns |

40.01745

15.38914

2.60

0.010

9.742202

70.2927

RBI |

13.66865

6.253432

2.19

0.030

1.366188

25.9711

walks |

7.894205

5.58035

1.41

0.158

-3.084089

18.8725

strikeouts |

-14.09367

2.629301

-5.36

0.000

-19.26632

-8.921011

stolenbases |

10.90001

5.821304

1.87

0.062

-.5523188

22.35233

errors |

-18.10502

9.087294

-1.99

0.047

-35.98257

-.2274732

_cons |

962.0107

405.7485

2.37

0.018

163.7764

1760.245

------------------------------------------------------------------------------

Some of these don’t make any sense: Batting Average, On-base percentage, Doubles, and Triples are all positive things that ought to increase a player’s salary, but each has a negative coefficient. Of course, all of these have high p-values which could explain the results. 1 2

Batting_average. On_base_percentage.

A threshold of α = 0.05 leaves us with 4 variables in the regression below: . regress Salary homeruns RBI strikeouts errors Source |

SS

df

MS

Number of obs =

-------------+------------------------------

F(

4,

337

332) =

73.97

Model |

243461396

4

60865348.9

Prob > F

=

0.0000

Residual |

273183294

332

822841.248

R-squared

=

0.4712

-------------+-----------------------------Total |

516644690

336

1537633.01

Adj R-squared =

0.4649

Root MSE

907.11

=

-----------------------------------------------------------------------------Salary |

Coef.

Std. Err.

t

P>|t|

[95% Conf. Interval]

-------------+---------------------------------------------------------------homeruns |

10.94588

12.11604

0.90

0.367

-12.88802

34.77977

RBI |

32.44475

3.782402

8.58

0.000

25.00426

39.88525

strikeouts |

-7.928094

2.351847

-3.37

0.001

-12.5545

-3.301692

errors |

-11.21707

9.179658

-1.22

0.223

-29.2747

6.840556

_cons |

246.2292

110.9972

2.22

0.027

27.8828

464.5756

------------------------------------------------------------------------------

Furthermore, we can use Stata’s testparm command to test if the variables that we removed are jointly significant. Run the regression of Salary on all the variables again and click User>Core Statistics>Test Hypothesis, using most recent regression>Joint significance (testparm) or type db testparm. Select the variables that were removed (i.e., Batting_average, On_base_percentage, runs, hits, doubles, triples, walks, and stolenbases) in the “Test coefficients…” field and click OK. Stata reports a p-value of 0.0000, which tells us that the variables that we removed were jointly significant. That is, at least one of them has an effect on salary.

3. B-School Costs . regress Estimatedtotalcost rank enrolled BasesalaryMean BasesalaryMedian Source | SS df MS -------------+-----------------------------Model | 949387589 4 237346897 Residual | 1.1370e+09 25 45480168.6 -------------+-----------------------------Total | 2.0864e+09 29 71944544.9

Number of obs F( 4, 25) Prob > F R-squared Adj R-squared Root MSE

= = = = = =

30 5.22 0.0034 0.4550 0.3678 6743.9

-----------------------------------------------------------------------------Estimatedtotalcost | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------rank | -204.466 310.9075 -0.66 0.517 -844.792 435.8601 enrolled | -.3200936 2.017585 -0.16 0.875 -4.475388 3.835201 BasesalaryMean | 1.372897 .7017824 1.96 0.062 -.0724512 2.818245 BasesalaryMedian| -1.098467 .9058732 -1.21 0.237 -2.964147 .7672141 _cons | 40611.48 45162.06 0.90 0.377 -52401.53 133624.5 ------------------------------------------------------------------------------

No, it does not make sense that an increase in the median base salary should lower the estimated total cost of the program. Successful programs with high salaries for graduates ought to cost more, not less.

While the low p-value tells us that the true coefficient is not significantly different from zero, there is a more likely cause of the negative coefficient which is the presence of mean base salary in this regression. The two are surely highly correlated and including both will cause multicollinearity problems. The high VIFs confirm this theory: . vif Variable |

VIF

1/VIF

-------------+---------------------Basesala~ean |

13.81

0.072435

Basesala~ian |

12.42

0.080519

rank |

4.78

0.209344

enrolled |

1.55

0.644913

-------------+---------------------Mean VIF |

8.14

4. Video Libraries a. The variables population and DVD Library both have small p-values which indicate their significance in this regression. . regress Sales Population Advertising DVDLibrary VHSLibrary

Source |

SS

df

MS

Number of obs =

-------------+------------------------------

F(

4,

29

24) =

10.06

Model |

7525987.87

4

1881496.97

Prob > F

=

0.0001

Residual |

4489017.17

24

187042.382

R-squared

=

0.6264

-------------+-----------------------------Total |

12015005

28

429107.323

Adj R-squared =

0.5641

Root MSE

432.48

=

-----------------------------------------------------------------------------Sales |

Coef.

Std. Err.

t

P>|t|

[95% Conf. Interval]

-------------+---------------------------------------------------------------Population |

282.3071

48.23321

5.85

0.000

182.7587

381.8556

Advertising |

.4838488

.4385733

1.10

0.281

-.421322

1.38902

DVDLibrary |

1.34909

.5271474

2.56

0.017

.2611108

2.437069

VHSLibrary |

-.4082301

.2903114

-1.41

0.172

-1.007403

.1909433

_cons |

-61.42029

325.6842

-0.19

0.852

-733.5995

610.7589

------------------------------------------------------------------------------

b & c. based on the regression above, each DVD added to the library increases sales by 1.35 per month. A 95% confidence interval for this variable is: 1.349 ± 2.0639 ∙ 0.527 or [0.26, 2.44]. (Stata automatically reports this confidence interval as [0.2611108, 2.437069].)

d. According to this regression, the estimated increase in sales is indeed greater than $1 per month although there is some variability in this estimate that might make cautious decision makers hesitant. A bigger problem is the choice of this regression for making that decision. The regression suffers from a major flaw as far as this coefficient is concerned: multicollinearity. Stata’s report on the variance inflation factors gives us: . vif Variable |

VIF

1/VIF

-------------+---------------------DVDLibrary |

16.29

0.061386

VHSLibrary |

14.78

0.067674

Advertising |

1.28

0.781362

Population |

1.22

0.817545

-------------+---------------------Mean VIF |

8.39

These high numbers (greater than ten) show that the two library variables are being distorted (probably by their high correlation of 0.96.) Removing the VHS library variable from the regression results in a different coefficient for the DVD variable which is lower than $1 our cutoff for a profitable choice. This new coefficient shows that as the DVD Library increases by one AND the VHS Library increases as is has on average in the past, this combined affect increases sales by only 0.6387. Since a bigger VHS Library tends to increase sales, the affect of the DVD Library variable must be LESS than 0.6387 and thus less than 1. . regress Sales Population Advertising DVDLibrary Source |

SS

df

MS

Number of obs =

-------------+------------------------------

F(

25) =

12.27

Model |

7156141.17

3

2385380.39

Prob > F

=

0.0000

Residual |

4858863.87

25

194354.555

R-squared

=

0.5956

Adj R-squared =

0.5471

Root MSE

440.86

-------------+-----------------------------Total |

12015005

28

429107.323

3,

29

=

-----------------------------------------------------------------------------Sales |

Coef.

Std. Err.

t

P>|t|

[95% Conf. Interval]

-------------+---------------------------------------------------------------Population |

275.531

48.92099

5.63

0.000

174.7763

376.2856

Advertising |

.2908498

.4246079

0.68

0.500

-.5836466

1.165346

DVDLibrary |

.638733

.1535481

4.16

0.000

.3224948

.9549712

_cons |

20.59682

326.622

0.06

0.950

-652.0938

693.2874

------------------------------------------------------------------------------

The stores should NOT update their collections.