


Statistics 302: Project 1




PREDICTING IS AS SIMPLE AS SIMPLE REGRESSION

Jimmy Huang

1 Introduction


Eastwestside Movers, an intracity moving company, has typically relied on a trained estimator to determine the number of labor hours needed for a move. This approach has proved useful, but the company would like a more reliable and accurate way to predict labor hours. As a preliminary step, the company collected data on 36 moves (refer to the provided data) in which the origin and destination were both within the borough of Manhattan in New York City and travel time was an insignificant portion of the hours worked. With these data in hand, the company has asked a statistician to analyze them and develop a model that predicts labor hours from the number of cubic feet to be moved from the apartment of origin.

2 Preliminary Analysis


The statistical tool JMPIN is used to explore the data and to determine the relationship between the number of cubic feet moved and the labor hours required.

Across the 36 moves, the total labor required is 1,042.5 hours and the total volume moved is 22,520 cubic feet. On average, therefore, moving one cubic foot requires approximately 0.046292 hours.
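As a quick check of this arithmetic, the following Python sketch recomputes the average from the two reported totals. The variable names are illustrative and are not part of the original analysis.

```python
# Totals reported for the 36 sampled moves.
total_hours = 1042.5
total_feet = 22520.0

# Average labor per cubic foot, in hours and in minutes.
hours_per_cubic_foot = total_hours / total_feet
minutes_per_cubic_foot = hours_per_cubic_foot * 60

print(f"{hours_per_cubic_foot:.6f} hours per cubic foot")
print(f"{minutes_per_cubic_foot:.2f} minutes per cubic foot")
```

This simple ratio forces the line through the origin, so it need not equal the slope of a regression line that also has an intercept.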



Figure 1 shows the scatter plot of the 36 paired observations of cubic feet and labor hours. Labor hours appear to increase roughly in proportion to the cubic feet moved, so a linear model is a reasonable choice for fitting the points.



Figure 1 Scatter plot of the collected 36 pairs of data


3 Fitting Model


The scatter plot suggests a linear association between the cubic feet moved and the labor hours required. Using JMPIN to fit a simple regression of Hours on Feet (refer to Appendix A for details), we obtain the following linear model:


Hours = -2.36966 + 0.0500803 Feet

with correlation coefficient r = 0.942998 and r^2 = 0.889246. The value of r^2 indicates that the fitted model explains a large proportion of the total variation: approximately 88.9% of the variation in labor hours is explained by the model.
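The fitted coefficients can be cross-checked without the raw data. The sketch below assumes the standard simple-regression identities b1 = r * S_Y / S_X and b0 = Mean(Y) - b1 * Mean(X), and plugs in the summary statistics reported in Table 5 of Appendix A. The variable names are illustrative.

```python
# Summary statistics from the JMPIN output (Table 5, Appendix A).
mean_hours, mean_feet = 28.9583333, 625.555556
sd_hours, sd_feet = 14.9010426, 280.582704
r = 0.942998  # correlation coefficient reported with the fit

# Least-squares slope and intercept recovered from the summaries.
slope = r * sd_hours / sd_feet
intercept = mean_hours - slope * mean_feet

print(f"Hours = {intercept:.5f} + {slope:.7f} Feet")
```

Small differences in the last digits are expected because the inputs are themselves rounded.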

Next, we test the hypothesis of zero slope (β1 = 0) to verify that a straight-line model in cubic feet is better than a model that does not include cubic feet at all. For the full testing procedure, refer to Appendix C. Since the null hypothesis of zero slope is rejected, the linear model is a reasonable choice. Moreover, the Analysis of Variance table in the JMPIN output (Table 3 in Appendix A) also shows that the null hypothesis of zero slope should be rejected, since the p-value < 0.0001 < 0.05 (the significance level).
Neither the residual plot of Hours nor the histogram of the residuals (refer to Appendix B) shows any apparent violation of the model assumptions.
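The quantities in the Analysis of Variance table are tied together by simple arithmetic, which the following sketch reproduces from the sums of squares in Table 3 of Appendix A. It assumes the usual definitions (mean square = sum of squares / degrees of freedom, F = model mean square / error mean square) and the fact that, in simple regression, the F ratio equals the square of the slope's t ratio. The variable names are illustrative.

```python
import math

# Sums of squares and degrees of freedom from Table 3 (Appendix A).
ss_model, ss_error = 6910.7189, 860.7186
df_model, df_error = 1, 34

ms_model = ss_model / df_model
ms_error = ss_error / df_error

f_ratio = ms_model / ms_error                  # F Ratio column
r_squared = ss_model / (ss_model + ss_error)   # RSquare in Table 2
rmse = math.sqrt(ms_error)                     # Root Mean Square Error
t_from_f = math.sqrt(f_ratio)                  # magnitude of the slope's t ratio

print(f"F = {f_ratio:.4f}, R^2 = {r_squared:.6f}, "
      f"RMSE = {rmse:.6f}, |t| = {t_from_f:.2f}")
```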

4 Application of the Fitted Model


With the relationship between labor hours and cubic feet captured by the linear model:
Hours = -2.36966 + 0.0500803 Feet

the company can now predict labor hours more easily and reliably. For example, to estimate the labor hours needed to move X0 = 800 cubic feet, the company only needs to substitute 800 into the model to obtain the point estimate:

Predicted Hours = -2.36966 + 0.0500803 * 800 ≈ 37.69
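The substitution can be wrapped in a small helper so that estimates for other volumes are just as easy to obtain. This is a minimal sketch: the function name `predict_hours` and the constant names are hypothetical, and only the two coefficients come from the fitted model.

```python
# Coefficients of the fitted model: Hours = B0 + B1 * Feet.
B0 = -2.36966
B1 = 0.0500803

def predict_hours(cubic_feet: float) -> float:
    """Return the point estimate of labor hours for a move of the given size."""
    return B0 + B1 * cubic_feet

print(round(predict_hours(800), 2))  # 37.69
```

The model was fitted to moves within the range of the sampled data, so estimates for volumes far outside that range should be treated with caution.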

The company can then compute a 95% prediction interval (PI) around this point estimate:

Predicted Hours ± t_{n-2, 1-α/2} * S_{Y|X} * sqrt(1 + 1/n + (X0 - Mean Feet)^2 / ((n - 1) * S_X^2))

where Predicted Hours = Mean Hours + b1 * (X0 - Mean Feet) = 28.95833 + 0.0500803 * (800 - 625.555556) ≈ 37.69, which agrees with the point estimate above. Substituting the values from Appendix A gives

≈ 37.69 ± 2.030 * 5.031427 * sqrt(1 + 1/36 + (800 - 625.555556)^2 / ((36 - 1) * 78726.654))

≈ 37.69 ± 10.41

≈ (27.28, 48.10)

Therefore, if the labor required to move 800 cubic feet falls between about 27 and 48 hours, the move is within the normal range. If it takes fewer than 27 hours, the move was more efficient than expected. If it takes more than 48 hours, the company should investigate what went wrong, since such an unusual result suggests that other factors affected the move.
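For moves of other sizes, the interval can be computed with a short routine. The sketch below assumes the standard prediction interval for a single new observation in simple linear regression, centered on the fitted value at X0. The function name and argument names are hypothetical; the numeric inputs are the values reported in Appendix A, and the critical value 2.030 is the one used in this report.

```python
import math

def prediction_interval(x0, b0, b1, n, mean_x, var_x, rmse, t_crit):
    """Prediction interval for one new response at x0 (simple regression)."""
    y_hat = b0 + b1 * x0  # fitted value at x0
    # Standard error of a single new observation at x0.
    se_pred = rmse * math.sqrt(
        1 + 1 / n + (x0 - mean_x) ** 2 / ((n - 1) * var_x)
    )
    margin = t_crit * se_pred
    return y_hat - margin, y_hat + margin

lower, upper = prediction_interval(
    x0=800, b0=-2.36966, b1=0.0500803, n=36,
    mean_x=625.555556, var_x=78726.654, rmse=5.031427, t_crit=2.030,
)
print(f"({lower:.2f}, {upper:.2f})")
```

The interval widens as X0 moves away from the mean of Feet, because the (X0 - Mean Feet)^2 term grows.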


Appendix A: Bivariate Fit of Hours By Feet




Figure 2 Mean and Regression Fit of Hours By Feet
Table 1: Fit Mean

Mean              28.95833
Std Dev [RMSE]    14.90104
Std Error         2.483507
SSE               7771.438
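The entries in Table 1 describe the mean-only model and are related to one another. The sketch below assumes the usual definitions: the standard error of the mean is Std Dev / sqrt(n), and the sum of squared deviations about the mean is (n - 1) * Std Dev^2. The variable names are illustrative.

```python
import math

n = 36
sd_hours = 14.9010426  # Std Dev(Hours) from the data summary

# Standard error of the mean of Hours.
std_error_of_mean = sd_hours / math.sqrt(n)

# Sum of squared deviations of Hours about its mean
# (the SSE of the mean-only model, equal to C. Total in the ANOVA table).
total_sum_of_squares = (n - 1) * sd_hours ** 2

print(f"Std Error = {std_error_of_mean:.6f}")
print(f"SSE       = {total_sum_of_squares:.3f}")
```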


Table 2: Summary of Fit

RSquare                       0.889246
RSquare Adj                   0.885988
Root Mean Square Error        5.031427
Mean of Response              28.95833
Observations (or Sum Wgts)    36


Table 3: Analysis of Variance (ANOVA Table)

Source      DF    Sum of Squares    Mean Square    F Ratio     Prob > F
Model        1         6910.7189        6910.72    272.9864    <.0001
Error       34          860.7186          25.32
C. Total    35         7771.4375


Table 4: Parameter Estimates

Term         Estimate     Std Error    t Ratio    Prob>|t|
Intercept    -2.36966     2.073261       -1.14    0.2610
Feet         0.0500803    0.003031       16.52    <.0001


Table 5: Data Summary

N Rows    Sum(Hours)    Sum(Feet)    Mean(Hours)    Mean(Feet)    Std Dev(Hours)    Std Dev(Feet)
36        1042.5        22520        28.9583333     625.555556    14.9010426        280.582704


Appendix B: Residual Plot of Hours




Figure 3 Residual Plot


Figure 4 Distribution of the Residuals of Hours

Appendix C: Hypothesis Testing for Zero Slope: β1 = 0


Testing Procedure:

  1. Assumptions: The regression errors are independent and normally distributed with constant variance, so that the slope estimator b1 is normally distributed; the 36 moves are a random sample.

  2. Hypotheses: H0: β1 = 0

     HA: β1 ≠ 0

  3. Significance level: α = 0.05 (95% confidence level)

  4. Test statistic: T = (b1 - β1) / S_{b1}, where

    1. S_{b1} = S_{Y|X} / (S_X * sqrt(n - 1))

    2. S_{Y|X}^2 = (1 / (n - 2)) * ∑(Yi - Ŷi)^2

    3. S_X^2 = (1 / (n - 1)) * ∑(Xi - X̄)^2

    4. Sample size n = 36

  5. Rejection region: reject H0 if |T| ≥ t_{n-2, 1-α/2} = t_{34, 0.975} ≈ 2.030; do not reject H0 otherwise.

  6. Calculation of T:

From the JMPIN output in Appendix A, we get

b1 = 0.0500803

S_{Y|X}^2 = 25.32 => S_{Y|X} ≈ 5.032

S_X^2 = 78726.654 => S_X ≈ 280.583

S_{b1} = 5.032 / (280.583 * sqrt(36 - 1)) ≈ 0.00303

T = (0.0500803 - 0) / 0.00303 ≈ 16.528

  7. Conclusion: Since |T| ≈ 16.528 > t_{34, 0.975} ≈ 2.030, we reject H0 at significance level 0.05 and conclude that the cubic feet to be moved provides significant information for predicting the labor hours needed. That is, a straight-line model in cubic feet is better than a model that does not include cubic feet at all.
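The calculation in this procedure can be scripted as follows. This is a sketch under stated assumptions: it uses the unrounded values from the JMPIN output, so the statistic differs slightly in the later digits from the hand calculation, and it takes the critical value 2.030 from this report rather than computing it (statistical tables give roughly 2.032 for 34 degrees of freedom, which does not change the decision). The variable names are illustrative.

```python
import math

n = 36
b1 = 0.0500803        # estimated slope
beta1_null = 0.0      # slope under H0
rmse = 5.031427       # S_{Y|X}, Root Mean Square Error
sd_feet = 280.582704  # S_X, Std Dev(Feet)
t_crit = 2.030        # critical value as used in this report

# Standard error of the slope and the test statistic.
se_b1 = rmse / (sd_feet * math.sqrt(n - 1))
t_stat = (b1 - beta1_null) / se_b1

# Two-sided decision rule.
reject_h0 = abs(t_stat) >= t_crit

print(f"S_b1 = {se_b1:.6f}, T = {t_stat:.2f}, reject H0: {reject_h0}")
```

The standard error and t ratio obtained this way can be compared with the Feet row of Table 4.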





Copyright © 2002 Jimmy Huang, June 13, 2002

