Eastwestside Movers, an intracity moving company, has typically relied on a trained estimator to determine the number of labor hours needed for a move. This approach has proved useful, but the company would like a more reliable and accurate way to predict labor hours. As a preliminary effort, the company has collected data for 36 moves (refer to the provided data) in which the origin and destination were within the borough of Manhattan in New York City and travel time was an insignificant portion of the hours worked. With these data in hand, the company has asked a statistician to analyze them and develop a model that predicts labor hours from the number of cubic feet to be moved from the apartment of origin.
For the analysis, the statistical tool JMPIN is used to explore the data and determine the relationship between the number of cubic feet to be moved and the labor hours required.
Across the 36 moves, the total labor required is 1,042.5 hours and the total volume moved is 22,520 cubic feet. On average, therefore, it takes approximately 0.046292 hours to move one cubic foot.
Plotting the paired cubic feet and labor hours data yields the scatter plot in Figure 1. Labor hours appear roughly proportional to the cubic feet moved, so a linear model can be used to fit the points.
Figure 1 Scatter plot of the collected 36 pairs of data
From the scatter diagram, it appears that there exists a linear association between the cubic feet moved and the labor hours required. Using JMPIN to conduct a simple regression fit of Hours by Feet (refer to Appendix A for details), we’ve obtained a linear model for the data as:
Hours = -2.36966 + 0.0500803 Feet
with correlation coefficient r = 0.942998 and r² = 0.889246. The r² value indicates that the fitted model explains a large proportion of the total variation: approximately 88.9% of the variation in labor hours is explained by the model.
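To illustrate how the intercept, slope, and r² of such a fit arise from least squares, here is a minimal Python sketch. The function name `fit_line` and the four sample values are hypothetical and serve only as an illustration; they are not the company's 36 observations, and JMPIN's internal computation may differ in detail.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x; returns (b0, b1, r2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross-products about the means
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # slope
    b0 = y_bar - b1 * x_bar        # intercept
    r2 = sxy ** 2 / (sxx * syy)    # squared correlation coefficient
    return b0, b1, r2

# Hypothetical illustration values, NOT the company's data
feet = [300.0, 500.0, 700.0, 900.0]
hours = [12.0, 24.0, 31.0, 44.0]
b0, b1, r2 = fit_line(feet, hours)
print(f"Hours = {b0:.4f} + {b1:.4f} Feet, r2 = {r2:.4f}")
```

Applied to the actual 36 pairs of Feet and Hours, the same formulas should give coefficients close to those reported by JMPIN.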
Next, we test the hypothesis of zero slope (β1 = 0) to verify that a straight-line model in cubic feet is better than a model that does not include cubic feet at all. For the full hypothesis testing procedure, refer to Appendix C. Because the test rejects the null hypothesis of zero slope, the choice of the linear model is reasonable. Moreover, the Analysis of Variance section of the JMPIN output (refer to Appendix B) also shows that the null hypothesis of zero slope should be rejected, since the p-value is below 0.001, which is less than the 0.05 significance level.
The residual plot for Hours and the histogram of the residuals (refer to Appendix B) show no violation of the model assumptions.
4 Application of the Fitted Model
With the relationship between the labor hours required and the cubic feet to be moved captured by the linear model:
Hours = -2.36966 + 0.0500803 Feet
the company can now predict labor hours more reliably. For example, to estimate the labor hours needed to move X0 = 800 cubic feet, the company simply substitutes 800 into the model to get the point estimate:
Hours = -2.36966 + 0.0500803 × 800 ≈ 37.69
Therefore, if the labor hours required to move 800 cubic feet fall between 36 and 57 hours, the move is normal. If the labor hours are below 36, the move was more efficient than expected. If the labor hours are above 57, the company should investigate what went wrong with the move; other factors may have affected it, since the result is unusual.
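The substitution step can also be scripted so that estimates for other move sizes are easy to obtain. The following is a minimal Python sketch, assuming the fitted coefficients reported above; the function name `predict_hours` is hypothetical, and the sketch returns only the point estimate, not an interval.

```python
# Coefficients of the fitted model reported in this section
INTERCEPT = -2.36966
SLOPE = 0.0500803

def predict_hours(feet):
    """Point estimate of labor hours for a move of `feet` cubic feet."""
    return INTERCEPT + SLOPE * feet

estimate = predict_hours(800)
print(f"Estimated labor hours for 800 cubic feet: {estimate:.2f}")  # 37.69
```

Because the model was fitted to 36 moves within Manhattan, estimates for move sizes far outside the range of the collected data should be treated with caution.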
Appendix A: Bivariate Fit of Hours By Feet
Figure 2 Mean and Regression Fit of Hours By Feet
Table 1: Fit Mean
Appendix C: Hypothesis Testing for Zero Slope: β1 = 0
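Assumptions: For each value of cubic feet, the labor hours are normally distributed about the regression line with constant variance, and the 36 moves are a random sample. Under these assumptions, the slope estimate b1 is normally distributed.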
Hypotheses: H0: β1 = 0
HA: β1 ≠ 0
Use a 5% significance level (95% confidence): α = 0.05
Test Statistic: $T = (b_1 - \beta_1) / S_{b_1}$, where
$S_{b_1} = S_{Y|X} / (S_X \sqrt{n - 1})$
$S^2_{Y|X} = \frac{1}{n - 2} \sum_i (Y_i - \hat{Y}_i)^2$
$S^2_X = \frac{1}{n - 1} \sum_i (X_i - \bar{X})^2$
Sample size n = 36
Rejection region: reject H0 if $|T| \ge t_{n-2,\,1-\alpha/2} = t_{34,\,0.975} \approx 2.032$; do not reject H0 otherwise.
Calculation of T:
From the JMPIN output in Appendix B, we get
$b_1 = 0.0500803$
$S^2_{Y|X} = 25.32 \Rightarrow S_{Y|X} \approx 5.032$
$S^2_X = 78726.654 \Rightarrow S_X \approx 280.583$
$S_{b_1} = 5.032 / (280.583 \sqrt{36 - 1}) \approx 0.00303$
$T = (0.0500803 - 0) / 0.00303 \approx 16.528$
Since $T \approx 16.528 > t_{34,\,0.975} \approx 2.032$, we reject H0 at the 0.05 significance level and conclude that the cubic feet to be moved provide significant information for predicting the labor hours needed; that is, a straight-line model in cubic feet is better than a model that does not include cubic feet at all.
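The arithmetic of the test can be checked with a short script. This is a minimal Python sketch that assumes the summary values quoted above from the JMPIN output; the variable names are hypothetical, and the critical value is entered by hand (roughly 2.03) rather than computed.

```python
from math import sqrt

# Summary values quoted in this appendix
n = 36              # sample size
b1 = 0.0500803      # estimated slope
s2_yx = 25.32       # residual variance S^2_{Y|X}
s2_x = 78726.654    # sample variance of Feet, S^2_X

# Standard error of the slope: S_{Y|X} / (S_X * sqrt(n - 1))
s_b1 = sqrt(s2_yx) / (sqrt(s2_x) * sqrt(n - 1))

# Test statistic under H0: beta1 = 0
t_stat = (b1 - 0) / s_b1

# Two-sided 5% critical value for 34 degrees of freedom, entered by hand
T_CRITICAL = 2.03

reject_h0 = abs(t_stat) >= T_CRITICAL
print(f"S_b1 = {s_b1:.5f}, T = {t_stat:.2f}, reject H0: {reject_h0}")
```

Without intermediate rounding, T comes out at about 16.52; the value 16.528 above results from rounding the standard error to 0.00303 before dividing. Either way, the statistic far exceeds the critical value, so the conclusion is unchanged.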