**1. Interpolation and extrapolation**

When a regression line is found, predictions can be made by substituting a value of the explanatory variable or the response variable into the equation.

Interpolation is the process of predicting values that are within the domain of the data (between the smallest and largest values of the explanatory variable). This was previously covered in lesson 9D.

Extrapolation is the process of predicting values that are outside the domain of the data (smaller than the smallest value or larger than the largest value of the explanatory variable). This was previously covered in lesson 9E.

**Worked example 9** (*2 marks*)

When producing between 0 and 8000 products, a factory’s costs follow the equation

*costs* = 4500 + 3 × *products produced*

**a)** The factory spent $34 500. How many products did it produce?

**b)** Was this prediction made using interpolation or extrapolation?

**Solution**

**a)** The factory spent $34 500. How many products did it produce?

**Step 1** Substitute $34 500 into the equation.

34 500 = 4500 + 3 × *products produced*

**Step 2** Solve for the number of products produced by the factory.

34 500 = 4500 + 3 × *products produced*

30 000 = 3 × *products produced*

*products produced* = 30 000 ÷ 3

*products produced* = 10 000

**b)** Was this prediction made using interpolation or extrapolation?

**Step 1** Is the predicted value within or outside the domain of the data?

This regression line is based upon data values between 0 and 8000 products produced. The prediction was that 10 000 products would have been produced if the cost was $34 500. This prediction is outside the domain of the data.

**Step 2** Determine whether the prediction was made using interpolation or extrapolation.

Because the predicted value is outside the domain of the data, the prediction was made using extrapolation.
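The two parts of worked example 9 can also be checked with a short Python sketch. The function names `predict_products` and `classify_prediction` are illustrative assumptions, not part of the lesson; the intercept (4500), slope (3) and domain (0 to 8000 products) come from the worked example.

```python
def predict_products(cost, intercept=4500.0, slope=3.0):
    """Rearrange costs = intercept + slope * products to solve for products."""
    return (cost - intercept) / slope


def classify_prediction(x_value, domain_min, domain_max):
    """Label a prediction by where the explanatory value sits relative to the data's domain."""
    if domain_min <= x_value <= domain_max:
        return "interpolation"
    return "extrapolation"


# Worked example 9: the factory spent $34 500, and the data covers 0 to 8000 products.
products = predict_products(34_500)
kind = classify_prediction(products, 0, 8000)
print(products, kind)  # prints: 10000.0 extrapolation
```

Values exactly on the boundary of the domain (0 or 8000 products here) are treated as interpolation in this sketch, matching the lesson's description of interpolation as "between the smallest and largest value".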
[Figure: scatterplot with regression line; x-axis: explanatory variable (0–100); y-axis: response variable (0–100); regions labelled "interpolation" and "extrapolation"]

**LESSON 13E Predictions and limitations of models**

The key skills you will learn in this lesson are:

● Interpolation and extrapolation

● Limitations of regression line predictions

VCAA key knowledge point: “use of the model to make predictions and identify limitations of extrapolation”

*Mathematics Area of Study key knowledge points derived from VCE Mathematics Study Design 2016-2020 pi i The Victorian Curriculum and Assessment Authority (VCAA). Used with permission.*

© Edrolo 2018

**2. Limitations of regression line predictions**

Predictions based upon regression lines are not always accurate.

● A regression line based upon a small sample of data is unreliable, as the data can contain biases or lack diversity. The larger a data set is, the less likely it is that these problems will occur.

● A regression line based upon data with a Pearson’s correlation coefficient between –0.5 and 0.5 will provide little insight into the relationship between the variables, as a prediction based upon a weak trend cannot be reliable.

● When extrapolating, predictions may not be accurate as they are outside the domain of the data, and are therefore based on a trend that is not proven to continue over all possible values. Extrapolating can provide some level of insight if the prediction is relatively close to the domain of the data set.

[Figure: three scatterplots, each with both axes from 0 to 20 and a point labelled "unreliable prediction"]

**Worked example 10** (*1 mark*)

Interpolation was used to predict a point within a data set of 1000 values. The data has a Pearson’s correlation coefficient value of 0.39. Why might this prediction be unreliable?

**Solution**

**Step 1** Check the sample size. A data set of 1000 values is a large sample, and is therefore reliable.
**Step 2** Check the correlation. The value of *r* is 0.39. This is a weak positive correlation, which makes the prediction unreliable.

**Step 3** Check that the prediction is within, or close to, the domain of the data. This is an example of interpolation, which is reliable.

**Step 4** Summarise. The prediction may be unreliable due to the weak correlation between the variables.

**Questions E** *Predictions and limitations of models*

**Refresher question**

**Q1.** What is the domain of the following data set?

[Figure: scatterplot; x-axis: height (cm), 150–200; y-axis: weight (kg), 50–100]

**1. Interpolation and extrapolation**

**Q2.** Predict the value of the response variable from the following regression line, given that the value of the explanatory variable is four.

**Q3.** Predict the value of the explanatory variable from the following regression line, to three decimal places, given that the value of the response variable is ten.

*response variable* = 2.605 + 0.626 × *explanatory variable*

**Q4.** An artist collects data on the amount of time, in hours, he spends on each individual artwork, and each artwork’s selling price, in dollars. He finds the regression line of his data, between 10 and 100 hours, to be

*selling price* = 3621 + 62 × *time spent*

**a)** Predict the selling price of an artwork that took 35 hours to create, and determine whether this prediction used interpolation or extrapolation.

**b)** An artwork sold for $10 000. Determine whether the artwork took over or under 100 hours to create, and also determine whether this prediction used interpolation or extrapolation.

**Check your understanding**

**Q5.** Emilie thinks that her results on school tests, as a percentage, might be linked to the time she spends on Netflix, in hours, the night before the test.
She finds that the trend has the regression line, between 0 and 8 hours, of

*results* = 97.2 – 4.8 × *hours spent on Netflix*

**a)** Predict Emilie’s test result if she spends 4 hours on Netflix the night before a test, and determine whether this prediction used interpolation or extrapolation.

**b)** Emilie fell sick and spent 14 hours and 45 minutes on Netflix one night. Predict Emilie’s result on the maths test the next day, and determine whether this prediction used interpolation or extrapolation.

**c)** Emilie got her English test back and scored 75.6%. Estimate how many hours and minutes Emilie spent on Netflix the night before the English test, and determine whether this estimation used interpolation or extrapolation.

[Figure for Q2: regression line; x-axis: explanatory variable (0–10); y-axis: response variable (0–10)]

*Skill: Q2, Q3. Application: Q4, Q5.*

**2. Limitations of regression line predictions**

**Q6.** From which of the following regression lines would predictions be most reliable? *(Skill)*

**A.** [Figure: scatterplot with regression line, both axes 0–20]

**B.** [Figure: scatterplot with regression line, both axes 0–20]

**C.** [Figure: scatterplot with regression line, both axes 0–20]

**Q7.** Explain the limitations of the following predictions:

**a)** There should be 86 people at the beach on a day on which the maximum temperature is 44°C.

[Figure: scatterplot; x-axis: maximum temperature (°C), 0–50; y-axis: number of people at the beach, 0–100]

**b)** A clock that has deviated 30 seconds from the correct time should be 36 years old.

[Figure: scatterplot; x-axis: age of clock (years), 0–50; y-axis: deviation from correct time (s), 0–50]

**c)** A player with $180 boots should score 11 goals.

[Figure: scatterplot; x-axis: price of boots ($), 0–350; y-axis: goals scored, 0–30]

**Q8.** The regression line on the right is used to interpolate the number of cups of coffee a person drinks from the hours of sleep they got the night before.
Explain whether this prediction would be reliable or not.

[Figure for Q8: scatterplot with regression line; x-axis: hours of sleep (0–12); y-axis: cups of coffee (0–8); *y* = 3.4421 – 0.235*x*, *r* = –0.299]

**Check your understanding**

**Q9.** The following regression line describes the average time that headphones last, in days, in relation to their price, in dollars. The data consisted of 500 different headphones, priced between $10 and $200, and the value of the Pearson’s correlation coefficient of the data was 0.78.

*duration of headphone functionality* = 33 + 3 × *price*

Are there any limitations to predicting the price of headphones that last two years? Explain your answer.

*Application: Q7, Q8, Q9.*

**Joining it all together**

**Q10.** *(Application, 4 marks)* The following regression lines detail the number of surfers at two Australian beaches in relation to the speed of the wind, in metres per second. The data was collected every day of a week during summer, with the speed of the wind varying from 0 m/s to 20 m/s.

Surfers Paradise: *number of surfers* = 3 + 4 × *speed of wind*, *r* = 0.82

Bondi Beach: *number of surfers* = 1 + 5 × *speed of wind*, *r* = 0.79

**a)** Predict the speed of the wind given there are 71 surfers at Surfers Paradise. Determine whether the prediction used interpolation or extrapolation.

**b)** A hurricane has winds of 32 m/s. Predict how many surfers would endure those conditions at Surfers Paradise and at Bondi Beach. Find the difference between the two.

**c)** Using the results from part b, explain how extrapolating data can give unreliable predictions.

**d)** With reference to the sample diversity of this data, explain how making predictions based upon a small sample size can be unreliable.

**Q11.** *(Application, 3 marks)* The following regression line displays the relationship between the box office revenue and budgets of the highest grossing movies, in dollars.

*box office revenue* = 58 000 000 + 2.6 × *movie budget*

The data is based upon 200 movies, with budgets between $20 million and $200 million. The value of the Pearson’s correlation coefficient of the data was 0.43.

**a)** Predict the box office revenue of a movie that had a budget of $225 million, and determine whether the prediction used interpolation or extrapolation.

**b)** Predict the budget of a movie that has a $348 million box office revenue, to the nearest dollar, and determine whether the prediction used interpolation or extrapolation.

**c)** Are there any limitations of this regression line regarding its ability to interpolate values? If so, explain.

**VCAA question**

**Q12.** *(3 marks)* To investigate the difference in life expectancies between residents of Australia and the UK, least squares regression lines were fitted to data from the period between 1975 and [end year missing]. The results are shown on the right.

[Figure: least squares regression lines for Australia and the UK; horizontal axis ticks 1975–2010; vertical axis ticks 72–83]

The equations of the least squares regression lines are as follows.

Australia: *life expectancy* = –451.7 + 0.2657 × *year*

UK: *life expectancy* = –350.4 + 0.2143 × *year*

**a)** Use these equations to predict the difference between the life expectancies of Australia and the UK in 2030. Give your answer correct to the nearest year. (2 marks)

**b)** Explain why this prediction may be of limited reliability. (1 mark)

*Adapted from VCAA 2015 Exam 2, Core. Q5bi,bii*
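The three checks used in worked example 10 earlier in this lesson (sample size, strength of correlation, and position relative to the domain) can be sketched in Python as follows. This is a minimal illustration, not a definitive test of reliability: the function name `prediction_limitations` is hypothetical, and the `min_sample_size` default of 30 is an assumption made only for this sketch, since the lesson does not say how small a "small sample" is. The correlation threshold follows the lesson's statement that a coefficient between –0.5 and 0.5 indicates a weak trend.

```python
def prediction_limitations(n, r, x_value, domain_min, domain_max, min_sample_size=30):
    """Return a list of reasons a regression line prediction may be unreliable."""
    issues = []
    # Assumption: samples below min_sample_size count as "small" (threshold not from the lesson).
    if n < min_sample_size:
        issues.append("small sample size")
    # From the lesson: r between -0.5 and 0.5 is a weak trend.
    if abs(r) < 0.5:
        issues.append("weak correlation")
    # From the lesson: predictions outside the domain of the data are extrapolation.
    if not (domain_min <= x_value <= domain_max):
        issues.append("extrapolation")
    return issues


# Worked example 10: 1000 values, r = 0.39, prediction made by interpolation.
# The x value (5) and domain (0 to 10) are hypothetical, chosen only so the point is inside the domain.
print(prediction_limitations(1000, 0.39, 5, 0, 10))  # prints: ['weak correlation']
```

An empty list means none of the three limitations was detected; it does not guarantee that a prediction is accurate.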