1. Interpolation and extrapolation



Date: 17.09.2020

When a regression line is found, predictions can be made by substituting values of the explanatory variable or response variable into the equation.
Interpolation is the process of predicting values that are within the domain of the data (between the smallest and largest values of the explanatory variable). This was previously covered in lesson 9D.
Extrapolation is the process of predicting values that are outside the domain of the data (smaller than the smallest value or larger than the largest value of the explanatory variable). This was previously covered in lesson 9E.
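The distinction can be sketched in a few lines of Python. The function name `classify_prediction`, its parameters and the example domain of 10 to 100 are illustrative assumptions, not part of the lesson. Values exactly on the boundary of the domain are treated here as interpolation, which is also an assumption.

```python
def classify_prediction(x, x_min, x_max):
    """Label a prediction by where the explanatory value x sits
    relative to the domain [x_min, x_max] of the data."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


# Illustrative domain: data collected for explanatory values 10 to 100.
print(classify_prediction(35, 10, 100))   # inside the domain
print(classify_prediction(120, 10, 100))  # above the largest value
```

Only the explanatory variable is compared with the domain; when a prediction starts from a response value, the matching explanatory value has to be found first.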
Worked example 9
(2 marks)
When producing between 0 and 8000 products, a factory’s costs follow the equation
costs = 4500 + 3 × products produced
a)
The factory spent $34 500. How many products did it produce?
b)
Was this prediction made using interpolation or extrapolation?
Solution
a)
The factory spent $34 500. How many products did it produce?
Step 1
Substitute $34 500 into the equation.
34 500 = 4500 + 3 × products produced
Step 2
Find the number of products produced by the factory.
34 500 = 4500 + 3 × products produced
30 000 = 3 × products produced
products produced = 30 000 / 3
products produced = 10 000
b)
Was this prediction made using interpolation or extrapolation?
Step 1
Is the predicted value within or outside the domain of the data?
This regression line is based upon data values between 0 and 8000 products produced. The prediction was that 10 000 products would have been produced if the cost was $34 500. This prediction is outside the domain of the data.
Step 2
Determine whether the prediction was made using interpolation or extrapolation.
Because the predicted value is outside the domain of the data, the prediction was made using extrapolation.
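The working in this worked example can be checked with a short Python sketch. The function and variable names are hypothetical; the intercept 4500, the slope 3 and the domain of 0 to 8000 products come from the example itself.

```python
def products_from_cost(cost, intercept=4500, slope=3):
    """Rearrange cost = intercept + slope * products to find products."""
    return (cost - intercept) / slope


products = products_from_cost(34_500)

# The regression line is based on data for 0 to 8000 products.
domain_min, domain_max = 0, 8000
within_domain = domain_min <= products <= domain_max
method = "interpolation" if within_domain else "extrapolation"

print(products, method)  # 10000.0 extrapolation
```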
[Figure: regression line plotted as response variable against explanatory variable, with the regions labelled "interpolation" and "extrapolation".]
LESSON 13E
Predictions and limitations of models
The key skills you will learn in this lesson are:
1. Interpolation and extrapolation
2. Limitations of regression line predictions
VCAA key knowledge point:
“use of the model to make predictions and identify limitations of extrapolation”
Mathematics Area of Study key knowledge points derived
from VCE Mathematics Study Design 2016-2020 pi i The Victorian Curriculum and Assessment Authority
(VCAA). Used with permission.
© Edrolo 2018. 13E Predictions and limitations of models

2. Limitations of regression line predictions
Predictions based upon regression lines are not always accurate.


A regression line based upon a small sample of data is unreliable as the data can contain biases or lack diversity. The larger a data set is, the less likely it is that these problems will occur.


A regression line based upon data with a Pearson’s correlation coefficient between –0.5 and 0.5 will provide little insight into the relationship between variables, as a prediction based upon a weak trend cannot be reliable.


When extrapolating, predictions may not be accurate as they are outside the domain of the data, and are therefore based on a trend that is not proven to continue over all possible values. Extrapolating can provide some level of insight if the prediction is relatively close to the domain of the data set.
[Figure: three scatterplots, each with axes from 0 to 20 and a point marked "unreliable prediction".]
Worked example 10
(1 mark)
Interpolation was used to predict a point within a data set of 1000 values. The data has a Pearson’s correlation coefficient value of 0.39. Why might this prediction be unreliable?
Solution
Step 1
Check for sample size.
A data set of 1000 values is a large sample size, and therefore reliable.
Step 2
Check for correlation.
The value of r is 0.39. This is a weak positive correlation and therefore makes the prediction unreliable.
Step 3
Check that the prediction is within, or close to the domain of the data.
This is an example of interpolation, which is reliable.
Step 4
Summarise.
The prediction may be unreliable due to the weak correlation between the variables.
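The three checks in this worked example can be collected into one Python sketch. The function name and the `min_sample_size` threshold of 30 are assumptions for illustration: the lesson gives ±0.5 as the range for a weak correlation but no numeric cut-off for a small sample. The domain of 0 to 100 and the predicted point 50 are also made up, since the example only says that the point was interpolated.

```python
def prediction_concerns(n, r, x, x_min, x_max, min_sample_size=30):
    """Return a list of reasons a regression prediction may be unreliable."""
    concerns = []
    if n < min_sample_size:          # assumed threshold, not from the lesson
        concerns.append("small sample size")
    if -0.5 < r < 0.5:               # weak correlation, as in the lesson
        concerns.append("weak correlation")
    if not (x_min <= x <= x_max):    # prediction outside the domain
        concerns.append("extrapolation")
    return concerns


# Worked example 10: 1000 values, r = 0.39, interpolated point.
print(prediction_concerns(n=1000, r=0.39, x=50, x_min=0, x_max=100))
# ['weak correlation']
```

An empty list does not guarantee an accurate prediction; it only means that none of these three limitations was flagged.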
Questions 13E Predictions and limitations of models
Refresher question
Q1.
What is the domain of the following data set?
[Figure: scatterplot of weight (kg), axis from 50 to 100, against height (cm), axis from 150 to 200.]

1. Interpolation and extrapolation
Q2.
Predict the value of the response variable from the following regression line, given the value of the explanatory variable is four.
Q3.
Predict the value of the explanatory variable from the following regression line, to three decimal places, given the value of the response variable is ten.
response variable = 2.605 + 0.626 × explanatory variable
Q4.
An artist collects data on the amount of time, in hours, he spends on each individual artwork, and the artworks’ individual selling prices, in dollars. He finds the regression line of his data, between 10 and 100 hours, to be
selling price = 3621 + 62 × time spent
a)
Predict the selling price of an artwork that took 35 hours to create, and determine whether this prediction used interpolation or extrapolation.
b)
An artwork sold for $10 000. Determine whether the artwork took over or under 100 hours to create, and also determine whether this prediction used interpolation or extrapolation.
Check your understanding
Q5.
Emilie thinks that her results on school tests, as a percentage, might be linked to the time she spends on Netflix, in hours, the night before the test. She finds that the trend, between 0 and 8 hours, has the regression line
results = 97.2 – 4.8 × hours spent on Netflix
a)
Predict Emilie’s test result if she spends 4 hours on Netflix the night before a test, and determine whether this prediction used interpolation or extrapolation.
b)
Emilie fell sick and spent 14 hours and 45 minutes on Netflix one night. Predict Emilie’s result on the maths test the next day, and determine whether this prediction used interpolation or extrapolation.
c)
Emilie got her English test back and scored 75.6%. Estimate how many hours and minutes Emilie spent on Netflix the night before the English test, and determine whether this estimation used interpolation or extrapolation.
[Figure for Q2: regression line plotted as response variable against explanatory variable, axes from 0 to 10.]
Question types: Q2 Skill, Q3 Skill, Q4 Application, Q5 Application
2. Limitations of regression line predictions
Q6.
From which of the following regression lines would predictions be most reliable?
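A. [Figure: scatterplot with regression line, axes from 0 to 20]
B. [Figure: scatterplot with regression line, axes from 0 to 20]
C. [Figure: scatterplot with regression line, axes from 0 to 20]
Question type: Skill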

Q7.
Explain the limitations of the following predictions:
a)
There should be 86 people at the beach on a day in which the maximum temperature is 44°C.
[Figure: scatterplot of number of people at the beach (axis from 20 to 100) against maximum temperature (°C) (axis from 0 to 50).]
b)
A clock that has deviated 30 seconds from the correct time should be 36 years old.
[Figure: scatterplot of deviation from correct time (s) (axis from 10 to 50) against age of clock (years) (axis from 0 to 50).]
c)
A player with $180 boots should score 11 goals.
[Figure: scatterplot of goals scored (axis from 5 to 30) against price of boots ($) (axis from 0 to 350).]
Q8.
The regression line on the right is used to interpolate the number of cups of coffee a person drinks by the hours of sleep they got the night before. Explain whether this prediction would be reliable or not.
Check your understanding
Q9.
The following regression line describes the average time in which headphones last, in days, in relation to their price, in dollars. The data consisted of 500 different headphones, between $10 and $200, and the value of the Pearson’s correlation coefficient of the data was 0.78.
duration of headphone functionality = 33 + 3 × price
Are there any limitations to predicting the price of headphones that last two years? Explain your answer.
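[Figure for Q8: scatterplot of cups of coffee (axis from 2 to 8) against hours of sleep (axis from 0 to 12), with regression line y = 3.4421 – 0.235x and r = –0.299.]
Question types: Q7 Application, Q8 Application, Q9 Application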

Joining it all together
Q10.
The following regression lines describe the number of surfers at two Australian beaches in relation to the speed of the wind, in metres per second. The data was collected every day of a week during summer, with the speed of the wind varying from 0 m/s to 20 m/s.
Surfers Paradise
number of surfers = 3 + 4 × speed of wind
r = 0.82
Bondi Beach
number of surfers = 1 + 5 × speed of wind
r = 0.79
a)
Predict the speed of the wind given there are 71 surfers at Surfers Paradise. Determine whether the prediction used interpolation or extrapolation.
b)
A hurricane has winds of 32 m/s. Predict how many surfers would endure those conditions at Surfers Paradise and Bondi Beach. Find the difference between the two.
c)
Using the results from part b, explain how extrapolating data can give unreliable predictions.
d)
With reference to the sample diversity of this data, explain how making predictions based upon a small sample size can be unreliable.
Q11.
The following regression line displays the relationship between the box office revenue and budgets of the highest grossing movies, in dollars.
box office revenue = 58 000 000 + 2.6 × movie budget
The data is based upon 200 movies, with budgets between $20 million and $200 million. The value of the Pearson’s correlation coefficient of the data was 0.43.
a)
Predict the box office revenue of a movie that had a budget of $225 million, and determine whether the prediction used interpolation or extrapolation.
b)
Predict the budget of a movie that has a $348 million box office revenue, to the nearest dollar, and determine whether the prediction used interpolation or extrapolation.
c)
Are there any limitations of this regression line, regarding its ability to interpolate values? If so, explain.
Question types: Q10 Application (4 marks), Q11 Application (3 marks)
VCAA question
Q12.
To investigate the difference in life expectancies between residents of Australia and the UK, least squares regression lines were fitted to data from the period between
1975 and The results are shown on the right.
The equations of the least squares regression lines are as follows.
Australia:
life expectancy = –451.7 + 0.2657 × year
UK:
life expectancy = –350.4 + 0.2143 × year
a)
Use these equations to predict the difference between the life expectancies of Australia and the UK in 2030. Give your answer correct to the nearest year.
(2 marks)
b)
Explain why this prediction may be of limited reliability.
(1 mark)
Adapted from VCAA 2015 Exam 2, Core. Q5bi,bii
[Figure for Q12: life expectancy (axis from 72 to 83) plotted against year (axis from 1975 to 2010), with least squares regression lines for Australia and the UK.]
3 marks

