AP Statistics: Chapter 3 Categorical Data
MAKE A PICTURE!
First, create a frequency table
Example: number of students at CB South in each grade:
Proportion = decimal: 0.30, 0.05
Percent = %: 30%, 5%
Frequency = # of things (count)
Relative frequency = % of things
Distribution (of a variable): shows the values of the variable and how often the sample takes each value
Examples: bar chart, pie chart, histogram, stemplot, etc.
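A quick sketch of turning a frequency table into relative frequencies. The grade counts here are made up for illustration (the notes do not list the actual CB South numbers), and the name relative_frequencies is just a helper chosen for this example.

```python
# Hypothetical counts per grade (NOT the real CB South data).
counts = {"Freshman": 120, "Sophomore": 150, "Junior": 130, "Senior": 100}


def relative_frequencies(freq):
    """Divide each count by the total so the results are proportions."""
    n = sum(freq.values())
    return {category: count / n for category, count in freq.items()}


rel = relative_frequencies(counts)

# Print frequency (count), proportion (decimal), and percent side by side.
for grade, count in counts.items():
    print(f"{grade:<10} {count:>5} {rel[grade]:>8.2f} {rel[grade]:>7.0%}")
```

The proportions always add up to 1 (the percents to 100%), which is a handy check on the arithmetic.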
Categorical Distributions:

Bar Chart
Notice the spaces in between the bars.
[Figures: two bar charts of grade, one with frequency (#) on the vertical axis and one with relative frequency (%)]

Pie Chart
Be sure to use labels and percents!
[Figure: pie chart of grade]
Contingency tables (aka two-way tables)

                          grade
gender    Frosh   Soph   Junior   Senior   Total
Male
Female
Total

cells = the inside entries of the table
margins = the Total row and the Total column
Identify:

Row variable: gender
Column variable: grade
Values of the variable: the different rows/columns
Total (n): bottom right of chart
# of cells: 8 (don’t count totals)
Totals: margins
Example: Hospitals

            Hospital A   Hospital B   Total
Died                63           16      79
Survived          2037          784    2821
Total             2100          800    2900

What percent of people died?
Notation:
Probability: P(event)
Given/Of: P(A | B), the probability of A given B
And (overlap): P(A ∩ B)
Or: P(A ∪ B)

Of those people that went to Hospital A, what percent died?

Given that someone went to Hospital B, what is the chance that they died?

What percent of people died and went to Hospital B?

What percent of people survived or went to Hospital A?
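One way to check the hospital questions is to work straight from the counts in the table. This is only a sketch: the dictionary layout and variable names are choices made for this example. Note that "given" shrinks the denominator to one column, while "and" and "or" keep the grand total.

```python
# Counts from the Hospitals table: outcome -> hospital -> count.
table = {
    "Died": {"A": 63, "B": 16},
    "Survived": {"A": 2037, "B": 784},
}

n = sum(sum(row.values()) for row in table.values())          # grand total
died_total = sum(table["Died"].values())                      # row margin
survived_total = sum(table["Survived"].values())              # row margin
a_total = table["Died"]["A"] + table["Survived"]["A"]         # column margin
b_total = table["Died"]["B"] + table["Survived"]["B"]         # column margin

p_died = died_total / n                       # P(died)
p_died_given_a = table["Died"]["A"] / a_total  # P(died | A): only Hospital A patients
p_died_given_b = table["Died"]["B"] / b_total  # P(died | B): only Hospital B patients
p_died_and_b = table["Died"]["B"] / n          # P(died and B): one cell over n
# P(survived or A): add both margins, subtract the overlap counted twice.
p_survived_or_a = (survived_total + a_total - table["Survived"]["A"]) / n

for label, p in [
    ("P(died)", p_died),
    ("P(died | A)", p_died_given_a),
    ("P(died | B)", p_died_given_b),
    ("P(died and B)", p_died_and_b),
    ("P(survived or A)", p_survived_or_a),
]:
    print(f"{label:<18} {p:.1%}")
```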
2 types of Distributions for Categorical Variables

MARGINAL DISTRIBUTIONS

How to make: Convert totals into percentages

Example: Hair color vs. Gender

           Brown  Blonde   Black     Red   Total
MALE          26      24      10       3      63
FEMALE        20      35      12       6      73
TOTALS        46      59      22       9     136

The TOTALS row and the Total column are the margins.
Find the marginal distribution for the HAIR COLOR variable
Brown:
Blonde:
Black:
Red:

Find the marginal distribution for the GENDER variable
Male:
Female:
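A short sketch of the "convert totals into percentages" step, using the hair color counts from the table. The variable names are chosen for this example only.

```python
# Counts from the Hair color vs. Gender table: gender -> hair color -> count.
counts = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

n = sum(sum(row.values()) for row in counts.values())  # grand total

# Marginal distribution of GENDER: each row total divided by n.
gender_marginal = {g: sum(row.values()) / n for g, row in counts.items()}

# Marginal distribution of HAIR COLOR: each column total divided by n.
colors = list(next(iter(counts.values())))
hair_marginal = {c: sum(counts[g][c] for g in counts) / n for c in colors}

for name, dist in [("Gender", gender_marginal), ("Hair color", hair_marginal)]:
    print(name)
    for value, p in dist.items():
        print(f"  {value:<7} {p:.1%}")
```

Each marginal distribution uses only the totals in the margins, never the inside cells on their own.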

CONDITIONAL DISTRIBUTIONS

Then look at … each value of the variable individually

           Brown  Blonde   Black     Red   Total
MALE          26      24      10       3      63
FEMALE        20      35      12       6      73
TOTALS        46      59      22       9     136


ALWAYS … in %

Example: Hair Color vs. Gender

Find the conditional Distribution for the HAIR COLOR variable
Brown: Blonde: Black: Red:

Find the conditional Distribution for the GENDER variable
Male: Female:
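A sketch of conditional distributions for the hair color table. The notes do not spell out which direction each question conditions on, so this example computes both: hair color within each gender, and gender within each hair color. The helper names are made up for this example.

```python
# Counts from the Hair Color vs. Gender table: gender -> hair color -> count.
counts = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}


def conditional_by_row(table):
    """For each row, divide every cell by that row's total."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: count / row_total for col, count in row.items()}
    return result


def transpose(table):
    """Swap rows and columns so the same helper can condition on columns."""
    flipped = {}
    for row_label, row in table.items():
        for col, count in row.items():
            flipped.setdefault(col, {})[row_label] = count
    return flipped


by_gender = conditional_by_row(counts)           # hair color, given gender
by_hair = conditional_by_row(transpose(counts))  # gender, given hair color

for group, dist in {**by_gender, **by_hair}.items():
    shares = ", ".join(f"{value} {p:.1%}" for value, p in dist.items())
    print(f"Given {group}: {shares}")
```

Every conditional distribution adds up to 100% on its own, which is why each bar in a segmented bar graph has the same height.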

Represented visually: SEGMENTED (or STACKED) BAR GRAPH
Independence: when one variable does not affect the other variable.
How do we tell independence? Two variables are independent when the conditional distributions look the same across all values of the other variable (the segments of the bars look approximately the same). As a rule of thumb, there is generally less than a 5% difference between corresponding percentages. When categorical variables are dependent, they are said to be associated.
Independent: Dependent:
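The 5% rule of thumb can be sketched as a quick check: compute the conditional distribution for each row, then find the largest gap between corresponding percentages. This is only the informal classroom guideline from these notes, not a formal test, and the function names are made up for this example.

```python
def conditional_by_row(table):
    """Turn each row of counts into proportions that sum to 1."""
    result = {}
    for row_label, row in table.items():
        row_total = sum(row.values())
        result[row_label] = {col: count / row_total for col, count in row.items()}
    return result


def largest_gap(table):
    """Largest difference, across rows, between corresponding proportions."""
    cond = conditional_by_row(table)
    columns = next(iter(table.values())).keys()
    return max(
        max(cond[r][c] for r in cond) - min(cond[r][c] for r in cond)
        for c in columns
    )


# Counts from the Hair Color vs. Gender table.
hair = {
    "Male": {"Brown": 26, "Blonde": 24, "Black": 10, "Red": 3},
    "Female": {"Brown": 20, "Blonde": 35, "Black": 12, "Red": 6},
}

gap = largest_gap(hair)
looks_independent = gap < 0.05  # rule-of-thumb threshold from the notes
verdict = "independent" if looks_independent else "associated"
print(f"Largest gap: {gap:.1%} -> looks {verdict}")
```

For the hair color data the biggest gap is in the Brown category (about 41% of males versus about 27% of females), so by this guideline hair color and gender look associated.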
AP Stat worksheet 3A Categorical Variables practice
In a survey of adult Americans, people were asked to indicate their age and to categorize their political preference (liberal, moderate, conservative). The results are as follows:


            Liberal  Moderate  Conservative  Total
under 30         83       140            73    296
30-50           119       280           161    560
over 50          88       284           214    586
total           290       704           448   1442


What are the row and column variables?

What percent of Liberals are under 30?

Of those over 50, what percent are Liberals?

Of those who are moderates, what percent are 30-50?

What percent of respondents are moderate and under 30?

Calculate the marginal distribution for the AGE variable. Write these down. Then make a bar graph of the marginal distribution for age.

Calculate the marginal distribution for the PREFERENCE variable. Write these down. Then make a bar graph of this marginal distribution.

Calculate the conditional distribution of the AGE variable. Write these down. Then make a segmented bar graph of this conditional distribution.

Calculate the conditional distribution of the PREFERENCE variable. Write these down. Then make a segmented bar graph of this conditional distribution.

Are the two variables independent?
AP Stat worksheet 3B Categorical Variable practice
A 4-year study of men more than 70 years old, reported in The New York Times, analyzed blood cholesterol and noted how many men with different cholesterol levels suffered nonfatal or fatal heart attacks.

                          Low cholesterol  Medium cholesterol  High cholesterol
Nonfatal heart attacks                 29                  17                18
Fatal heart attacks                    19                  20                 9


Calculate the marginal distribution for cholesterol level and make a bar graph.

Calculate the marginal distribution for severity of heart attack and make a bar graph.

Calculate three conditional distributions for the three levels of cholesterol and make a stacked bar graph.

Calculate the conditional distributions for the type of heart attack and make a stacked bar graph.

Are the two variables independent?
