
Grade 7:  The Learning Equation Math

41.03 Reading and Interpreting Graphs


Data Analysis

Refresher pp 82-83

Statistics Overview:

1.  Question:  formulate questions that explore whether or not a relationship exists in a real-world context.  The question needs to be clearly stated and based on collectable data.  Not all questions can be answered!

  • "Data as they are" questions can be answered:

If the U.S. Presidential elections were held today, what percent of Americans would vote for Al Gore as president?

  • "What-if questions under replicable circumstances" can be answered:

Among all American school children age 6 to 12, would giving Vitamin C prevent colds? (You can imagine testing this out on more and more children.)

  • Questions about nonreplicable events, in general, can NOT be answered:

    "How many U.S. troops would be in Bosnia if the American Revolution had failed?"

source:  http://www.stat.ucla.edu/~tlin/50/lect50.01

2.  Collecting Data:  select, use, and defend appropriate methods of collecting data:

  • designing and using surveys, interviews, experiments, research
      - sampling procedures (random/non-random, clustered, stratified)
      - data acquisition (survey, observation, self-report, examination of official records)

3.  Displaying Data:  select, use, and defend appropriate methods of displaying data

  • display data by hand or by computer in a variety of ways, including circle graphs

4.  Conclusions:  present analyses and conclusions based on displayed data.  Conclusions are clearly stated and answer the question based on available data.

  • read and interpret graphs that are provided
  • determine and use the most appropriate measure of central tendency in a given context (mode, median, mean)
  • describe the variability of given data using range or box-and-whisker plots (range, extremes, gaps, clusters, and quartiles)
  • interpolate from data to make predictions

 

Learning Outcomes:

The student will:

Use the following weather information to calculate the mean, median, and mode in the chart that follows:

    LOCATION        CLOUD COVER (%)    TEMPERATURE (°F)    RELATIVE HUMIDITY (%)
    Roanoke               75                  34                    77
    Harrisonburg          90                  32                    83
    Richmond              90                  37                    90
    Hampton               55                  42                    72
    Buckroe               44                  48                    66

Measure of Central Tendency, with Description/Explanation and worked values for % Cloud Cover, Temperature (°F), and Relative Humidity %

mean - the sum of all the results included in the sample divided by the number of observations

  • % cloud cover:  (75 + 90 + 90 + 55 + 44) / 5 = 354 / 5 = 70.8%
  • temperature (°F):  (34 + 32 + 37 + 42 + 48) / 5 = 193 / 5 = 38.6 °F
  • relative humidity %:  (77 + 83 + 90 + 72 + 66) / 5 = 388 / 5 = 77.6%

median - the middle value of all the numbers in the sample when they are arranged in order. In other words, the median is the value that divides the set of data in half, 50% of the observations being above (or equal to) it and 50% being below (or equal to) it

  • for an even number of values, the median is the average of the middle two values
  • for an odd number of values, the median is the middle value

  • % cloud cover:  44, 55, 75, 90, 90, so the median is 75%
  • temperature (°F):  32, 34, 37, 42, 48, so the median is 37 °F
  • relative humidity %:  66, 72, 77, 83, 90, so the median is 77%

mode - the most frequently observed value of the measurements in the sample.  There can be more than one mode or no mode.

  • % cloud cover:  90
  • temperature (°F):  no mode
  • relative humidity %:  no mode

range - the difference between the largest and smallest values of a variable in the sample

  • % cloud cover:  90 - 44 = 46
  • temperature (°F):  48 - 32 = 16
  • relative humidity %:  90 - 66 = 24
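The worked values above can be checked with a short Python sketch. It assumes the cloud-cover reading of 44 for Buckroe that the worked calculations use, and the helper name `summarize` is made up for this illustration.

```python
from collections import Counter
from statistics import mean, median

# Readings taken from the worked calculations above
# (cloud cover uses 44 for Buckroe, as the calculations do).
weather = {
    "cloud cover (%)": [75, 90, 90, 55, 44],
    "temperature (F)": [34, 32, 37, 42, 48],
    "relative humidity (%)": [77, 83, 90, 72, 66],
}


def summarize(values):
    """Return the mean, median, mode(s), and range of a list of numbers."""
    counts = Counter(values)
    highest = max(counts.values())
    # If no value repeats, the data set has no mode.
    modes = [v for v, c in counts.items() if c == highest] if highest > 1 else []
    return {
        "mean": round(mean(values), 1),
        "median": median(values),
        "mode": modes,
        "range": max(values) - min(values),
    }


for name, values in weather.items():
    print(name, summarize(values))
```

An empty mode list stands for "no mode", and a list with several entries means the data set has more than one mode.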

Gregory Consulting - Dieroll Java Sample

 Java Bar and Line Graph - may modify this to deal with central tendency

Central Tendency Introduction

Variance

 

Review:

Statistics Vocabulary

statistics - the systematic collection and arrangement of large numbers of numerical observations, together with ways of drawing useful conclusions from such data

population - the whole group of people (or things) eligible for a data collection investigation

sample - part of a population selected so as to give information about the population as a whole

Biased Samples

  • convenience sampling - a quick and easy way to obtain data, but not everyone in the population has an equal chance of being selected
  • self-selective sampling - members of the population provide information by volunteering their opinions
  • cluster sampling - only a particular segment of the population is sampled

Unbiased Samples

  • systematic sampling - every nth member of the population is sampled
  • simple random sampling - the sample is chosen randomly from the population
  • stratified random sampling - the population is divided into groups (strata), and a random sample is taken from each group
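As a rough illustration of the three unbiased methods, here is a minimal Python sketch. The population of 30 numbered students, the class names, and the function names are all hypothetical, and the stratified version assumes the usual approach of drawing a random sample from each stratum.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of being chosen."""
    return rng.sample(population, k)


def systematic_sample(population, n, start=0):
    """Take every nth member, beginning at position `start`."""
    return population[start::n]


def stratified_sample(strata, k_per_group, rng):
    """Draw a random sample of k_per_group members from each group (stratum)."""
    return {name: rng.sample(members, k_per_group) for name, members in strata.items()}


# Hypothetical population: 30 students numbered 1 to 30, split into three made-up classes.
students = list(range(1, 31))
classes = {"7A": students[:10], "7B": students[10:20], "7C": students[20:]}

rng = random.Random(7)  # fixed seed so a run can be repeated
print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 6))  # every 6th student: 1, 7, 13, 19, 25
print(stratified_sample(classes, 2, rng))
```

A convenience sample, by contrast, would be something like `students[:5]`: quick to take, but students 6 to 30 never have a chance of being selected.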

frequency - the number of times an event occurs

frequency table - a table showing a set of values of a variable and the number of times each value occurs

Age at Which 42 U.S. Presidents Began Their Term

    Age        Number of Presidents Beginning Their Term
    35 - 39     0
    40 - 44     2
    45 - 49     6
    50 - 54    12
    55 - 59    12
    60 - 64     6
    65 - 69     4
    70 - 74     0

frequency table data source:  http://www2.sunysuffolk.edu/wrightj/MA22/Stat/Chart.htm
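A frequency table can be turned into a quick display with a few lines of Python. In this sketch the 35-39 and 70-74 classes are taken to have a frequency of 0 (an assumption that makes the total come to 42), and `text_histogram` is a made-up helper name.

```python
# Frequencies from the table of presidents' ages;
# the 35-39 and 70-74 classes are assumed to be 0.
age_frequencies = {
    "35-39": 0,
    "40-44": 2,
    "45-49": 6,
    "50-54": 12,
    "55-59": 12,
    "60-64": 6,
    "65-69": 4,
    "70-74": 0,
}


def text_histogram(frequencies, symbol="#"):
    """Return one text line per class, with one symbol per observation."""
    return [f"{label:>5} | {symbol * count} {count}" for label, count in frequencies.items()]


total = sum(age_frequencies.values())
print("\n".join(text_histogram(age_frequencies)))
print("total:", total)

# Classes with the highest frequency (there may be more than one).
highest = max(age_frequencies.values())
modal_classes = [label for label, count in age_frequencies.items() if count == highest]
print("most common age classes:", modal_classes)
```

The two tallest rows, 50-54 and 55-59, tie for the highest frequency, so this table has two most common age classes.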

survey - asking people for their opinions.  Methods:

  • telephone
  • personal interview
  • by mail

SURVEY SHEET

    Chocolate Bar    Tally                   Frequency
    Butterfinger     ||||| ||||| ||||| ||       17
    5th Avenue       |||                         3
    Baby Ruth        |||||                       5
    Snickers         ||||| ||||| |||||          15
    Nestle           |||||                       5
    Hershey          ||||| |||                   8
    Dove             ||||| ||                    7

source:  Take Our Surveys
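The step from tallies to a circle graph can be sketched in Python. The frequencies come from the survey sheet above; expanding them into a list of individual responses is only a device for this illustration, and the variable names are made up.

```python
from collections import Counter

# Frequencies from the survey sheet, expanded into one entry per response.
survey_counts = {
    "Butterfinger": 17,
    "5th Avenue": 3,
    "Baby Ruth": 5,
    "Snickers": 15,
    "Nestle": 5,
    "Hershey": 8,
    "Dove": 7,
}
responses = [bar for bar, n in survey_counts.items() for _ in range(n)]

tally = Counter(responses)  # counts each response, much like tally marks
total = sum(tally.values())

for bar, count in tally.most_common():
    percent = 100 * count / total
    angle = 360 * count / total  # central angle of this bar's sector in a circle graph
    print(f"{bar:<12} {count:>2}  {percent:5.1f}%  {angle:5.1f} degrees")
```

Each sector's angle is the bar's share of the 60 responses multiplied by 360 degrees, so the angles add up to a full circle.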

inference - a conclusion drawn from the data

bias - an unwanted influence on a sample that prevents the sample from being truly representative of the population from which it is drawn.

spreadsheets - check link for a variety of topics

Graphing

In a graph, each variable is assigned to an axis; then scales and labels are added.


The Dependent variable is always assigned to the Y-AXIS.

What is the dependent variable?

The dependent variable relies on the changes in the independent variable.

The dependent variable is what we measure.


source:  Line and Bar Graphs

The Independent variable is always assigned to the X-AXIS.

What is the independent variable?    The independent variable does not rely on any other variable.  The values of the independent variable can be chosen freely.

There are three types of relationships between variables:
  • linear
  • non-linear (curved-line or other pattern)
  • no relationship at all

 

axis - a reference line on a graph along which the values of a variable are marked (more generally, a line drawn through the center of a figure)

scale - a sequence of marks, usually along a line, used in making measurements

proportional - one variable is proportional to another if the ratio of corresponding values remains constant

interpolation - to estimate a value by following a pattern and staying within the values already known

extrapolation - to estimate a value by following a pattern and going beyond the values already known
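The difference between the two kinds of estimate can be shown with a small Python sketch. It assumes the pattern is a straight line through two known points; the plant measurements and the function names are hypothetical.

```python
def estimate_on_line(x1, y1, x2, y2, x):
    """Estimate y at x from the straight line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)


def kind_of_estimate(x1, x2, x):
    """Name the estimate according to whether x lies within the known values."""
    low, high = min(x1, x2), max(x1, x2)
    return "interpolation" if low <= x <= high else "extrapolation"


# Hypothetical measurements: a plant is 4 cm tall on day 2 and 10 cm tall on day 5.
print(estimate_on_line(2, 4, 5, 10, 3), kind_of_estimate(2, 5, 3))  # day 3 lies between the known days
print(estimate_on_line(2, 4, 5, 10, 8), kind_of_estimate(2, 5, 8))  # day 8 lies beyond them
```

The arithmetic is the same in both cases. An extrapolated value is usually less trustworthy, because the pattern may not continue beyond the known data.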

discrete variable - has measurements that are distinct, periodic, and unconnected between data points (e.g. the distance an athlete throws a discus)

continuous variable - measurements are uninterrupted and connected between data points (e.g. growth of a plant)

scatter plot - a graph that relates data from two different sets

line of best fit (trend line) - A line on a scatter plot which can be drawn near the points to more clearly show the trend between two sets of data

trend - relationship between two sets of data. The trend will show a positive correlation, a negative correlation, or no correlation. 

positive correlation - both sets of data increase together

negative correlation - one set of data decreases as the other set of data increases

no correlation - the two data sets are not related.

weak correlation - when the data are not clustered along an obvious line

strong correlation - when the data are clustered along an obvious line (can be positive or negative)
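One common way to put a number on these descriptions is the correlation coefficient r, which runs from -1 to 1. The Python sketch below is only an illustration: the data sets are hypothetical, and the cut-offs of 0.7 and 0.3 used to label a correlation strong or weak are assumptions, not standard values.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r, strong=0.7, weak=0.3):
    """Turn r into words; the cut-offs are illustrative assumptions."""
    if abs(r) < weak:
        return "little or no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"


# Hypothetical data: hours studied vs. test score, and temperature vs. hot chocolate sales.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 63, 71, 80]
temps = [5, 10, 15, 20, 25]
sales = [40, 33, 27, 18, 11]

print(describe(correlation(hours, scores)))
print(describe(correlation(temps, sales)))
```

A value of r close to 1 or -1 means the points lie close to a straight line, which matches the idea of data clustered along an obvious line.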

lower extreme - minimum data value

upper extreme - maximum data value

range - upper extreme minus lower extreme

cluster - a group of data values that lie close together

gaps - intervals within the range of the data set that contain no data values

outlier - a point separated from the main body of the data

central tendency - point within the range about which the rest of the data is considered balanced.  The three common measures of central tendency are:

  • mode - the most frequently observed value of the measurements in the sample.   There can be more than one mode or no mode.
  • mean - the sum of all the results included in the sample divided by the number of observations
  • median - the middle value of all the numbers in the sample when they are arranged in order.
      - for an even number of values, the median is the average of the middle two values
      - for an odd number of values, the median is the middle value

lower quartile - separates the first 25% of the distribution from the remaining 75%.

upper quartile - separates the first 75% of the distribution from the remaining 25%.


 

Example for an Odd Number of Data Items


number of data items (N) 15
upper extreme 98
lower extreme 5
range 98 - 5 = 93
median (MED) 56
lower quartile (Q1) 50
upper quartile (Q3) 62

 

Example for an Even Number of Data Items


number of data items (N) 16
upper extreme 98
lower extreme 5
range 98 - 5 = 93
median (MED) (50 + 51)/2 = 50.5
lower quartile (Q1) 49
upper quartile (Q3) 82
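The quantities listed in the two examples can be computed with a short Python sketch. It assumes the common school method in which each quartile is the median of one half of the ordered data, leaving out the median itself when the number of items is odd; the data sets and the function name are hypothetical, since the data behind the two examples is not reproduced here.

```python
from statistics import median


def five_number_summary(data):
    """Extremes, quartiles, median, and range (needs at least two values)."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower_half = values[:half]            # values before the middle position
    upper_half = values[half + (n % 2):]  # skips the median itself when n is odd
    return {
        "lower extreme": values[0],
        "Q1": median(lower_half),
        "median": median(values),
        "Q3": median(upper_half),
        "upper extreme": values[-1],
        "range": values[-1] - values[0],
    }


# Hypothetical data sets, one with an odd and one with an even number of items.
odd_set = [7, 1, 9, 3, 5, 12, 8]
even_set = [7, 1, 9, 3, 5, 12, 8, 10]

print(five_number_summary(odd_set))
print(five_number_summary(even_set))
```

Other methods of finding quartiles exist and can give slightly different values for the same data.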

 

 

 

 


Key Terms:

mean, median, mode, sum, range, measure of central tendency, data

Prerequisite Skills:

Graphing Relations:  Grade 7 Lesson 21.02

Measures of Central Tendency:  Grade 7 Lesson 41.04

Distribution of Data:  Grade 7 Lesson 41.05


 

 


Comments to:  Jim Reed
Started September, 1998. Copyright 1999, 2000
