
Grade 8:  The Learning Equation Math

41.02 Collecting Data

 Data Analysis Refresher pp 78-79

Statistics Overview:

1.  Question:  formulate questions that explore whether or not a relationship exists in a real-world context.  The question needs to be clearly stated and based on collectable data.  Not all questions can be answered!

"Data as they are" questions can be answered: If the U.S. Presidential elections were held today, what percent of Americans would vote for Al Gore as president?

"What-if questions under replicable circumstances" can be answered: Among all American school children age 6 to 12, would giving Vitamin C prevent colds? (You can imagine testing this out on more and more children.)

Questions about nonreplicable events, in general, can NOT be answered: "How many U.S. troops would be in Bosnia if the American Revolution had failed?"

source:  http://www.stat.ucla.edu/~tlin/50/lect50.01

2.  Collecting Data:  select, use, and defend appropriate methods of collecting data:

• designing and using surveys, interviews, experiments, research (using a range of media)
- sampling procedures (random/non-random, clustered, stratified)
- data acquisition (survey, observation, self-report, examination of official records)
- design of evaluation (pretest, post-test only) 1
- data collection schedule (fixed intervals between pretest and posttest, at the beginning and end of participation, after each session,...) 2

3.  Displaying Data:  select, use, and defend appropriate methods of displaying data
• display data by hand or by computer in a variety of ways, including circle graphs

4.  Conclusions:  present analyses and conclusions based on displayed data.  Conclusions are clearly stated and answer the question based on available data.

• read and interpret graphs that are provided
• determine and use the most appropriate measure of central tendency in a given context (mode, median, mean)
• describe the variability of given data using range or box-and-whisker plots (range, extremes, gaps, clusters, and quartiles)
• analyse sets of data by comparing different measures of central tendency (mode, median, mean)
• interpolate from data to make predictions
• identify bias in data presentations

1 Not included in junior high math.
2 Not included in junior high math.

Learning Outcomes:

The student will:

Read the Statistics Overview before proceeding. Several data samples are included in the Vocabulary Review so that you can pose questions for investigation.

Illustrative Example

How much household garbage is produced in our homes?   In the average home in Canada?

Design a questionnaire to investigate this problem. Justify your questions. Explain how you will carry out this survey. Could you collect data via computer networking? How can you use a computer to record, organize, and display your data? How can you display your data to have the most impact?

source:  BC Education - APPENDIX G: Illustrative Examples - Introductory Mathematics 8

Collect Data Issues

appropriate language - surveys need to be delivered in the appropriate language and reports need to be written in the same language.  One must also use language that will not offend persons providing the data.  Some suggestions:

USE: disabled people / people with disabilities / people with impairments
DO NOT USE: the disabled / the handicapped / invalid (means not valid)

USE: blind people / people who are blind / people with a visual impairment
DO NOT USE: the blind

USE: deaf people / people who are deaf / hearing impairment / hard of hearing
DO NOT USE: the deaf

USE: a person who is unable to speak, having a speech impairment / deaf without speech / profoundly deaf
DO NOT USE: dumb / mute

USE: person with a speech impairment
DO NOT USE: speech problem / can't talk properly

USE: wheelchair user
DO NOT USE: wheelchair-bound / confined to a wheelchair

USE: a person who has epilepsy
DO NOT USE: an epileptic

USE: a person with spina bifida
DO NOT USE: spina bifida case

Some groups and individuals may express a preference.

source:  CG 3.1 Appropriate language guidelines

ethics -  the rules of moral conduct governing an individual or a group (WordCentral.Com - Dictionary)

• Researcher qualifications - competence, perspective and character of the researcher.
• Vulnerable populations - dissimilarity with the researcher may expose individuals to risk because of the researcher's lack of knowledge. A good approach is to have members of the group help design the study.
• Conflict of interest - the choice of data collection instrument or intervention has financial implications for the researcher.
• Neglecting important topics - generalizing findings too far, ignoring ethnic differences, ignoring areas where research on the topic is relevant and needed.

source:  Collecting Data

cost - collecting data can be expensive.  Choose a method that fits your budget.

privacy - being out of the sight and hearing of other people (WordCentral.Com - Dictionary)

Subjects have a right to privacy and confidentiality. They should be told who has access to the data. Every effort should be made to prohibit unauthorized access to the data; a good rule is to minimize the number of individuals who know the identity of the participants. In general, research data do not have privileged status. Confidentiality should be maintained in publications/presentations (do not use the names of individuals, locations, etc.). There are some situations when the participants may want to be identified.

Ways to enhance confidentiality:

• ask for anonymous information
• use third parties to select the sample and collect data
• use a detachable identifier
• have subjects make up a code when matched data is required
• dispose of sensitive data after the study is completed

source:  Collecting Data

cultural sensitivity - considerations include race (white, black, Asian, Native American, Eskimo, Pacific Islander, ...), ethnicity (Hispanic, Italian, Mexican, Cuban, Puerto Rican, Central/South American, ...), language (English, French, Chinese, Italian, ...), and religion (Buddhism, Islam, Judaism, Sikhism, Alternative Spirituality, Christianity, Canadian First Nations: Religions, ...)

consistency - surveys may be administered in many countries.  Care must be taken to ensure the survey is the same in each case.  Different versions of the survey will make data analysis much more difficult.

Statistics Vocabulary

statistics - the branch of mathematics dealing with the systematic collection and arrangement of large numbers of numerical observations, and with ways of drawing useful conclusions from such data

population - the entire group of people eligible for a data collection investigation

sample - part of a population selected so as to give information about the population as a whole

Biased Samples

• convenience sampling - quick and easy way to obtain data, but not everyone in the population has an equal chance of being selected
• self-selective sampling - members of the population provide information by volunteering their opinions
• cluster sampling - a particular segment of the population is sampled

Unbiased Samples

• systematic sampling - every nth member of the population is sampled
• simple random sampling - the sample is chosen randomly from the population
• stratified random sampling - the population is divided into groups (strata), and a random sample is taken from each group
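The three unbiased methods can be tried out on a computer. The following is a minimal sketch in Python, assuming a made-up class list of 30 numbered students split into two hypothetical strata; the function names are illustrative and do not come from the lesson.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Choose k members so that every member has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, n, start=0):
    """Take every nth member, beginning at position `start`."""
    return population[start::n]

def stratified_random_sample(strata, k_per_stratum, seed=None):
    """Draw k members at random from each group (stratum)."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, k_per_stratum))
    return sample

# Hypothetical population: students numbered 1 to 30.
students = list(range(1, 31))
# Hypothetical strata: the first 15 and the last 15 students.
groups = {"group A": students[:15], "group B": students[15:]}

print(simple_random_sample(students, 5, seed=1))
print(systematic_sample(students, 10))  # [1, 11, 21]
print(stratified_random_sample(groups, 2, seed=1))
```

Passing a `seed` makes the random choices repeatable, which is handy when a sample has to be defended or checked later.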

frequency - the number of times an event occurs

frequency table - a table showing a set of values of a variable and the number of times each value occurs

Age at Which 42 U.S.A. Presidents Began Their Term

Age        Number of Presidents Beginning Their Term
35 - 39    0
40 - 44    2
45 - 49    6
50 - 54    12
55 - 59    12
60 - 64    6
65 - 69    4
70 - 74    0
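A frequency table is easy to check and summarize by computer. The sketch below copies the counts from the table above, treating the blank entries for 35 - 39 and 70 - 74 as 0 (an assumption, consistent with the 42 presidents in the title). The function names are illustrative only.

```python
# Counts copied from the frequency table (age interval -> number of presidents).
# The 0 entries are an assumption: those cells are blank in the table.
age_counts = {
    "35-39": 0, "40-44": 2, "45-49": 6, "50-54": 12,
    "55-59": 12, "60-64": 6, "65-69": 4, "70-74": 0,
}

def total_frequency(table):
    """Add up all the frequencies."""
    return sum(table.values())

def relative_frequencies(table):
    """Express each frequency as a percent of the total, to one decimal place."""
    total = total_frequency(table)
    return {interval: round(100 * count / total, 1)
            for interval, count in table.items()}

def modal_classes(table):
    """Return the interval(s) with the highest frequency."""
    highest = max(table.values())
    return [interval for interval, count in table.items() if count == highest]

print(total_frequency(age_counts))                 # 42
print(modal_classes(age_counts))                   # ['50-54', '55-59']
print(relative_frequencies(age_counts)["50-54"])   # 28.6
```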

frequency table data source:  http://www2.sunysuffolk.edu/wrightj/MA22/Stat/Chart.htm

survey - asking people's opinion.  Methods:

• telephone
• personal interview
• by mail

SURVEY SHEET

Chocolate Bar    Tally                   Frequency
Butterfinger     ||||| ||||| ||||| ||    17
5th Avenue       |||                     3
Baby Ruth        |||||                   5
Snickers         ||||| ||||| |||||       15
Nestle           |||||                   5
Hershey          ||||| |||               8
Dove             ||||| ||                7

source:  Take Our Surveys
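Survey answers can also be tallied by computer. This is a small sketch, assuming the answers have been typed into a list; the responses shown are made up for illustration and are not the survey results above.

```python
from collections import Counter

def tally_table(responses):
    """Return (item, tally marks, frequency) rows, most popular first."""
    counts = Counter(responses)
    rows = []
    for item, n in counts.most_common():
        fives, ones = divmod(n, 5)
        # Tally marks are written in groups of five.
        groups = ["|||||"] * fives + (["|" * ones] if ones else [])
        rows.append((item, " ".join(groups), n))
    return rows

# Hypothetical survey answers.
responses = ["Snickers", "Dove", "Snickers", "Hershey", "Dove",
             "Snickers", "Snickers", "Snickers", "Snickers"]

for item, marks, frequency in tally_table(responses):
    print(f"{item:<10} {marks:<10} {frequency}")
```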

inference - a conclusion drawn from data

bias - an unwanted influence on a sample that prevents the sample from being truly representative of the population from which it is drawn.

Graphing

In graphs, each variable is assigned to an axis, then scales and labels are added.

histogram

The Dependent variable is always assigned to the Y-AXIS.

What is the dependent variable?

The dependent variable relies on the changes in the independent variable.

The dependent variable is what we measure.

source:  Line and Bar Graphs

The Independent variable is always assigned to the X-AXIS.

What is the independent variable?    The independent variable does not rely on any other variable.  The values of the independent variable can be chosen freely.

There are three types of relationships between variables:
• linear
• non-linear (curved-line or other pattern)
• no relationship at all

axis - a line drawn through the center of a figure

scale - a sequence of marks, usually along a line, used in making measurements

proportional - one variable is proportional to another if the ratio of corresponding values remains constant
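The "constant ratio" test can be checked directly. Here is a minimal sketch, assuming whole-number data with no zero in the first variable; both data sets are made up.

```python
from fractions import Fraction

def is_proportional(xs, ys):
    """True when every ratio y/x is the same (x values must be nonzero integers)."""
    ratios = {Fraction(y, x) for x, y in zip(xs, ys)}
    return len(ratios) == 1

# Hypothetical data: hours worked and pay in dollars (ratio is always 12).
print(is_proportional([1, 2, 3, 4], [12, 24, 36, 48]))  # True

# Hypothetical data following y = 2x + 1: a straight line, but the ratio changes.
print(is_proportional([1, 2, 3], [3, 5, 7]))            # False
```

The second example shows that a linear relationship is not automatically a proportional one.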

interpolation - to estimate a value by following a pattern and staying within the values already known

extrapolation - to estimate a value by following a pattern and going beyond the values already known
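Both kinds of estimate can be made with the same calculation when the pattern is a straight line. The sketch below assumes a linear pattern through two known points; the plant measurements are made up for illustration.

```python
def estimate_on_line(x1, y1, x2, y2, x):
    """Estimate y at x using the straight line through (x1, y1) and (x2, y2)."""
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (x - x1)

def kind_of_estimate(known_xs, x):
    """Name the estimate: inside the known values or beyond them."""
    if min(known_xs) <= x <= max(known_xs):
        return "interpolation"
    return "extrapolation"

# Hypothetical measurements: a plant is 4 cm tall on day 2 and 12 cm on day 6.
print(estimate_on_line(2, 4, 6, 12, 4), kind_of_estimate([2, 6], 4))
# 8.0 interpolation
print(estimate_on_line(2, 4, 6, 12, 10), kind_of_estimate([2, 6], 10))
# 20.0 extrapolation
```

Extrapolated values deserve more caution, because nothing in the data confirms that the pattern continues past the known values.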

discrete variable - has measurements that are distinct, periodic, and unconnected between data points (e.g. the distance an athlete throws a discus)

continuous variable - measurements are uninterrupted and connected between data points (e.g. growth of a plant)

scatter plot - a graph that relates data from two different sets

line of best fit (trend line) - A line on a scatter plot which can be drawn near the points to more clearly show the trend between two sets of data
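By hand, a line of best fit is drawn by eye. A computer usually picks the line with the "least squares" method, which is one common choice and is not named in this lesson. The sketch below uses that method on made-up data.

```python
def line_of_best_fit(xs, ys):
    """Least-squares line: returns (slope, intercept) of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How x and y vary together, and how x varies on its own.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical scatter plot data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = line_of_best_fit(xs, ys)
print(round(slope, 2), round(intercept, 2))  # 0.6 2.2
```

The line always passes through the point (mean of x, mean of y), which is a useful check on a hand-drawn trend line too.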

trend - relationship between two sets of data. The trend will show a positive correlation, a negative correlation, or no correlation.

positive correlation - both sets of data increase together

negative correlation - one set of data decreases as the other set of data increases

no correlation - the two data sets are not related.

weak correlation - when the data is not clustered along an obvious line

strong correlation - when the data is clustered along an obvious line (can be positive or negative)
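The lesson judges correlation by looking at the scatter plot. A computer can put a number on it with the correlation coefficient r, which runs from -1 to 1; this goes beyond the lesson and is offered only as a sketch. The cut-off values 0.3 and 0.7 below are assumptions chosen for illustration, not standard definitions.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Pearson's r: near 1 or -1 means the points lie close to a line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe_trend(r, weak=0.3, strong=0.7):
    """Turn r into words. The thresholds are illustrative assumptions."""
    if abs(r) < weak:
        return "no correlation"
    strength = "strong" if abs(r) >= strong else "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

# Hypothetical data sets.
xs = [1, 2, 3, 4, 5]
print(describe_trend(correlation_coefficient(xs, [2, 4, 6, 8, 10])))
# strong positive correlation
print(describe_trend(correlation_coefficient(xs, [10, 8, 6, 4, 2])))
# strong negative correlation
```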

lower extreme - minimum data value

upper extreme - maximum data value

range - upper extreme minus lower extreme

cluster - a group of data values that lie close together

gaps - intervals within the range of the data set that contain no data values

outlier - a point separated from the main body of the data

central tendency - point within the range about which the rest of the data is considered balanced.  The three common measures of central tendency are mean, median and mode.
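The three measures can give quite different answers for the same data, which is why the most appropriate one has to be chosen for the context. The sketch below uses Python's built-in `statistics` module on a made-up set of scores with one outlier.

```python
from statistics import mean, median, multimode

def central_tendency(data):
    """Return the mean, median and mode(s) of a list of numbers."""
    return {
        "mean": mean(data),
        "median": median(data),
        "mode": multimode(data),  # a list, since data can have more than one mode
    }

# Hypothetical scores; 38 is an outlier.
scores = [5, 7, 7, 8, 9, 10, 38]
print(central_tendency(scores))
```

For these scores the mean is 12, the median is 8 and the mode is 7. The single outlier pulls the mean above six of the seven values, so the median describes a typical score better here.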

lower quartile - separates the first 25% of the distribution from the remaining 75%.

upper quartile - separates the first 75% of the distribution from the remaining 25%.

Picture the Parts

Example for an Odd Number of Data Items

number of data items (N)    15
upper extreme               98
lower extreme               5
range                       98 - 5 = 93
median (MED)                56
lower quartile (Q1)         50
upper quartile (Q3)         62

Example for an Even Number of Data Items

number of data items (N)    16
upper extreme               98
lower extreme               5
range                       98 - 5 = 93
median (MED)                (50 + 51)/2 = 50.5
lower quartile (Q1)         49
upper quartile (Q3)         82
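The parts of a box-and-whisker plot can be computed as follows. This is a sketch under an assumption: the quartiles are taken as the medians of the lower and upper halves of the sorted data, leaving out the middle value when N is odd. Other quartile rules exist, and the data sets behind the two examples above are not shown, so made-up data is used here.

```python
from statistics import median

def five_number_summary(data):
    """Extremes, quartiles, median and range, using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return {
        "lower extreme": values[0],
        "Q1": median(lower),
        "median": median(values),
        "Q3": median(upper),
        "upper extreme": values[-1],
        "range": values[-1] - values[0],
    }

# Hypothetical data with an odd number of items (N = 7).
print(five_number_summary([2, 4, 5, 7, 8, 10, 13]))
# Hypothetical data with an even number of items (N = 8).
print(five_number_summary([2, 4, 5, 7, 8, 10, 13, 15]))
```

For the odd set this gives Q1 = 4, median = 7 and Q3 = 10; for the even set the median is the average of the two middle values, (7 + 8)/2 = 7.5, with Q1 = 4.5 and Q3 = 11.5.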

Enrichment:

Extensive reports on data sets can be examined at:

Key Terms:

survey, bias

Prerequisite Skills:

Collecting Data:   Grade 7 Lesson 41.01


Statistics Every Writer Should Know - Data Analysis

Statistics Every Writer Should Know - Sample Sizes
