Statistics is the science of collecting, organizing, analyzing, and interpreting data in order to make decisions.
Data consist of information coming from observations, counts, measurements, or responses.
A population is the collection of all outcomes, responses, measurements, or counts that are of interest.
A sample is a subset of the population.
A parameter is a numerical description of a population characteristic.
A statistic is a numerical description of a sample characteristic.
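A small sketch can make the parameter/statistic distinction concrete. It assumes a hypothetical population of 10 club members' ages and takes every other member as the sample. The same calculation (the mean) is a parameter when applied to the population and a statistic when applied to the sample.

```python
# Hypothetical population: ages of all 10 members of a small club
population = [23, 31, 27, 45, 38, 29, 52, 34, 41, 30]

# A sample is a subset of the population (here, every other member)
sample = population[::2]


def mean(values):
    """Sum of the entries divided by the number of entries."""
    return sum(values) / len(values)


parameter = mean(population)  # describes the population
statistic = mean(sample)      # describes the sample; estimates the parameter

print(f"parameter (population mean) = {parameter}")
print(f"statistic (sample mean)     = {statistic}")
```

The statistic (36.2) differs from the parameter (35.0) because the sample contains only part of the population.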
Descriptive statistics is the branch of statistics that involves the organization, summarization, and display of data.
Inferential statistics is the branch of statistics that involves using a sample to draw conclusions about a population. A basic tool in the study of inferential statistics is probability.
Types of data:
Qualitative data consist of attributes, labels, or nonnumerical entries.
Quantitative data consist of numerical measurements or counts.
Levels of measurement:
Nominal: categorized using names, labels, or qualities.
Ordinal: can be arranged in order or ranked.
Interval: can be ordered and meaningful differences between entries can be calculated.
Ratio: similar to interval, with the added property that a zero entry is an inherent zero (it implies "none"). Because of this, a ratio of two entries is meaningful.
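The four levels are cumulative: each level supports the operations of the levels before it plus one more. The sketch below encodes that idea. The `Level` enum, the operation labels, and the example data sets are illustrative assumptions, not part of these notes.

```python
from enum import IntEnum


class Level(IntEnum):
    """Levels of measurement, ordered from least to most structured."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# The one operation each level adds on top of the levels below it.
NEW_OPERATION = {
    Level.NOMINAL: "put in categories",
    Level.ORDINAL: "arrange in order",
    Level.INTERVAL: "subtract (meaningful differences)",
    Level.RATIO: "divide (one entry is a multiple of another)",
}


def allowed_operations(level: Level) -> list[str]:
    """Return every operation that is meaningful at the given level."""
    return [NEW_OPERATION[lvl] for lvl in Level if lvl <= level]


# Hypothetical data sets, one per level
examples = {
    "jersey colors": Level.NOMINAL,
    "movie ratings (poor, fair, good)": Level.ORDINAL,
    "temperature in degrees Fahrenheit": Level.INTERVAL,
    "weight in kilograms": Level.RATIO,
}

for name, level in examples.items():
    print(f"{name}: {level.name} -> {allowed_operations(level)}")
```

Temperature in degrees Fahrenheit is interval rather than ratio because 0 °F does not mean "no temperature", so 80 °F is not "twice as hot" as 40 °F.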
The mean of a data set is the sum of the data entries divided by the number of entries.
Population mean: \[\mu = \frac{\sum x}{N}\]
Sample mean: \[\bar{x} = \frac{\sum x}{n}\]
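As a quick worked example, take a hypothetical sample of \(n = 5\) entries: 4, 7, 9, 10, 15. The sample mean is
\[\bar{x} = \frac{\sum x}{n} = \frac{4 + 7 + 9 + 10 + 15}{5} = \frac{45}{5} = 9\]
The population mean \(\mu\) is computed the same way; only the notation (\(N\) instead of \(n\)) signals that the data are the whole population.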
The median of a data set is the value that lies in the middle of the data when the data is in sorted order. If the data set has an even number of entries, the median is the mean of the two middle entries.
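A minimal sketch of finding the median: sort the entries, then take the middle one. When the number of entries is even, this sketch follows the standard convention of averaging the two middle entries.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                          # odd n: single middle entry
    return (ordered[mid - 1] + ordered[mid]) / 2     # even n: mean of the two middle entries


print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```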
The mode of a data set is the data entry that occurs with the greatest frequency.
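A sketch of finding the mode by counting frequencies, assuming Python's `collections.Counter`. Because it only counts entries, it also works for qualitative (nominal) data. When several entries tie for the highest frequency, this sketch returns all of them.

```python
from collections import Counter


def modes(values):
    """Return every entry that occurs with the greatest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


print(modes(["red", "blue", "red", "green"]))  # ['red']
print(modes([2, 3, 3, 5, 5, 8]))               # [3, 5]
```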
An outlier is a data entry that is far removed from the other entries in the data set.
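One reason outliers matter: the mean is pulled toward an outlier, while the median is resistant to it. The data set below is hypothetical, and the outlier (95) is dropped by value only to compare the two cases.

```python
from statistics import mean, median

# Hypothetical data set: 95 is far removed from the other entries
data = [12, 14, 15, 15, 16, 18, 95]
without_outlier = [x for x in data if x != 95]

for label, values in [("with outlier", data), ("without outlier", without_outlier)]:
    print(f"{label:>16}: mean = {mean(values):.2f}, median = {median(values)}")
```

The single outlier raises the mean from 15 to about 26.43, while the median stays at 15.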
A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by \[\bar{x} = \frac{\sum (x \cdot w)}{\sum w}\] where \(w\) is the weight of each entry \(x\).
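A minimal sketch of the weighted mean formula. The scores and weights describe a hypothetical course grade (weights are each component's percentage share); because the formula divides by \(\sum w\), the weights do not need to sum to 1.

```python
def weighted_mean(entries, weights):
    """Weighted mean: sum of (x * w) divided by the sum of the weights."""
    if len(entries) != len(weights):
        raise ValueError("each entry needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(entries, weights)) / total_weight


# Hypothetical grade components: tests, midterm, final, labs
scores = [86, 96, 98, 98]
weights = [50, 15, 20, 15]  # percentage of the course grade
print(weighted_mean(scores, weights))  # 9170 / 100 = 91.7
```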
The range of a data set is the difference between the maximum and minimum data entries in the set.
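The range is a one-line computation. The function is named `data_range` here (a hypothetical name) so it does not shadow Python's built-in `range`; the salary figures are made up.

```python
def data_range(values):
    """Range = maximum data entry - minimum data entry."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


# Hypothetical starting salaries, in thousands of dollars
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
print(data_range(salaries))  # 47 - 37 = 10
```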
The deviation of an entry \(x\) in a population data set is the difference between the entry and the mean \(\mu\) of the data set. \[\text{Deviation of } x = x - \mu\]
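Computing the deviations for a hypothetical population shows a useful property: positive and negative deviations cancel, so they always sum to zero. That is why the variance squares the deviations before adding them.

```python
# Hypothetical population data set
population = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]

mu = sum(population) / len(population)       # population mean
deviations = [x - mu for x in population]    # x - mu for each entry

print(f"mu = {mu}")
print(deviations)
print(f"sum of deviations = {sum(deviations)}")
```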
The population variance of a population data set of \(N\) entries is \[\text{Population variance} = \sigma^2 = \frac{\sum (x - \mu)^2}{N}\] where \(\sigma\) is the lowercase Greek letter sigma.
To find the population variance and standard deviation:

Step | In symbols |
---|---|
1. Find the mean of the population data set. | \(\mu = \frac{\sum x}{N}\) |
2. Find the deviation of each entry. | \(x - \mu\) |
3. Square each deviation. | \((x - \mu)^2\) |
4. Add to get the sum of squares. | \(SS_x = \sum (x - \mu)^2\) |
5. Divide by \(N\) to get the population variance. | \(\sigma^2 = \frac{\sum (x - \mu)^2}{N}\) |
6. Find the square root of the variance to get the population standard deviation. | \(\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}\) |

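The six steps for the population variance and standard deviation translate directly into code. This is a sketch, not a library routine: the function name `population_variance_and_sd` is hypothetical, and the step numbers in the comments refer to the steps listed above.

```python
import math


def population_variance_and_sd(population):
    """Return (mu, SS_x, sigma^2, sigma) for a population data set."""
    N = len(population)
    if N == 0:
        raise ValueError("population must contain at least one entry")
    mu = sum(population) / N                    # 1. mean
    deviations = [x - mu for x in population]   # 2. deviation of each entry
    squares = [d ** 2 for d in deviations]      # 3. square each deviation
    ss_x = sum(squares)                         # 4. sum of squares
    variance = ss_x / N                         # 5. divide by N
    sd = math.sqrt(variance)                    # 6. square root
    return mu, ss_x, variance, sd


# Hypothetical population data set
mu, ss_x, variance, sd = population_variance_and_sd([41, 38, 39, 45, 47, 41, 44, 41, 37, 42])
print(mu, ss_x, variance, round(sd, 2))
```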
Symbols used for population and sample quantities:

Quantity | Population | Sample |
---|---|---|
Variance | \(\sigma^2\) | \(s^2\) |
Standard deviation | \(\sigma\) | \(s\) |
Mean | \(\mu\) | \(\bar{x}\) |
Number of entries | \(N\) | \(n\) |
Deviation | \(x - \mu\) | \(x - \bar{x}\) |
Sum of squares | \(\sum (x - \mu)^2\) | \(\sum (x - \bar{x})^2\) |
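Python's standard `statistics` module provides both versions of these measures, which makes the population/sample distinction easy to check. Note an assumption beyond these notes: the sample versions (`variance`, `stdev`) follow the usual textbook definition and divide the sum of squares by \(n - 1\) rather than \(n\), so \(s^2\) comes out slightly larger than \(\sigma^2\) for the same entries. The data are hypothetical.

```python
import statistics

data = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]  # hypothetical entries

# Treat the data as the whole population: sum of squares divided by N
sigma2 = statistics.pvariance(data)
sigma = statistics.pstdev(data)

# Treat the same data as a sample: sum of squares divided by n - 1
s2 = statistics.variance(data)
s = statistics.stdev(data)

print(f"population: sigma^2 = {sigma2:.4f}, sigma = {sigma:.4f}")
print(f"sample:     s^2     = {s2:.4f}, s     = {s:.4f}")
```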