Descriptive statistics provide a summary that quantitatively describes a sample of data.
Population refers to the entire group of individuals that we want to draw conclusions about.
Sample refers to the (usually smaller) group of individuals for which we have actually collected data.
For the examples later, let’s create a population of data in Python:
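The original code for this step is not shown here, so the following is only a minimal sketch. It assumes a normally distributed population of 10,000 values; the seed, mean, standard deviation, and size are illustrative choices, not values from the text.

```python
import numpy as np

# Assumption: seed, distribution, and size are illustrative only.
rng = np.random.default_rng(seed=42)

# A hypothetical population of 10,000 values (for example, heights in cm).
population = rng.normal(loc=170, scale=10, size=10_000)
```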
… and draw a sample from it:
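One way to sketch the sampling step is with the choice() method of a numpy random generator. The population is recreated here so the snippet runs on its own; the sample size of 100 and all population parameters are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical population (parameters are assumptions, not from the text).
population = rng.normal(loc=170, scale=10, size=10_000)

# Draw 100 values at random, without replacement.
sample = rng.choice(population, size=100, replace=False)
```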
What do the values look like?
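A quick way to get a feel for the data is to print its size and the first few values. This is a minimal sketch that rebuilds the same hypothetical population and sample (all parameters are assumptions) so that it can be run by itself.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
population = rng.normal(loc=170, scale=10, size=10_000)  # hypothetical
sample = rng.choice(population, size=100, replace=False)

first_five = np.round(sample[:5], 1)  # first five values, rounded to 1 decimal

print(sample.shape)  # number of values in the sample
print(first_five)
```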
The mean, often simply called the average, is defined as the sum of all values divided by the number of values. It’s a measure of central tendency that tells us what’s happening near the middle of the data.
\(\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_{i}\)
In Python, we use the mean() function from numpy:
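As a minimal sketch, the small array below is a hypothetical stand-in for the sample, chosen so that the result is easy to check by hand: the eight values sum to 40.

```python
import numpy as np

# Hypothetical data, not from the original text.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean_value = np.mean(values)  # 40 / 8
print(mean_value)  # 5.0
```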
The median of a dataset is the middle value when the data is arranged in ascending order, or the average of the two middle values if the dataset has an even number of observations.
In Python, we use the median() function from numpy:
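The sketch below uses a small hypothetical array with an even number of values, so the median is the average of the two middle values (4 and 5). A second array with an odd number of values shows the simpler case.

```python
import numpy as np

# Hypothetical data, not from the original text.
even_values = np.array([2, 4, 4, 4, 5, 5, 7, 9])
odd_values = np.array([2, 4, 9])

median_even = np.median(even_values)  # average of the two middle values
median_odd = np.median(odd_values)    # the single middle value

print(median_even)  # 4.5
print(median_odd)   # 4.0
```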
The mode statistic represents the value that appears most frequently in a dataset.
In Python, we use the mode() function from statistics:
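A minimal sketch on hypothetical data, where the value 4 appears three times. It also shows multimode(), which is part of the same standard-library module and is useful when several values tie for most frequent.

```python
import statistics

# Hypothetical data, not from the original text.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mode_value = statistics.mode(values)
print(mode_value)  # 4

# With a tie, mode() returns the first mode encountered (Python 3.8+),
# while multimode() returns all of them.
tied_modes = statistics.multimode([1, 1, 2, 2])
print(tied_modes)  # [1, 2]
```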
The range is the difference between the maximum and minimum values in a dataset.
In Python, we can use the max() and min() functions and subtract the values:
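For instance, on a small hypothetical list whose largest value is 9 and smallest value is 2:

```python
# Hypothetical data, not from the original text.
values = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(values) - min(values)
print(value_range)  # 7
```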
Or, we can use the ptp() function from numpy:
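The name ptp stands for "peak to peak". A minimal sketch on hypothetical data:

```python
import numpy as np

# Hypothetical data, not from the original text.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

value_range = np.ptp(values)  # maximum minus minimum
print(value_range)  # 7
```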
The sample variance tells us about how spread out the data is. A lower variance indicates that values tend to be close to the mean, and a higher variance indicates that the values are spread out over a wider range.
\(s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}\)
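In Python, we use the var() function from numpy. By default, var() divides by n, which gives the population variance; pass ddof=1 to divide by n-1 and get the sample variance: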
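A minimal sketch on hypothetical data. The values have a mean of 5 and the squared deviations sum to 32, so the sample variance is 32 / 7. The ddof=1 argument sets the divisor to n-1, matching the formula; the default call is shown for comparison.

```python
import numpy as np

# Hypothetical data, not from the original text.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

sample_variance = np.var(values, ddof=1)  # divides by n - 1: 32 / 7
population_variance = np.var(values)      # default divides by n: 32 / 8

print(round(sample_variance, 3))  # 4.571
print(population_variance)        # 4.0
```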
The sample standard deviation is the square root of the sample variance. It also tells us how spread out the data is, but it is expressed in the same units as the data itself, which makes it easier to interpret.
\(s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}\)
In Python, we use the std() function from numpy. Like var(), it divides by n by default, so pass ddof=1 to get the sample standard deviation:
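A minimal sketch on hypothetical data whose sample variance is 32 / 7, so the sample standard deviation is the square root of that. The default call, which divides by n, is shown for comparison.

```python
import numpy as np

# Hypothetical data, not from the original text.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

sample_std = np.std(values, ddof=1)  # square root of 32 / 7
population_std = np.std(values)      # square root of 32 / 8

print(round(sample_std, 3))  # 2.138
print(population_std)        # 2.0
```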
Descriptive statistics provide a summary that quantitatively describes a sample of data.
In Python:
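One possible recap, sketched on a small hypothetical array rather than the original data, collects all of the statistics covered above in a single dictionary. The variable names are illustrative.

```python
import statistics

import numpy as np

# Hypothetical data, not from the original text.
values = np.array([2, 4, 4, 4, 5, 5, 7, 9])

summary = {
    "mean": np.mean(values),
    "median": np.median(values),
    "mode": statistics.mode(values.tolist()),
    "range": np.ptp(values),
    "variance": np.var(values, ddof=1),  # sample variance (divides by n - 1)
    "std": np.std(values, ddof=1),       # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {float(value):.3f}")
```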
Import numpy and statistics.
Import data from pydataset using from pydataset import data.
Load the housing data set using housing = data('Housing').
Remember: you can extract a column in Python using dataset['column_name'].