How to Calculate Kurtosis for a Data Set
Unlock deeper insights into your data. This guide shows you how to calculate kurtosis to analyze distribution shape, peakedness, and tails.
Kurtosis describes the shape of a data distribution, quantifying its “tailedness” (historically also described as “peakedness”) compared to a normal distribution. It provides insight into a dataset’s characteristics beyond central tendency or variability.
Kurtosis measures the combined weight of a distribution’s tails relative to its center. It indicates the propensity for extreme values, often called outliers. While sometimes confused with peakedness, kurtosis focuses on the extremity of deviations and the probability of outliers in the tails.
There are three categories of kurtosis, each describing a distinct shape relative to a normal distribution. A mesokurtic distribution, like the normal distribution, has an excess kurtosis of zero. Its tails and peak are similar to a standard bell curve, indicating a moderate level of outlier frequency.
A leptokurtic distribution shows positive excess kurtosis, with heavier tails and a sharper peak than a normal distribution. This suggests a higher concentration of data around the mean and a greater likelihood of extreme values. Conversely, a platykurtic distribution exhibits negative excess kurtosis, characterized by lighter tails and a flatter peak. This indicates fewer extreme outliers and a more dispersed set of values.
Calculating kurtosis requires quantitative data. Before applying the formula, compute two fundamental statistical measures: the mean and the standard deviation.
The mean is the arithmetic average of all data points. The standard deviation measures the dispersion of data points around the mean. Both are foundational components of the kurtosis formula, as kurtosis is a standardized fourth moment of the distribution.
Kurtosis is calculated using Fisher’s excess kurtosis formula, which yields zero for a normal distribution. This measure is derived from the fourth central moment of the distribution, standardized by the standard deviation raised to the fourth power. The formula for population excess kurtosis, denoted as $\gamma_2$, is:
$\gamma_2 = \frac{\frac{1}{N}\sum_{i=1}^N (x_i - \mu)^4}{\sigma^4} - 3$
Here, $x_i$ represents each individual data point, $\mu$ is the population mean, $N$ is the total number of data points, and $\sigma$ is the population standard deviation. The “-3” adjustment makes it “excess” kurtosis, setting a normal distribution’s kurtosis to zero.
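As a minimal sketch, the population formula above can be written in plain Python. The function name excess_kurtosis and its error handling are illustrative choices, not part of any library:

```python
def excess_kurtosis(data):
    """Population excess kurtosis: fourth central moment / sigma^4, minus 3."""
    n = len(data)
    if n == 0:
        raise ValueError("data must not be empty")
    mean = sum(data) / n
    deviations = [x - mean for x in data]
    # Population variance (divide by N, not N - 1), i.e. sigma squared.
    variance = sum(d ** 2 for d in deviations) / n
    if variance == 0:
        raise ValueError("kurtosis is undefined when all values are identical")
    # Fourth central moment: the average of the deviations raised to the 4th power.
    fourth_moment = sum(d ** 4 for d in deviations) / n
    # sigma^4 equals the variance squared; subtracting 3 gives "excess" kurtosis.
    return fourth_moment / variance ** 2 - 3
```

Squaring the variance avoids taking a square root and then raising it to the fourth power, which keeps rounding error out of the denominator.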
Consider a small dataset: [10, 12, 15, 13, 10].
First, calculate the mean ($\mu$): $(10+12+15+13+10) / 5 = 60 / 5 = 12$.
Next, determine the standard deviation ($\sigma$). This requires calculating the variance first.
The deviations from the mean are: $(10-12)=-2$, $(12-12)=0$, $(15-12)=3$, $(13-12)=1$, $(10-12)=-2$.
Squaring these deviations: $(-2)^2=4$, $0^2=0$, $3^2=9$, $1^2=1$, $(-2)^2=4$.
Sum of squared deviations: $4+0+9+1+4 = 18$.
Variance ($\sigma^2$) = $18 / 5 = 3.6$.
Standard deviation ($\sigma$) = $\sqrt{3.6} \approx 1.897$.
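These intermediate values can be checked with Python’s standard statistics module. This is only a sketch; the variable names are illustrative. Note that the population functions (pvariance, pstdev) divide by $N$, matching the calculation above, whereas variance and stdev divide by $N-1$ and would give different numbers.

```python
import statistics

data = [10, 12, 15, 13, 10]

mu = statistics.mean(data)             # arithmetic mean: 12
variance = statistics.pvariance(data)  # population variance: 18 / 5 = 3.6
sigma = statistics.pstdev(data)        # population standard deviation, about 1.897

print(mu, variance, round(sigma, 3))
```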
Now, proceed with the kurtosis formula.
Calculate $(x_i - \mu)^4$ for each data point:
$(10-12)^4 = (-2)^4 = 16$
$(12-12)^4 = (0)^4 = 0$
$(15-12)^4 = (3)^4 = 81$
$(13-12)^4 = (1)^4 = 1$
$(10-12)^4 = (-2)^4 = 16$
Sum of $(x_i - \mu)^4$: $16+0+81+1+16 = 114$.
Divide by $N$: $114 / 5 = 22.8$.
Standard deviation to the fourth power ($\sigma^4$): because $\sigma^4 = (\sigma^2)^2$, this is $3.6^2 = 12.96$, which avoids rounding $\sigma$ first.
Finally, apply the formula: $\gamma_2 = (22.8 / 12.96) - 3 \approx 1.76 - 3 = -1.24$.
The calculated excess kurtosis for this dataset is approximately -1.24.
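The worked example can be reproduced step by step in a few lines of Python. This sketch mirrors the manual calculation, and the variable names are chosen for illustration only:

```python
data = [10, 12, 15, 13, 10]
n = len(data)

mu = sum(data) / n                                    # mean: 12.0
fourth_powers = [(x - mu) ** 4 for x in data]         # 16, 0, 81, 1, 16
fourth_moment = sum(fourth_powers) / n                # 114 / 5 = 22.8
sigma_squared = sum((x - mu) ** 2 for x in data) / n  # population variance: 3.6
sigma_fourth = sigma_squared ** 2                     # 3.6 squared, about 12.96

gamma_2 = fourth_moment / sigma_fourth - 3            # excess kurtosis
print(round(gamma_2, 2))  # -1.24
```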
The calculated kurtosis value provides insights into your data distribution’s shape, particularly its tails and data concentration around the mean. When using Fisher’s excess kurtosis, a value of zero indicates a mesokurtic distribution, meaning its tail characteristics are comparable to a normal distribution. This suggests a typical frequency of extreme values.
A positive kurtosis value signifies a leptokurtic distribution, with heavier tails and a sharper peak than a normal distribution. This implies a greater likelihood of observing extreme values or outliers.
Conversely, a negative kurtosis value points to a platykurtic distribution, characterized by lighter tails and a flatter peak. This indicates that extreme values are less frequent, and data points are more spread out. Understanding these interpretations is important for assessing risk, especially in financial analysis where tail risk is a significant consideration.
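The three categories can be expressed as a small helper. This is a hypothetical sketch: the function name classify_kurtosis and the tolerance of 0.1 are assumptions made for illustration, since real data almost never yields an excess kurtosis of exactly zero and there is no standard cutoff for “close enough to normal.”

```python
def classify_kurtosis(excess, tolerance=0.1):
    """Label an excess kurtosis value; the tolerance band is an arbitrary choice."""
    if abs(excess) <= tolerance:
        return "mesokurtic"   # tails comparable to a normal distribution
    if excess > 0:
        return "leptokurtic"  # heavier tails, more extreme values
    return "platykurtic"      # lighter tails, fewer extreme values


print(classify_kurtosis(-1.24))  # platykurtic
```

For small samples, the estimate itself is noisy, so a label like this is best treated as a rough description rather than a firm conclusion.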
In practice, kurtosis is usually calculated with software, which automates the arithmetic. This is especially useful for larger datasets, where manual calculation would be time-consuming and error-prone. Most statistical packages and spreadsheet programs include a built-in function.
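In Microsoft Excel, the KURT function computes excess kurtosis. Enter =KURT(data_range) into a cell, where data_range refers to your numerical data (e.g., =KURT(A1:A100)). Note that KURT applies a sample-size bias correction, so for small datasets its result will not match the population formula shown earlier.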
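For programming languages, Python’s scipy.stats module offers a kurtosis function. Import the module, then call scipy.stats.kurtosis(data_array); by default it returns Fisher’s excess kurtosis using the population formula. In R, the moments package provides a kurtosis() function; use kurtosis(your_data_vector) after installing and loading the package. This function returns Pearson’s kurtosis, which is 3 for a normal distribution, so subtract 3 to obtain excess kurtosis.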
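As a short usage sketch, assuming SciPy is installed, the call below applies scipy.stats.kurtosis to the example dataset. The fisher and bias parameters are part of SciPy’s documented signature; the variable names are illustrative.

```python
from scipy.stats import kurtosis

data = [10, 12, 15, 13, 10]

# Defaults are fisher=True (excess kurtosis) and bias=True (population formula),
# so this matches the manual calculation of about -1.24.
population_excess = kurtosis(data)

# bias=False applies a sample-size correction, which gives a different value
# on a dataset this small.
sample_excess = kurtosis(data, bias=False)

# fisher=False returns Pearson's kurtosis, where a normal distribution scores 3.
pearson = kurtosis(data, fisher=False)

print(round(float(population_excess), 2))  # -1.24
```

When comparing results across tools, check which of these conventions each one uses, since the same data can yield noticeably different numbers.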