Measures of Variation

As with measures of center, some measures of variation can be calculated with a single command, while others require a short manual calculation.  To demonstrate these methods we will use the data in the example below.

Example: Listed below are the numbers of UFO sightings per month reported on The National UFO Reporting Center Online Database in 2002.

Month      Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov  Dec
Sightings  484  244  283  281  183  304  597  579  697  360  364  405
Source: www.nuforc.org

Before we perform any calculations we will need to create a data set in R.

> sightings = c(484, 244, 283, 281, 183, 304, 597, 579, 697, 360, 364, 405)

Range

To calculate the range of a data set we will find the difference between the maximum and minimum values using the max and min commands, respectively.

> max(sightings) - min(sightings)
[1] 514
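
R also has a built-in range function, but it returns the minimum and maximum values rather than their difference. As a minimal sketch of an alternative to the calculation above, the result of range can be passed to diff; the variable names extremes and data.range are chosen here for illustration only.

```r
# Same data as in the example above
sightings = c(484, 244, 283, 281, 183, 304, 597, 579, 697, 360, 364, 405)

# range() returns the minimum and maximum, not the statistical range
extremes = range(sightings)
print(extremes)

# diff() subtracts the first value from the second, giving max - min
data.range = diff(extremes)
print(data.range)
```

Either approach gives the same value; subtracting min from max simply makes the definition of the range more visible.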

Standard Deviation

To calculate the sample standard deviation of a data set we will use the command sd.

> sd(sightings)
[1] 158.8301

It is important to remember that the sd command always calculates the sample standard deviation, which divides by n-1.  If we want to calculate a population standard deviation, which divides by n, we will need to adjust this calculation manually.

Note that σ = s · √((n-1) / n)

Using this formula we can now calculate population standard deviations by incorporating the length command.

> length(sightings)
[1] 12

This command counts the number of data values in our list for us.  To shorten our calculations, we can save this value as a variable.

> n = length(sightings)

Now we can call on the variable n in our calculations to access this number.  We can now proceed with calculating the population standard deviation.

> sd(sightings) * sqrt((n-1) / n)
[1] 152.0682
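
As a check on the adjustment, the population standard deviation can also be computed directly from its definition: the square root of the average squared deviation from the mean. The sketch below assumes the same sightings data; the names pop.sd.direct and pop.sd.adjusted are illustrative only.

```r
# Same data as in the example above
sightings = c(484, 244, 283, 281, 183, 304, 597, 579, 697, 360, 364, 405)
n = length(sightings)

# Directly from the definition: divide the sum of squared deviations by n
pop.sd.direct = sqrt(sum((sightings - mean(sightings))^2) / n)

# By adjusting the sample standard deviation returned by sd()
pop.sd.adjusted = sd(sightings) * sqrt((n - 1) / n)

print(pop.sd.direct)
print(pop.sd.adjusted)

# all.equal() compares the two values up to a small numerical tolerance
print(all.equal(pop.sd.direct, pop.sd.adjusted))
```

The two results should agree up to floating-point rounding, which is why all.equal is used for the comparison instead of ==.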

Variance

To calculate the sample variance of a data set we will use the command var.

> var(sightings)
[1] 25226.99

As with standard deviations, this command only calculates the sample variance for any data set.  Calculating a population variance will require us to manually adjust this calculation.

Note that σ² = s² · (n-1) / n

Using this formula we can now calculate population variance.

> n = length(sightings)
> var(sightings) * (n-1) / n
[1] 23124.74
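
Since the variance is the square of the standard deviation, the population variance should equal the square of the population standard deviation. A short sketch of that consistency check, assuming the same sightings data (pop.var and pop.sd are illustrative variable names):

```r
# Same data as in the example above
sightings = c(484, 244, 283, 281, 183, 304, 597, 579, 697, 360, 364, 405)
n = length(sightings)

# Population variance from the adjusted sample variance
pop.var = var(sightings) * (n - 1) / n

# Population standard deviation from the adjusted sample standard deviation
pop.sd = sd(sightings) * sqrt((n - 1) / n)

print(pop.var)
print(pop.sd^2)

# The two values should match up to floating-point rounding
print(all.equal(pop.var, pop.sd^2))
```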

Coefficient of Variation

Since the sd command only calculates sample standard deviations, we can only use it to calculate sample coefficients of variation (CV). We can calculate the sample CV as:

Sample CV = s / x̄

> sd(sightings) / mean(sightings)
[1] 0.3986532

If we want to calculate a population CV we will need to employ the method described above for calculating population standard deviations.

Note that σ = s · √((n-1) / n)

To simplify our calculations, let's first save our calculation for σ as the variable pop.sd.

> n = length(sightings)
> pop.sd = sd(sightings) * sqrt((n-1) / n)

We can now calculate the population CV as:

Population CV = σ / μ

> pop.sd / mean(sightings)
[1] 0.3816814
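
If both versions of the CV are needed often, the two calculations can be wrapped in a single function. The sketch below is one possible way to do this; cv and its population argument are hypothetical names introduced here and are not built into R.

```r
# cv() is a hypothetical helper, not a built-in R function.
# population = FALSE gives the sample CV; TRUE gives the population CV.
cv = function(x, population = FALSE) {
  s = sd(x)                        # sample standard deviation
  if (population) {
    n = length(x)
    s = s * sqrt((n - 1) / n)      # adjust to population standard deviation
  }
  s / mean(x)
}

# Same data as in the example above
sightings = c(484, 244, 283, 281, 183, 304, 597, 579, 697, 360, 364, 405)

print(cv(sightings))                     # sample CV
print(cv(sightings, population = TRUE))  # population CV
```

The mean is the same in both cases, so only the standard deviation in the numerator changes between the sample and population versions.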
