Scatter Plots

To demonstrate creating scatter plots we will use the data in the example below.

Example: The data below lists the number of roller coasters in different countries, and the amount each country has contributed to tsunami aid (in millions).


Country          Roller Coasters    Tsunami Aid (in millions)
Norway                         7                          139
Australia                     18                          193
Netherlands                   36                          156
United States                624                          792
New Zealand                    3                           37
Canada                        51                          176
Ireland                        2                           26
Germany                      108                          313
Sweden                        19                           86
Switzerland                    3                           29
Source: www.nationmaster.com

To create a scatter plot we will begin by creating two numeric vectors, one for each variable.

> coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
> aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

Using the plot command we can create a scatter plot.  The data set we list first will be assigned to the x-axis, and the second data set will be assigned to the y-axis.

> plot(coasters, aid) 

If we want to add more information to our plot we can include the main, xlab, and ylab arguments.

> plot(coasters, aid, main = "Roller Coasters vs. Tsunami Aid", xlab = "Number of Roller Coasters", ylab = "Tsunami Aid (in millions)")
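When one point sits far from the rest, it can help to see which country it is. The following is a minimal sketch, not part of the original example: it assumes the country names from the table are stored alongside the values in a data frame (the name tsunami is purely illustrative), uses the formula form of plot, and labels each point with text. The xlim value is an arbitrary choice that leaves room for the labels.

```r
# Illustrative data frame; country names are copied from the table above
tsunami <- data.frame(
  country  = c("Norway", "Australia", "Netherlands", "United States",
               "New Zealand", "Canada", "Ireland", "Germany",
               "Sweden", "Switzerland"),
  coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3),
  aid      = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)
)

# Formula interface: y ~ x, so aid goes on the y-axis and coasters on the x-axis
plot(aid ~ coasters, data = tsunami,
     main = "Roller Coasters vs. Tsunami Aid",
     xlab = "Number of Roller Coasters",
     ylab = "Tsunami Aid (in millions)",
     xlim = c(0, 800))  # extra room on the right so labels are not clipped

# pos = 4 places each label to the right of its point; cex shrinks the text
text(tsunami$coasters, tsunami$aid, labels = tsunami$country,
     pos = 4, cex = 0.7)
```

Labels for the countries clustered near the origin will overlap; for a quick look at which point is the extreme one, that is usually acceptable.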


Scatter plots are used to visualize the relationship between two different variables, and perform a visual check for correlations.  We can also calculate the strength of a correlation more directly with the cor command.

> cor(coasters, aid)
[1] 0.965223

The result of this calculation is the Pearson correlation coefficient, r, which measures the strength of the linear relationship between the two variables.  Values of r close to 1 indicate a strong positive correlation, values close to -1 indicate a strong negative correlation, and values near 0 indicate little or no linear relationship.  In this case r = 0.965223, indicating a strong positive correlation in our sample data.
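To see what cor is computing, Pearson's r can be reproduced by hand: take each value's deviation from its sample mean, sum the products of the paired deviations, and divide by the square root of the product of the two sums of squared deviations. The sketch below is only an illustration of that formula; the variable names dx, dy, and r_manual are made up for this example.

```r
# Same sample data as in the example above
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Deviation of each observation from its sample mean
dx <- coasters - mean(coasters)
dy <- aid - mean(aid)

# Sum of cross-products divided by the root of the product of sums of squares
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual

# Check the hand calculation against the built-in function
all.equal(r_manual, cor(coasters, aid))
```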

The method argument can be added to this calculation to calculate Spearman's rank correlation coefficient, or the Kendall rank correlation coefficient.

> cor(coasters, aid, method = "spearman")
[1] 0.8997002
> cor(coasters, aid, method = "kendall")
[1] 0.8090398
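Spearman's coefficient is Pearson's r applied to the ranks of the data rather than to the raw values. A short sketch, with illustrative variable names, that checks this on the sample data:

```r
# Same sample data as in the example above
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# rank() gives tied values the average of their ranks; the two countries
# with 3 roller coasters each receive rank 2.5
rank_coasters <- rank(coasters)
rank_aid <- rank(aid)

# Pearson correlation of the ranks
rho_from_ranks <- cor(rank_coasters, rank_aid)

rho_from_ranks

# Check against the built-in Spearman calculation
all.equal(rho_from_ranks, cor(coasters, aid, method = "spearman"))
```

Because only the ordering of the values matters, the size of the gap between the largest value and the rest has no effect on the result.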

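In this example we should consider basing any conclusions on the Kendall rank correlation coefficient.  Both variables contain an extreme value (the United States), which suggests the data do not follow a normal distribution, and rank-based coefficients are less sensitive to such outliers than Pearson's r.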
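One way to gauge how much a single extreme observation drives each coefficient is to recompute all three with that observation left out. The sketch below is an informal sensitivity check rather than a formal test, and it is not part of the original analysis; the names outlier, methods, with_outlier, and without_outlier are illustrative.

```r
# Same sample data as in the example above
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Position of the largest roller coaster count (the United States)
outlier <- which.max(coasters)

methods <- c("pearson", "spearman", "kendall")

# Each coefficient on the full sample
with_outlier <- sapply(methods, function(m) {
  cor(coasters, aid, method = m)
})

# Each coefficient with the extreme observation dropped from both vectors
without_outlier <- sapply(methods, function(m) {
  cor(coasters[-outlier], aid[-outlier], method = m)
})

# Side-by-side comparison, rounded for readability
round(rbind(with_outlier, without_outlier), 3)
```

On this sample, Pearson's r changes more than either rank-based coefficient when the United States is removed, which is consistent with preferring a rank-based measure here.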
