Scatter Plots

To demonstrate creating scatter plots we will use the data in the example below.

Example: The table below lists the number of roller coasters in each of ten countries and the amount each country contributed to tsunami aid (in millions).

Country	Roller Coasters	Tsunami Aid (in millions)
Norway	7	139
Australia	18	193
Netherlands	36	156
United States	624	792
New Zealand	3	37
Canada	51	176
Ireland	2	26
Germany	108	313
Sweden	19	86
Switzerland	3	29

Source: www.nationmaster.com

Scatter Plots

To create a scatter plot we will begin by creating two vectors, one for each variable. The values must be entered in the same country order in both vectors.

> coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
> aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

Using the plot command we can create a scatter plot. The first argument is plotted on the x-axis, and the second argument is plotted on the y-axis.

> plot(coasters, aid)

If we want to add more information to our plot we can include the main, xlab, and ylab arguments, which set the plot title, the x-axis label, and the y-axis label.

> plot(coasters, aid, main = "Roller Coasters vs. Tsunami Aid", xlab = "Number of Roller Coasters", ylab = "Tsunami Aid (in millions)")
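
The points themselves can also be labeled, which makes it easier to see which country each point represents. The sketch below is one possible way to do this with the base graphics text function. The countries vector is not part of the original example; it is introduced here for illustration, using the country names from the table above.

```r
# Country names in the same order as the table (added for illustration)
countries = c("Norway", "Australia", "Netherlands", "United States",
              "New Zealand", "Canada", "Ireland", "Germany", "Sweden",
              "Switzerland")
coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

plot(coasters, aid, main = "Roller Coasters vs. Tsunami Aid",
     xlab = "Number of Roller Coasters", ylab = "Tsunami Aid (in millions)")

# pos = 4 places a label to the right of its point, pos = 2 to the left.
# The point far to the right is labeled on its left so the text stays
# inside the plot region.
label_side = ifelse(coasters > 300, 2, 4)
text(coasters, aid, labels = countries, pos = label_side, cex = 0.7)
```

Countries with few roller coasters sit close together, so some labels in the lower left corner may overlap.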

Correlation Coefficients

Scatter plots are used to visualize the relationship between two variables and to check visually for a correlation. We can also calculate the strength of a correlation more directly with the cor command.

> cor(coasters, aid)
[1] 0.965223

The result of this calculation is the Pearson correlation coefficient, r, which measures the strength of the linear relationship between the two variables. Values of r close to 1 indicate a strong positive correlation, values of r close to -1 indicate a strong negative correlation, and values close to 0 indicate little or no linear correlation. In this case r = 0.965223, indicating a strong positive correlation in our sample data.
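
To see what cor is computing, the Pearson coefficient can be reproduced from its definition: the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The following is a minimal sketch; the names dx, dy, and r_manual are introduced here for illustration only.

```r
coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Deviation of each value from the mean of its variable
dx = coasters - mean(coasters)
dy = aid - mean(aid)

# Pearson's r from its definition
r_manual = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
print(r_manual)

# Compare with the built-in calculation
print(all.equal(r_manual, cor(coasters, aid)))
```

The two values should agree up to floating-point rounding, so all.equal reports TRUE.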

The method argument can be added to this calculation to calculate Spearman's rank correlation coefficient, or the Kendall rank correlation coefficient.

> cor(coasters, aid, method = "spearman")
[1] 0.8997002
> cor(coasters, aid, method = "kendall")
[1] 0.8090398
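
Spearman's coefficient is the Pearson coefficient computed on the ranks of the data rather than on the raw values, with tied values sharing their average rank. A short sketch of that relationship, using the rank function (the names rank_coasters, rank_aid, and rho_manual are introduced here for illustration):

```r
coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Replace each value with its rank; ties receive the average rank by default
rank_coasters = rank(coasters)
rank_aid = rank(aid)
print(rank_coasters)

# Pearson correlation of the ranks
rho_manual = cor(rank_coasters, rank_aid)
print(rho_manual)

# Compare with the built-in Spearman calculation
print(all.equal(rho_manual, cor(coasters, aid, method = "spearman")))
```

Because only the ordering of the values matters, the very large United States values count no more than any other top-ranked value would.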

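Both of our data sets contain an outlier (the United States values are far larger than the rest), suggesting they do not follow a normal distribution. The Pearson coefficient is sensitive to such extreme values, while rank-based coefficients depend only on the ordering of the data. In this example we should therefore consider basing any conclusions on the Kendall rank correlation coefficient.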
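
One informal way to see how much a single outlier influences each coefficient is to recompute it with that point left out. The sketch below drops the United States (the fourth entry in each vector) and compares the results. This is only an illustrative check, not a formal test, and the names coasters_no_us, aid_no_us, and change are introduced here as assumptions for the example.

```r
coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Remove the fourth entry (United States) from both vectors
coasters_no_us = coasters[-4]
aid_no_us = aid[-4]

# How far each coefficient moves when the outlier is removed
methods = c("pearson", "spearman", "kendall")
change = sapply(methods, function(m) {
  cor(coasters, aid, method = m) - cor(coasters_no_us, aid_no_us, method = m)
})
print(change)
```

A coefficient that changes only slightly when the outlier is removed depends less on that single point.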
