Example: The table below lists the number of roller coasters in each of ten countries and the amount each country contributed to tsunami aid (in millions).
Country | Roller Coasters | Tsunami Aid (millions)
Norway | 7 | 139
Australia | 18 | 193
Netherlands | 36 | 156
United States | 624 | 792
New Zealand | 3 | 37
Canada | 51 | 176
Ireland | 2 | 26
Germany | 108 | 313
Sweden | 19 | 86
Switzerland | 3 | 29
Scatter Plots
To create a scatter plot, we begin by storing each column of the table in a numeric vector with c().
> coasters = c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
> aid = c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)
The plot() function creates the scatter plot. The first vector we pass is plotted on the x-axis, and the second is plotted on the y-axis.
> plot(coasters, aid)
To make the plot easier to read, we can add a title with the main argument and axis labels with the xlab and ylab arguments.
> plot(coasters, aid, main = "Roller Coasters vs. Tsunami Aid", xlab = "Number of Roller Coasters", ylab = "Tsunami Aid (in millions)")
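Because the points are unlabeled, it is hard to tell which country is which. One option, sketched below, is to add names with text() after calling plot(). The countries vector, the widened xlim (to leave room for the United States label), and the pos and cex values are choices made for this example rather than part of the original post; labels for the countries clustered near the origin will still overlap.

```r
# Data from the table above; the countries vector is added here only for labeling
countries <- c("Norway", "Australia", "Netherlands", "United States", "New Zealand",
               "Canada", "Ireland", "Germany", "Sweden", "Switzerland")
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Same plot as before, with the x-axis widened so the rightmost label fits
plot(coasters, aid,
     main = "Roller Coasters vs. Tsunami Aid",
     xlab = "Number of Roller Coasters",
     ylab = "Tsunami Aid (in millions)",
     xlim = c(0, 800))

# Write each country's name next to its point; pos = 4 places labels to the right
text(coasters, aid, labels = countries, pos = 4, cex = 0.7)
```

The label makes it immediately visible that the point far to the upper right is the United States.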
Scatter plots let us visualize the relationship between two variables and check visually for a correlation. The cor() function measures the strength of a correlation directly.
> cor(coasters, aid)
[1] 0.965223
The result of this calculation is the Pearson correlation coefficient, r, which measures the strength of the linear relationship between the two variables. Values of r close to 1 indicate a strong positive correlation, values close to -1 indicate a strong negative correlation, and values near 0 indicate little or no linear relationship. In this case r = 0.965223, indicating a strong positive correlation in our sample data.
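To see what cor() computes by default, here is a minimal sketch that evaluates the Pearson formula directly, r = sum((x - mean(x)) * (y - mean(y))) / sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2)), and compares it with cor(). The names dx, dy, and r_manual are illustrative only.

```r
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Deviations of each value from its vector's mean
dx <- coasters - mean(coasters)
dy <- aid - mean(aid)

# Sum of cross-products divided by the square root of the
# product of the two sums of squared deviations
r_manual <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

r_manual
all.equal(r_manual, cor(coasters, aid))  # TRUE
```

Written this way, it is easy to see that a single point far from both means (such as the United States) contributes a very large cross-product and can dominate r.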
By default, cor() computes Pearson's r. Setting the method argument to "spearman" or "kendall" calculates Spearman's rank correlation coefficient or the Kendall rank correlation coefficient instead.
> cor(coasters, aid, method = "spearman")
[1] 0.8997002
> cor(coasters, aid, method = "kendall")
[1] 0.8090398
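Both rank-based coefficients can be reproduced by hand, which shows why they depend only on the ordering of the values. The sketch below computes Spearman's rho as Pearson's r on the ranks (rank() gives the two tied 3s in coasters their average rank) and computes Kendall's tau-b from counts of concordant and discordant pairs. kendall_tau_b is a hypothetical helper written for this illustration, not a base R function.

```r
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Spearman's rho: Pearson's r applied to the ranks of each vector
rho_manual <- cor(rank(coasters), rank(aid))
all.equal(rho_manual, cor(coasters, aid, method = "spearman"))  # TRUE

# Kendall's tau-b: (concordant - discordant) / sqrt(pairs not tied in x * pairs not tied in y)
kendall_tau_b <- function(x, y) {
  pairs <- combn(length(x), 2)              # 2 x choose(n, 2) matrix of index pairs i < j
  sx <- sign(x[pairs[1, ]] - x[pairs[2, ]]) # direction of change in x for each pair (0 = tie)
  sy <- sign(y[pairs[1, ]] - y[pairs[2, ]]) # direction of change in y for each pair (0 = tie)
  # sx * sy is +1 for a concordant pair, -1 for a discordant pair, 0 if tied in either
  sum(sx * sy) / sqrt(sum(sx != 0) * sum(sy != 0))
}

tau_manual <- kendall_tau_b(coasters, aid)
all.equal(tau_manual, cor(coasters, aid, method = "kendall"))  # TRUE
```

Because only ranks or pair directions enter these calculations, the United States counts the same whether it has 624 roller coasters or 125; it only matters that it has the most.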
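In this example, we should consider basing our conclusions on the Kendall rank correlation coefficient. Both data sets contain an outlier (the United States), which suggests they do not follow a normal distribution, and Pearson's r is sensitive to such extreme values. Rank-based coefficients such as Spearman's and Kendall's depend only on the ordering of the values, so a single extreme point has less influence on them; Kendall's coefficient is also often preferred for small samples like this one.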
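To check the outlier claim and see how strongly each coefficient depends on a single point, the sketch below flags outliers with boxplot.stats() (Tukey's rule: values more than 1.5 times the hinge spread beyond a hinge) and then recomputes all three coefficients without the United States. The choice of outlier rule and dropping element 4 are assumptions of this sketch, not part of the original analysis. With this data, Pearson's r should change the most when that point is removed.

```r
coasters <- c(7, 18, 36, 624, 3, 51, 2, 108, 19, 3)
aid <- c(139, 193, 156, 792, 37, 176, 26, 313, 86, 29)

# Values outside the Tukey fences are returned in $out
boxplot.stats(coasters)$out  # 624 (United States)
boxplot.stats(aid)$out       # 792 (United States)

# Each coefficient with and without the United States (element 4)
methods <- c("pearson", "spearman", "kendall")
with_us <- sapply(methods, function(m) cor(coasters, aid, method = m))
without_us <- sapply(methods, function(m) cor(coasters[-4], aid[-4], method = m))

# Absolute change in each coefficient caused by removing that one point
round(abs(with_us - without_us), 3)
```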