Statistics with R
Student Resources
Chapter 3: Descriptive Statistics: Numerical Methods
1. Use R to find the 90th percentile, the 1st, 2nd, and 3rd quartiles as well as the minimum and maximum values of the LakeHuron data set. (Recall that R has a number of data sets included in its basic installation; LakeHuron is the name of the data set that contains the “level of Lake Huron 1875 - 1972.” To see all the available data sets included in R, simply enter data() at the R prompt in the Console.) What are the mean and median?
Answer:
quantile(LakeHuron, probs = c(0.00, 0.25, 0.50, 0.75, 0.90, 1.00))
## 0% 25% 50% 75% 90% 100%
## 575.960 578.135 579.120 579.875 580.646 581.860
mean(LakeHuron)
## [1] 579.0041
median(LakeHuron)
## [1] 579.12
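As a cross-check, and assuming only base R, summary() reports the minimum, the quartiles, the median, the mean, and the maximum in a single call; the 90th percentile is not part of that summary, so quantile() is still needed for it. The object names five.num and p90 are illustrative and do not appear in the original answer.

```r
# summary() returns Min., 1st Qu., Median, Mean, 3rd Qu., and Max. in one call
five.num <- summary(LakeHuron)
five.num

# the 90th percentile is not included in summary(), so request it separately
p90 <- quantile(LakeHuron, probs = 0.90)
p90
```

Note that summary() may display fewer significant digits than quantile(), mean(), and median(), so small differences in the printed values are a matter of formatting rather than of calculation.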
2. Use R to find the range, the interquartile range, the variance, the standard deviation, and the coefficient of variation of the LakeHuron data set.
Answer:
#Comment. Set the number of significant digits to be reported
options(digits = 4)
max(LakeHuron) - min(LakeHuron)
## [1] 5.9
IQR(LakeHuron)
## [1] 1.74
var(LakeHuron)
## [1] 1.738
sd(LakeHuron)
## [1] 1.318
sd(LakeHuron) / mean(LakeHuron)
## [1] 0.002277
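Base R has no built-in function for the coefficient of variation, which is why the answer above divides sd() by mean() directly. If the calculation is needed repeatedly, it can be wrapped in a small helper. The sketch below assumes a function named cv, which is a hypothetical name chosen for illustration and not part of base R.

```r
# hypothetical helper (not a base R function): coefficient of variation,
# i.e. the standard deviation expressed as a fraction of the mean
cv <- function(v) sd(v) / mean(v)

# same quantity as sd(LakeHuron) / mean(LakeHuron)
cv(LakeHuron)

# the same value expressed as a percentage of the mean
100 * cv(LakeHuron)
```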
3. Use R to create a vector with the following elements: -37.7, -0.3, 0.00, 0.91, e, π, 5.1, 2e, and 113,754, where e is the base of the natural logarithm (roughly 2.718282...) and π is the ratio of a circle’s circumference to its diameter (about 3.141593...). Name it E3_2 and find the mean, median, 78th percentile, variance, and standard deviation. Note: R understands exp(1) as e and pi as π.
Answer:
#Comment. Override default of reporting very large (and very small)
#numbers with scientific notation
options(scipen=99)
E3_2 <- c(-37.7, -0.3, 0.00, 0.91, exp(1), pi, 5.1, 2*exp(1), 113754)
mean(E3_2)
## [1] 12637
median(E3_2)
## [1] 2.718
quantile(E3_2, probs = 0.78)
## 78%
## 5.181
var(E3_2)
## [1] 1437840293
sd(E3_2)
## [1] 37919
The mean is 12,637; the median is e (about 2.718). Because the data values in E3_2 are already arranged in ascending order, the median is easily identified as the middle (fifth) value: four values lie below it and four above. The mean is simply the sum of all nine data values divided by nine. The 78th percentile is reported as 5.181; the variance and standard deviation are 1,437,840,293 and 37,919, respectively.
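The 78th percentile of 5.181 is not one of the nine data values, because quantile() interpolates between neighboring observations. By default it uses the method R documents as type 7: the fractional position is 1 + (n - 1)p, and the result is a linear interpolation between the two sorted values on either side of that position. The sketch below reproduces the calculation by hand; the names p, sorted, h, lower, frac, and by.hand are illustrative only.

```r
E3_2 <- c(-37.7, -0.3, 0.00, 0.91, exp(1), pi, 5.1, 2*exp(1), 113754)

p      <- 0.78
sorted <- sort(E3_2)
n      <- length(sorted)

# fractional position in the sorted data: 1 + (9 - 1) * 0.78 = 7.24
h     <- 1 + (n - 1) * p
lower <- floor(h)      # the 7th sorted value, 5.1
frac  <- h - lower     # the fraction of the way toward the 8th value, 2e

# linear interpolation between the 7th and 8th sorted values
by.hand <- sorted[lower] + frac * (sorted[lower + 1] - sorted[lower])
by.hand

# the default quantile() method (type 7) should agree
quantile(E3_2, probs = p, type = 7)
```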
4. Use R to define 2 vectors, x and y, where x contains 24, 22, 22, 21, and 19, and y contains 27, 24, 23, 21, and 19. Which is the most likely correlation coefficient describing the relationship between x and y? -0.90, -0.50, -0.10, 0.00, +0.10, +0.50, or +0.90? Use R to find the correlation and covariance of x and y.
Answer:
+0.90 is the closest of the listed values: the relationship between the two variables is not only positive, it is very nearly linear as well. The actual correlation coefficient and the scatter plot confirm this relationship.
#Comment1. create vector x
x <- c(24, 22, 22, 21, 19)
#Comment2. create vector y
y <- c(27, 24, 23, 21, 19)
#Comment3. using vectors x and y, create data frame data
data <- data.frame(X = x, Y = y)
#Comment4. examine contents of data frame
data
## X Y
## 1 24 27
## 2 22 24
## 3 22 23
## 4 21 21
## 5 19 19
#Comment5. find correlation coefficient of x and y
cor(data$X, data$Y)
## [1] 0.98
#Comment6. create the scatter plot of y against x
plot(data$X, data$Y, pch = 19, xlab = 'x', ylab = 'y')
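The exercise also asks for the covariance, which the code above does not compute. A minimal sketch using the base R function cov() follows; it also shows that dividing the covariance by the product of the two standard deviations recovers the correlation coefficient. The vectors are redefined here so that the snippet runs on its own.

```r
x <- c(24, 22, 22, 21, 19)
y <- c(27, 24, 23, 21, 19)

# sample covariance of x and y (divisor n - 1)
cov(x, y)
## [1] 5.4

# correlation is the covariance scaled by both standard deviations;
# this should agree with cor(x, y)
cov(x, y) / (sd(x) * sd(y))
cor(x, y)
```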
5. Use R to define a vector with the following elements: 10, 20, 30, 40, 50, 60, 70, 80, 90, 100. Making use of the vectorization capability of R, find the sample variance and sample standard deviation of this set of data. Just to make certain that your answers are correct, check both against those using the functions var() and sd().
#Comment1. define data object E3_3
E3_3 <- seq(from = 10, to = 100, by = 10)
#Comment2. find the variance
xbar <- mean(E3_3)
devs <- (E3_3 - xbar)
sqrd.devs <- (devs)^2
sum.sqrd.devs <- sum(sqrd.devs)
variance <- sum.sqrd.devs / (length(E3_3) - 1)
variance
## [1] 916.7
#Comment3. find the standard deviation
standard.deviation <- sqrt(variance)
standard.deviation
## [1] 30.28
#Comment4. find the variance of E3_3 using var() function
var(E3_3)
## [1] 916.7
#Comment5. find the standard deviation of E3_3 using sd() function
sd(E3_3)
## [1] 30.28
Answer: the variance is 916.7, the standard deviation 30.28.