Provide, in plain text, the R commands that solve the following problems.
The student directory for a large university has 400 pages with 130 names per page, a total of 52,000 names. Using software, show how to select a simple random sample of 10 names.
From the Murder data file, use the variable murder, which is the murder rate (per 100,000 population) for each state in the U.S. in 2017 according to the FBI Uniform Crime Reports. At first, do not use the observation for D.C. (DC). Using software: Find the mean and standard deviation and interpret their values. Find the five-number summary, and construct the corresponding boxplot. Now include the observation for D.C. What is affected more by this outlier: the mean or the median?
The Houses data file lists the selling price (thousands of dollars), size (square feet), tax bill (dollars), number of bathrooms, number of bedrooms, and whether the house is new (1 = yes, 0 = no) for 100 home sales in Gainesville, Florida. Let’s analyze the selling prices. Construct a frequency distribution and a histogram. Find the percentage of observations that fall within one standard deviation of the mean. Construct a boxplot.
The datasets needed are at the Index of Data Sets. Useful functions in R to solve the problems in this assignment: sample, read.table, mean, sd, summary, boxplot, hist, table, cbind, length, case, tapply
The task involves several steps of data sampling, descriptive statistics, and visualization using R programming for three different datasets. These datasets include a student directory, murder rates, and housing data. The following sections detail the R commands and explanations to accomplish each of these tasks effectively.
Sampling from a Large Student Directory
Given a directory with 52,000 names organized across 400 pages, each containing 130 names, we aim to select a simple random sample of 10 names. First, we need to generate the total list of names, which can be conceptualized as a vector of integers from 1 to 52000. Then, the sample() function can be used to select 10 unique random indices representing the names.
# Create a vector representing all names
total_names <- 1:52000
# Select 10 random unique names (sample() draws without replacement by default)
sampled_names <- sample(total_names, 10)
This code produces 10 randomly selected indices. If actual names are stored in a data frame or array, replace total_names with your data structure and subset accordingly.
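Because the directory is organized by page, each sampled index can be translated into a page number and a position on that page. The sketch below assumes names are numbered consecutively, 130 per page, starting at the top of page 1; the variable names page and position and the seed are illustrative and not part of the original assignment.

```r
set.seed(123)  # illustrative seed so the draw is reproducible

names_per_page <- 130
total_pages <- 400

# Draw 10 distinct indices from 1..52000 (sampling without replacement)
sampled_names <- sample(names_per_page * total_pages, 10)

# Translate each index into a page and a position on that page
page <- (sampled_names - 1) %/% names_per_page + 1
position <- (sampled_names - 1) %% names_per_page + 1

print(data.frame(index = sampled_names, page = page, position = position))
```

Each row then says which page to open and which name on that page to take.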
Analyzing the Murder Rate Data
Assuming the data is stored in a file, say murder_data.txt, with a variable named murder representing the murder rate per 100,000 population, follow these steps:
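# Load the data, assuming it's a tabular text file with a header row
murder_data <- read.table("murder_data.txt", header = TRUE)
# Remove the District of Columbia (assumes a column named state that codes it as "DC")
murder_data_noDC <- murder_data[murder_data$state != "DC", ]
# Calculate mean and standard deviation
mean_murder <- mean(murder_data_noDC$murder)
sd_murder <- sd(murder_data_noDC$murder)
# Output the results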
print(paste("Mean murder rate (excluding DC):", mean_murder))
print(paste("Standard deviation:", sd_murder))
Interpretation: The mean is the average murder rate across the states (excluding D.C.), and the standard deviation describes the typical distance of a state's rate from that mean. A larger standard deviation indicates more variability among states.
Five-Number Summary and Boxplot
# Summary statistics (summary() reports the five-number summary plus the mean)
five_num <- summary(murder_data_noDC$murder)
print(five_num)
# Boxplot
boxplot(murder_data_noDC$murder, main = "Boxplot of Murder Rates (Excluding DC)", ylab = "Murder Rate per 100,000")
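By default, boxplot() draws any observation lying more than 1.5 interquartile ranges beyond the quartiles as a separate point. The sketch below illustrates that rule on a small made-up vector (the values are hypothetical, not the FBI data), using fivenum() for the five-number summary and boxplot.stats() to list the flagged points.

```r
# Hypothetical rates for illustration only (not the Murder data file)
x <- c(1.0, 2.2, 2.5, 3.1, 4.0, 4.9, 5.3, 6.2, 7.8, 12.4, 24.2)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
names(five) <- c("min", "Q1", "median", "Q3", "max")
print(five)

# Fences at 1.5 * IQR beyond the hinges, the default rule in boxplot()
iqr <- five["Q3"] - five["Q1"]
upper_fence <- five["Q3"] + 1.5 * iqr
lower_fence <- five["Q1"] - 1.5 * iqr

# Observations outside the fences are drawn as individual points
flagged <- boxplot.stats(x)$out
print(flagged)
```

Note that fivenum() uses hinges, which can differ slightly from the quartiles reported by summary() for some sample sizes.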
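To see the effect of the outlier, repeat the analysis with the D.C. observation included:
# Include DC data
murder_data_all <- murder_data
# Calculate new summary (reports both the mean and the median)
new_five_num <- summary(murder_data_all$murder)
print(new_five_num)
# Revised boxplot including DC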
boxplot(murder_data_all$murder, main = "Boxplot of Murder Rates (Including DC)", ylab = "Murder Rate per 100,000")
The outlier affects the mean more than the median. The mean uses every value, so a single extreme observation pulls it upward, whereas the median depends only on the middle of the ordered data and is robust to outliers.
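This difference in sensitivity can be checked numerically by computing how far each statistic moves when one extreme value is appended. The sketch below uses hypothetical rates and a hypothetical outlier, not the actual values from the Murder data file.

```r
# Hypothetical state rates and one hypothetical extreme value
rates <- c(2.1, 3.0, 3.4, 4.2, 4.8, 5.5, 6.1, 7.0, 8.3)
outlier <- 24.0
rates_with_outlier <- c(rates, outlier)

# Shift in each statistic caused by adding the outlier
mean_shift <- mean(rates_with_outlier) - mean(rates)
median_shift <- median(rates_with_outlier) - median(rates)

cat("Change in mean:  ", round(mean_shift, 2), "\n")
cat("Change in median:", round(median_shift, 2), "\n")
```

The same two subtractions can be applied to murder_data_all$murder and murder_data_noDC$murder once the data file is loaded.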
Analyzing the Gainesville Housing Data
Assuming the dataset is stored in a file, e.g., houses_gainesville.txt, with variables including selling price, size, tax bill, bathrooms, bedrooms, and newness indicator:
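# Read the data
houses <- read.table("houses_gainesville.txt", header = TRUE)
# Focus on selling price (assumes the column is named price)
selling_prices <- houses$price
# Construct frequency distribution by binning the prices
breaks <- pretty(selling_prices, n = 10)
freq_dist <- table(cut(selling_prices, breaks = breaks, include.lowest = TRUE))
print(freq_dist)
# Histogram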
hist(selling_prices, main = "Histogram of House Selling Prices", xlab = "Selling Price (Thousands of Dollars)", col = "lightblue", breaks = 10)
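# Calculate mean and standard deviation
mean_price <- mean(selling_prices)
sd_price <- sd(selling_prices)
# Percentage within one SD of the mean
lower_bound <- mean_price - sd_price
upper_bound <- mean_price + sd_price
within_one_sd <- selling_prices >= lower_bound & selling_prices <= upper_bound
percentage_within_one_sd <- 100 * mean(within_one_sd)
cat("Percentage within one SD of the mean:", percentage_within_one_sd, "%\n")
# Boxplot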
boxplot(selling_prices, main = "Boxplot of House Selling Prices", ylab = "Selling Price (Thousands of Dollars)")
This analysis provides insights into the distribution, variability, and outliers among the house prices in Gainesville.
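The frequency distribution and the within-one-standard-deviation percentage can be tried out without the data file. The sketch below runs both steps on simulated, right-skewed prices; the simulated values, the seed, and the bin count are assumptions made for illustration and do not reproduce the Houses data.

```r
set.seed(1)  # illustrative seed

# Simulated right-skewed prices in thousands of dollars (not the Houses data)
prices <- round(rlnorm(100, meanlog = log(200), sdlog = 0.4), 1)

# Frequency distribution: bin the prices, then count observations per bin
breaks <- pretty(prices, n = 8)
bins <- cut(prices, breaks = breaks, include.lowest = TRUE)
freq <- table(bins)
print(cbind(count = freq, percent = 100 * freq / length(prices)))

# Share of observations within one standard deviation of the mean
within <- abs(prices - mean(prices)) <= sd(prices)
pct_within <- 100 * mean(within)
cat("Percentage within one SD of the mean:", pct_within, "%\n")
```

For roughly bell-shaped data the empirical rule puts this percentage near 68%; a skewed distribution can depart from that figure.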
Conclusion
With a few R commands, simple random samples, descriptive statistics, boxplots, histograms, and frequency distributions can be produced efficiently for a variety of datasets. These techniques help reveal the shape of a distribution, its variability, and any outliers. Careful interpretation of these measures supports informed decision-making in fields such as criminology and real estate analysis.