Provide In Plain Text R Commands That Find And Solve The Pro

Provide, in plain text, R commands that solve the following:

The student directory for a large university has 400 pages with 130 names per page, a total of 52,000 names. Using software, show how to select a simple random sample of 10 names.

From the Murder data file, use the variable murder, which is the murder rate (per 100,000 population) for each state in the U.S. in 2017 according to the FBI Uniform Crime Reports. At first, do not use the observation for D.C. (DC). Using software:

  • Find the mean and standard deviation and interpret their values.
  • Find the five-number summary, and construct the corresponding boxplot.
  • Now include the observation for D.C. What is affected more by this outlier: the mean or the median?

The Houses data file lists the selling price (thousands of dollars), size (square feet), tax bill (dollars), number of bathrooms, number of bedrooms, and whether the house is new (1 = yes, 0 = no) for 100 home sales in Gainesville, Florida. Let’s analyze the selling prices.

  • Construct a frequency distribution and a histogram.
  • Find the percentage of observations that fall within one standard deviation of the mean.
  • Construct a boxplot.

Datasets needed are at Index of Data Sets.

Useful functions in R to solve problems in this assignment: sample, read.table, mean, sd, summary, boxplot, hist, table, cbind, length, case, tapply

The task involves several steps of data sampling, descriptive statistics, and visualization using R programming for three different datasets. These datasets include a student directory, murder rates, and housing data. The following sections detail the R commands and explanations to accomplish each of these tasks effectively.

Sampling from a Large Student Directory

Given a directory with 52,000 names organized across 400 pages, each containing 130 names, we aim to select a simple random sample of 10 names. First, we need to generate the total list of names, which can be conceptualized as a vector of integers from 1 to 52000. Then, the sample() function can be used to select 10 unique random indices representing the names.

# Create a vector representing all names
total_names <- 1:52000

# Select 10 random unique names (sample() draws without replacement by default)
sampled_names <- sample(total_names, 10)

This code produces 10 randomly selected indices. If actual names are stored in a data frame or array, replace total_names with your data structure and subset accordingly.
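Because the directory is printed rather than stored as a vector, it can help to translate each sampled index into a page number and a position on that page. The following is a minimal sketch, assuming names are numbered consecutively page by page (names 1 to 130 on page 1, 131 to 260 on page 2, and so on); the variable names and the seed are illustrative, not part of the original exercise.

```r
# Directory layout taken from the problem statement
names_per_page <- 130
n_pages <- 400

set.seed(1)  # arbitrary seed, used only to make this illustration reproducible

# Draw 10 distinct indices from 1..52000
idx <- sample(n_pages * names_per_page, 10)

# Assumption: names are numbered consecutively, page by page
page <- (idx - 1) %/% names_per_page + 1      # page containing the name
position <- (idx - 1) %% names_per_page + 1   # position of the name on that page

print(data.frame(index = idx, page = page, position = position))
```

Each row then tells you which page to open and which name on that page to select.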

Analyzing the Murder Rate Data

Assuming the data is stored in a file, say murder_data.txt, with a variable named murder representing the murder rate per 100,000 population, follow these steps:

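# Load the data, assuming it's a tabular text file with a header row
murder_data <- read.table("murder_data.txt", header = TRUE)

# Remove the District of Columbia (DC)
# Assumption: the state identifier is in a column named "state"; check names(murder_data)
murder_data_noDC <- murder_data[murder_data$state != "DC", ]

# Calculate mean and standard deviation
mean_murder <- mean(murder_data_noDC$murder)
sd_murder <- sd(murder_data_noDC$murder)

# Output the results
print(paste("Mean murder rate (excluding DC):", mean_murder))
print(paste("Standard deviation:", sd_murder))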

Interpretation: The mean provides the average murder rate across states (excluding DC), while the standard deviation indicates the variability or dispersion of murder rates. Higher standard deviation suggests more variability among states.

Five-Number Summary and Boxplot

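# Summary statistics: summary() reports the minimum, quartiles, median, and maximum (plus the mean)
five_num <- summary(murder_data_noDC$murder)

print(five_num)

# Boxplot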

boxplot(murder_data_noDC$murder, main = "Boxplot of Murder Rates (Excluding DC)", ylab = "Murder Rate per 100,000")
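As a side note on what the boxplot displays: summary() computes quartiles with quantile(), while boxplot() is built on the hinges returned by fivenum(), and the two can differ slightly in small samples. The sketch below uses made-up values (not the Murder data) to show the five numbers and the points that a boxplot would draw individually as potential outliers.

```r
# Made-up values for illustration only (not taken from the Murder data file)
x <- c(1.2, 2.0, 2.5, 3.1, 3.8, 4.4, 5.0, 5.9, 7.3, 24.2)

# Tukey's five numbers: minimum, lower hinge, median, upper hinge, maximum
five <- fivenum(x)
print(five)

# Points lying more than 1.5 times the hinge spread beyond the hinges;
# these are the observations boxplot() plots as separate points
out <- boxplot.stats(x)$out
print(out)
```

With real data, comparing fivenum(x) against summary(x) is a quick way to see whether the two quartile definitions disagree.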

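Now include the observation for D.C. and repeat the summary:

# Include DC data (the full data set as read from the file)
murder_data_all <- murder_data

# Calculate new summary
new_five_num <- summary(murder_data_all$murder)

print(new_five_num)

# Revised boxplot including DC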

boxplot(murder_data_all$murder, main = "Boxplot of Murder Rates (Including DC)", ylab = "Murder Rate per 100,000")

The outlier affects the mean more than the median. The mean uses every value and is pulled toward extreme observations, whereas the median depends only on the middle of the ordered data and is therefore robust to outliers.
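The difference in sensitivity can be seen with a small numeric sketch. The values below are made up for illustration and are not taken from the Murder data file; the point is only to compare how far each measure moves when one extreme value is added.

```r
# Made-up rates for illustration only (not values from the Murder data file)
rates <- c(2.1, 3.4, 4.0, 5.2, 6.3, 7.8)
outlier <- 25.0  # hypothetical extreme value

with_outlier <- c(rates, outlier)

# How far each measure moves when the extreme value is added
mean_shift <- mean(with_outlier) - mean(rates)
median_shift <- median(with_outlier) - median(rates)

cat("Change in mean:  ", round(mean_shift, 2), "\n")
cat("Change in median:", round(median_shift, 2), "\n")
```

The same comparison can be made on the real data by computing mean() and median() with and without the D.C. row.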

Analyzing the Gainesville Housing Data

Assuming the dataset is stored in a file, e.g., houses_gainesville.txt, with variables including selling price, size, tax bill, bathrooms, bedrooms, and newness indicator:

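# Read the data
houses <- read.table("houses_gainesville.txt", header = TRUE)

# Focus on selling price
# Assumption: the selling price is in a column named "price"; check names(houses)
selling_prices <- houses$price

# Construct frequency distribution
# The class boundaries from pretty() are one reasonable choice, not the only one
breaks <- pretty(selling_prices, n = 10)
freq_dist <- table(cut(selling_prices, breaks = breaks, include.lowest = TRUE))

print(freq_dist)

# Histogram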

hist(selling_prices, main = "Histogram of House Selling Prices", xlab = "Selling Price (Thousands of Dollars)", col = "lightblue", breaks = 10)

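# Calculate mean and standard deviation
mean_price <- mean(selling_prices)
sd_price <- sd(selling_prices)

# Percentage within one SD of the mean
lower_bound <- mean_price - sd_price
upper_bound <- mean_price + sd_price

within_one_sd <- selling_prices >= lower_bound & selling_prices <= upper_bound
percentage_within_one_sd <- sum(within_one_sd) / length(selling_prices) * 100

cat("Percentage within one SD of the mean:", percentage_within_one_sd, "%\n")

# Boxplot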

boxplot(selling_prices, main = "Boxplot of House Selling Prices", ylab = "Selling Price (Thousands of Dollars)")

This analysis provides insights into the distribution, variability, and outliers among the house prices in Gainesville.
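To sanity-check the "within one standard deviation" calculation, it can be run on simulated data whose behavior is known in advance. The sketch below wraps the calculation in a hypothetical helper, pct_within_sd (a name introduced here for illustration), and applies it to a simulated bell-shaped sample, for which roughly 68% of values are expected within one standard deviation and roughly 95% within two.

```r
# Hypothetical helper: percentage of values within k standard deviations of the mean
pct_within_sd <- function(x, k = 1) {
  m <- mean(x)
  s <- sd(x)
  100 * mean(x >= m - k * s & x <= m + k * s)
}

set.seed(42)  # arbitrary seed so the simulated sample is reproducible

# Simulated bell-shaped values; these are NOT the Houses data
simulated <- rnorm(10000, mean = 200, sd = 50)

pct_one_sd <- pct_within_sd(simulated, k = 1)
pct_two_sd <- pct_within_sd(simulated, k = 2)

cat("Within 1 SD:", round(pct_one_sd, 1), "%\n")
cat("Within 2 SD:", round(pct_two_sd, 1), "%\n")
```

For skewed data such as house prices, the percentage within one standard deviation may differ noticeably from 68%, which is itself informative about the shape of the distribution.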

Conclusion

With a handful of R commands, one can draw a simple random sample and produce descriptive statistics, boxplots, histograms, and frequency distributions for a variety of datasets. These techniques help in understanding a variable's distribution, variability, and outliers. Interpreting the measures correctly, for example recognizing that an outlier distorts the mean more than the median, supports informed decision-making in fields such as criminology and real estate analysis.
