Statistics Questions To Answer


Note: An Excel workbook has also been uploaded. It contains 8 XLS files, one in each of 8 separate tabs. These files will be needed to answer most of the questions. This work is due Friday, September 19th.

Q1) Fill in the blanks (show your work).

[Summary table with columns Variable, N, Mean, Median, TrMean, StDev and rows haircut, sleep, age, followed by a correlation matrix and a covariance matrix for haircut, sleep, and age. Most numeric entries were lost in extraction. Surviving entries: the correlation between haircut and sleep is -0.117; the position marked (2) is in the haircut row of the covariance matrix; other fragments read "20", "8396", "646", "-1..70491", and "4...29226".]

Blank 1 =
Blank 2 =

Q2) Is the following statement correct? Explain why or why not. “A correlation of 0 implies that no relationship exists between the two variables under study.”

Q3) Does how long children remain at the lunch table help predict how much they eat? The data in file lunchtime.xls (File is in Tab#1 of Excel Workbook) gives information on 20 toddlers observed over several months at a nursery school. “Time” is the average number of minutes a child spent at the table when lunch was served. “Calories” is the average number of calories the child consumed during lunch, calculated from careful observation of what the child ate each day. Find the correlation for these data. Suppose we were to record time at the table in hours rather than in minutes. How would the correlation change? Why? Write a sentence or two explaining what this correlation means for these data. Remember to write about food consumption by toddlers rather than about correlation coefficients. One analyst concluded, “It is clear from this correlation that toddlers who spend more time at the table eat less. Evidently something about being at the table causes them to lose their appetites.” Explain why this explanation is not an appropriate conclusion from what we know about the data.

Q4) In file bach.xls (File is in Tab#2 of Excel Workbook) is state by state data (plus Washington, DC) on percentage of residents over the age of 25 who have at least a bachelor’s degree and median salary. What is the correlation between these two variables? Produce a scatter plot of the data with percentage with bachelor’s degree on the X axis. Notice the outlier? Who does that point belong to? Can you think of any reasons why this location might have a high percentage of residents with a bachelor’s degree but a lower than expected median income? Remove the outlier point found in (b) and recalculate the correlation. How do the two correlation values compare? What does this illustrate about correlation?

Q5) The mean rate of return and standard deviation of Stocks 1 and 2 are given below:

                      Stock 1   Stock 2
Mean                  10%       20%
Standard deviation    20%       30%

Given that the correlation between stocks is -1.0, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2. Given that the correlation between stocks is 0, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2. Given that the correlation between stocks is 1, find risk (standard deviation) and (mean) return of a portfolio that has 60% in Stock 1 and 40% in Stock 2. What appears to be the relationship between correlation and risk?

Q6) The file portfolioprob.xls (File is in Tab#3 of Excel Workbook) has about 3 years of monthly returns data for GPS, BBY and MRK. That is, each row represents the monthly return for each of the three stocks. What company does each symbol represent? Go to finance.yahoo.com to find out. What is the average monthly return for each of the three stocks? What is the standard deviation for the returns of the three stocks? What is the correlation between GPS and BBY, GPS and MRK, and BBY and MRK? Create a side by side boxplot for these three stocks. How do they compare?
Which looks the riskiest, which the safest? Give the expected return and standard deviation of all the possible two-stock portfolios (GPS, BBY), (GPS, MRK), (BBY, MRK) with equal amounts invested in each stock (weights of .5 for each stock). Rank the three portfolios based on their standard deviation. How do they compare with holding one of the individual stocks?

Q7) In class we showed how one could split a data set into two groups using the median of the X values, then find points (X1, Y1) and (X2, Y2). We then fit a line between these two points using the familiar Y - Y1 = m(X - X1) formula, where m = (Y2 - Y1) / (X2 - X1). This can be done in Stata as follows (there are fancier ways to do this in Stata; we’re just showing you one way below).

For this example we will use the data set onlineedu.xls (File is in Tab#4 of Excel Workbook). One of the biggest changes in higher education in recent years has been the growth of online universities. The Online Education Database is an independent organization whose mission is to build a comprehensive list of the top accredited online colleges. The data set onlineedu.xls shows the retention rate (%) and the graduation rate (%) for 29 online colleges (Online Education Database website, January 2009). We want to model graduation rate as a function of retention rate.

Load the data into Stata (of course!) (use File -> Import -> Excel File).

Find the median of the X’s (retention rate).

. summarize rr, detail

[Stata output for RR; most values were lost in extraction. Surviving entries: 50th percentile 60; mean 57.(decimals lost); standard deviation 23.(decimals lost); variance 540.(decimals lost); kurtosis 3.185897.]

So the median of the X’s equals 60.

Find the means for values below the median.

[The text between the "less than" and "greater than" comparisons was lost in extraction; the surviving command reads "summarize rr gr if rr60,detail". The output that follows is for the group above the median.]

[Stata output for RR in the upper group; most values were lost. Surviving entries: mean 75.(decimals lost); largest value 78; standard deviation 13.(decimals lost); variance 182.(decimals lost); kurtosis 2.425894.]

[Stata output for GR in the upper group; most values were lost. Surviving entries: mean 46.(decimals lost); largest value 55; standard deviation 9.(decimals lost); kurtosis 1.437486.]

So (X2, Y2) = (75.43, 46.14).

Find the line between the points (X1, Y1) and (X2, Y2).

(X1, Y1) = (40.6, 37.67)
(X2, Y2) = (75.43, 46.14)
m = (Y2 - Y1) / (X2 - X1) = (46.14 - 37.67) / (75.43 - 40.6) = 0.24
Y - Y1 = m(X - X1), that is, Y - 37.67 = 0.24(X - 40.6)
Y = 37.67 + 0.24(X - 40.6) = 27.93 + 0.24X

So the equation of the fitted line is Y = 27.93 + 0.24X.

Now do this two-point method using the medians in each subgroup instead of the means. Report the equation of this new line. Compare this new line to the one previously found using the means in each subgroup. Are the lines about the same or different? Create a plot that shows the data and the two lines on it. Make sure it’s clear which line is which.

Q8) The owner of a moving company typically has his most experienced manager predict the total number of labor hours that will be required to complete an upcoming move. This approach has proved useful in the past, but the owner has the business objective of developing a more accurate method of predicting labor hours (Y). In a preliminary effort to provide a more accurate method, the owner has decided to use the number of cubic feet moved as the independent variable (X) and has collected data for 36 moves in which the origin and destination were within the borough of Manhattan in New York City and in which the travel time was an insignificant portion of the hours worked. The data are stored in moving.xls (File is in Tab#5 of Excel Workbook). Use Stata to answer the questions below. Create a scatter diagram of the data. Fit a least squares regression line to this data and interpret the slope (Stata command reg). Fit a least absolute deviations regression line to this data and interpret the slope (Stata command qreg). Are the lines produced by reg and qreg very different? Compare them.
Predict the labor hours for a 500 cubic feet move using the estimated regression equation developed in part (b).

Q9) A nutritionist was interested in developing a model that describes the relation between the amount of fat (in grams) in cheeseburgers at fast-food restaurants and the number of calories. She obtains the following data from the Web sites of the companies. The data is in cheeseburger.xls (File is in Tab#6 of Excel Workbook). The researcher wants to use fat content to predict calories. Which is the explanatory variable? Draw a scatter diagram of the data. Fit a regression model and interpret the slope and y-intercept. Is the value of the y-intercept meaningful? Predict the number of calories in a sandwich that has 30 grams of fat. A cheeseburger from Sonic has 700 calories and 42 grams of fat. Is the number of calories for this sandwich above or below average among all sandwiches with 42 grams?

Q10) The table below shows the number of live births per 1000 women aged 15–44 years in the United States, starting in 1965 (National Center for Health Statistics, www.cdc.gov/nchs/). Make a scatterplot and describe the general trend in birthrates. (Enter Year as years since 1900: 65, 70, 75, etc.) The data is in file birthrates.xls. Find the equation of the regression line. Interpret the slope of the line. The table gives rates only at 5-year intervals. Estimate what the rate was in 1978. In 1978 the birthrate was actually 15.0. How close did your model come? In 2009, the birthrate was 13.5. How close did your model come? Predict the birthrate for 2025. Comment on your faith in this prediction.

Q11) A stock's (or mutual fund's) β (beta) measures the relationship between the stock's rate of return and the average rate of return for the market as a whole. Now beta is easy to compute.
It is the slope from a simple linear regression [what we refer to as b1], where the dependent variable (Y) is the stock's rate of return and the independent variable is the market rate of return (X) (usually taken as the rate of return of the S&P 500 or the Nasdaq). Stocks with beta values greater than 1 are considered “aggressive” and stocks with beta less than 1 are considered defensive. A stock with a beta value near 1 is called a neutral security. As an example, we have monthly returns on CAT (Caterpillar, Inc) and the S&P 500 index (denoted the market index) and perform a regression analysis: In this example we find that β = 1.88656, so CAT is more risky than the market. If you go to finance.yahoo.com, you can enter a stock symbol and, along with the current price of the stock and some other financial values, you will obtain the beta for that stock (it's such a popular measure). It is a little different than ours since they are using a slightly different time period, but it is close.

Q12) The article at adsk-aks-dow-gci-fitb0425.aspx (also on the class website) discusses (from 2011) five high beta stocks that might be useful for your portfolio (IF you believe in beta and IF you think the market is going up! If the market is going down, holding high beta stocks can be very painful!). Here is the table from the article: Let’s first see if we can get close to these numbers (everyone calculates beta slightly differently, darn it! That to me is another strike against beta.). The usual method is to use three years of monthly returns. In Stat 107 we spend more time discussing the pros and cons of beta. Using the data in file beta_test_old.xls (File is in Tab#7 of Excel Workbook), run five regressions, each time using SPY as the explanatory variable (x variable) and each stock above as the response variable (y variable). That is, calculate the beta for each stock using our data. Are our calculated betas close to the table above?
Beta is a measure of risk of a stock; it is one of numerous measures finance professionals can use. Another measure would simply be the standard deviation of returns of each stock. Rank the stocks above based on their standard deviation, from highest to lowest. Is the order the same as if you ranked them from highest to lowest beta? Picking stocks based on beta is not the best idea in the world. For one thing, betas can be very time dependent. In file beta_test_new.xls (File is in Tab#8 of Excel Workbook) is data on the same stocks, but instead of from (from the article), it is now . Calculate the five betas again and comment on how they have changed. Does this make you think beta is a good or bad measure to use for picking stocks? If a stock’s beta stayed about the same, would that make the stock more attractive to you? Briefly explain. Beta is whack! It has some good uses, but it can be very time dependent and data dependent. To make you very wary of financial data, we will see what two major financial web sites think the beta for Abercrombie & Fitch (ANF) is. For Yahoo, go to finance.yahoo.com and enter ANF and find the reported beta. For Google, go to finance.google.com and enter ANF. Beta will be shown automatically. Are you surprised? Which beta is correct? Why do you think they are different? Briefly explain.

END

XLS files needed to answer some questions (workbook contents; nearly all numeric values were lost or truncated in extraction, so only the layout is listed here):

Tab 1, Lunchtime: columns Calories, Time.
Tab 2, BACH: columns state, bach, income; rows run alphabetically from Alabama to Wyoming and include the District of Columbia.
Tab 3, Portfolio Prob: columns GPS, BBY, MRK.
Tab 4, Online.EDU: columns College, RR(%), GR(%); colleges listed: Western International University, South University, University of Phoenix, American InterContinental University, Franklin University, Devry University, Tiffin University, Post University, Peirce College, Everest University, Upper Iowa University, Dickinson State University, Western Governors University, Kaplan University, Salem International University, Ashford University, ITT Technical Institute, Berkeley College, Grand Canyon University, Nova Southeastern University, Westwood College, Everglades University, Liberty University, LeTourneau University, Rasmussen College, Keiser University, Herzing College, National University, Florida National College.
Tab 5, Moving: columns Hours, Feet, Large Elevator (Yes/No).
Tab 6, Cheeseburger: columns fat, calories.
Tab 7, beta_test_old: columns SPY, ADSK, AKS, DOW, GCI, FITB.
Tab 8, beta_test_new: columns SPY, ADSK, AKS, DOW, GCI, FITB.
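
The portfolio calculation asked for in Q5 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original assignment: the function name portfolio_stats is hypothetical, and it assumes the standard two-asset formulas, mean = w1*m1 + w2*m2 and variance = (w1*s1)^2 + (w2*s2)^2 + 2*w1*w2*rho*s1*s2, applied to the means, standard deviations, and 60/40 weights stated in Q5.

```python
import math


def portfolio_stats(w1, w2, mean1, mean2, sd1, sd2, rho):
    """Return (mean, standard deviation) of a two-asset portfolio.

    Hypothetical helper: weights w1, w2; asset means and standard
    deviations in percent; rho is the correlation between the assets.
    """
    mean = w1 * mean1 + w2 * mean2
    variance = (w1 * sd1) ** 2 + (w2 * sd2) ** 2 + 2 * w1 * w2 * rho * sd1 * sd2
    # Clamp tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))


# Inputs taken from Q5: 60% in Stock 1 (mean 10%, sd 20%),
# 40% in Stock 2 (mean 20%, sd 30%), for each stated correlation.
results = {
    rho: portfolio_stats(0.6, 0.4, 10.0, 20.0, 20.0, 30.0, rho)
    for rho in (-1.0, 0.0, 1.0)
}

for rho, (mean, sd) in results.items():
    print(f"rho={rho:+.0f}: mean={mean:.2f}%, sd={sd:.2f}%")
```

Under these assumptions the mean return is 14% in all three cases, while the standard deviation rises with the correlation: essentially 0% at a correlation of -1, about 16.97% at 0, and 24% at 1.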

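The two-point method described in Q7 (split at the median of X, summarize each half, join the two summary points) can also be sketched in Python. The function names below are hypothetical, and one detail is an assumption: observations whose X equals the median are left out of both halves. Passing center=median gives the medians-based variant that Q7 asks for.

```python
from statistics import mean, median


def line_through(p1, p2):
    """Return (intercept, slope) of the line through points p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return y1 - slope * x1, slope


def two_point_line(xs, ys, center=mean):
    """Split at the median of xs, summarize each half with `center`,
    and return (intercept, slope) of the line joining the two summaries.

    Assumption: points with x equal to the median belong to neither half.
    """
    cut = median(xs)
    low = [(x, y) for x, y in zip(xs, ys) if x < cut]
    high = [(x, y) for x, y in zip(xs, ys) if x > cut]
    p1 = (center([x for x, _ in low]), center([y for _, y in low]))
    p2 = (center([x for x, _ in high]), center([y for _, y in high]))
    return line_through(p1, p2)


# The two summary points reported in Q7.
intercept, slope = line_through((40.6, 37.67), (75.43, 46.14))
print(f"slope={slope:.4f}, intercept={intercept:.2f}")

# Hypothetical data, only to show the call; not from onlineedu.xls.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]
print(two_point_line(xs, ys))
print(two_point_line(xs, ys, center=median))
```

With the unrounded slope the intercept comes out slightly below the 27.93 shown in Q7, because the worked solution rounds the slope to 0.24 before computing the intercept.
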
Paper for the Above Instructions

Understanding and analyzing statistical data is fundamental in numerous fields, including economics, finance, healthcare, and the social sciences. The questions presented call for a close examination of descriptive statistics, correlation, causation, regression analysis, and portfolio risk. This paper addresses each question in turn through analysis, explanation, and interpretation, with an emphasis on applying statistical tools to real-world datasets.

Part 1: Descriptive Statistics and Correlations

The first dataset involves three variables, haircut, sleep, and age, each summarized by its sample size (N), mean, median, trimmed mean (TrMean), and standard deviation. These statistics are an essential first step toward understanding the distribution of each variable. The haircut row appears to show an N of 20, but most of the table's entries are missing from the prompt, so the blanks have to be worked out from the values that remain rather than computed from raw data.

The correlation matrix describes the pairwise relationships between the variables. The correlation of -0.117 between haircut and sleep indicates a weak inverse relationship. Covariance shows the direction in which two variables move together, but its magnitude depends on the units of measurement, so it does not by itself indicate strength. The two measures are tied together by cov(X, Y) = r × sX × sY, which is the identity needed to fill in a missing covariance or correlation when the standard deviations are known.
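
The link between covariance and correlation, and the effect of a change of units raised in Q3, can be checked numerically. The sketch below uses made-up numbers, not the assignment workbook; the helper names covariance and correlation are hypothetical and use the usual sample (n - 1) definitions.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical data, invented for illustration only.
minutes = [20.0, 25.0, 30.0, 35.0, 40.0]
calories = [480.0, 470.0, 450.0, 455.0, 430.0]
hours = [m / 60 for m in minutes]  # same times, different unit

r_minutes = correlation(minutes, calories)
r_hours = correlation(hours, calories)
cov_minutes = covariance(minutes, calories)
cov_hours = covariance(hours, calories)

print(f"r (minutes) = {r_minutes:.4f}, r (hours) = {r_hours:.4f}")
print(f"cov (minutes) = {cov_minutes:.4f}, cov (hours) = {cov_hours:.4f}")
print(f"r * sx * sy = {r_minutes * stdev(minutes) * stdev(calories):.4f}")
```

Rescaling minutes to hours divides the covariance by 60 but leaves the correlation unchanged, which is why correlation is the better measure of strength.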