Elements Of Statistics

Key: Unit 1 Problem Set

NAME:

Elements of Statistics, Virtual College, Fall 2015

REMEMBER, these are assessed preparatory problems related to the content of Unit 1. The Unit 1 Exam will consist of similar types of problems, but not exactly the same ones. Thus, make sure you are thinking about the concepts and procedures you studied in this unit rather than simply "copying" the process of an example problem. Text chapter separators are listed to the left of the spreadsheet in case you need direction to a related resource. All answers should be calculated, as needed, within this Excel sheet below or to the right of the problem, with final concluding answers given directly below the problem.

Please make your answers easily found--for example, use a different color or font style. No numerical answer resulting from a calculation will be accepted unless the process is performed in Excel with formulas and calculations visible when the cell is selected. Type your name at the top, complete and return this file saved as "yournameUnit1ProblemSet" through the Exam Prep region in Blackboard (from the same location that you downloaded this file). This problem set is due no later than 09/15/2015. Your instructor will grade and return it with feedback to assist your preparation for the Unit 1 Exam.

1. In each of the following, classify the resulting data variable as qualitative or quantitative. If quantitative, label it as discrete or continuous.

a. The U.S. Census Bureau collects data on household size in the United States.

b. Human beings have one of four blood types: A, B, AB, or O.

c. The weight of a randomly selected football player on the FHSU Fall 2015 roster.

d. Answering "Agree" or "Disagree" when asked whether public school children should wear uniforms.

e. The number of single family homes in Hays, Kansas.

f. The height of a randomly selected waterfall in Hawaii.

Use the following situation to answer questions 2 through 6. Pick the one best answer from the multiple choices given on questions 2 and 3:

A recent national study about the effectiveness of Echinacea in cold treatments was performed by a medical school in Kansas City. The results stated that 22.5% of the randomly chosen 250 adult subjects in the placebo group in the study noted that their treatment appeared to shorten the length of their colds.

In an attempt to determine the average high school GPA of all students enrolled at a Regents University in Kansas, a researcher first randomly selects one of the six Regents Universities, then selects a random sample of 50 students from that University from which to gather data.

2. The implied population in this study is:

  • a. All people taking Echinacea to treat their cold
  • b. All the randomly chosen adult subjects in the study described
  • c. All adults in the United States
  • d. All medical schools testing the effectiveness of Echinacea

3. The implied sample in this study is:

  • a. All people taking Echinacea to treat their cold
  • b. All the randomly chosen adult subjects of the placebo group in the study
  • c. All adults in the United States
  • d. All medical schools testing the effectiveness of Echinacea

4. The descriptive statistic of interest in this study is ____________.

5. The study would be categorized as: (select ALL that apply)

  • a. experimental
  • b. cross sectional
  • c. observational
  • d. retrospective
  • e. prospective

6. Would the value of 22.5% described above be considered a parameter or a statistic? Why?

22.5% would be considered a statistic, because it describes a characteristic of the sample (the 250 subjects in the placebo group) rather than of the entire population.

7. You are interested in all FHSU students’ opinions regarding open educational resources. What is wrong with drawing conclusions about FHSU students’ opinions from a random sample taken of twenty-five of your closest friends? (Be specific!)

Sampling only your friends introduces sampling bias because this group is unlikely to be representative of the entire FHSU student body. The sample lacks randomness and diversity, which impacts the validity of generalizing results to all students.

8. Compute the value of the mathematical calculation shown at the right in an adjacent Excel cell. Then give this value in a rounded two decimal place percent form (e.g., 57.34%).

[Note: the calculation referred to is not reproduced in this text. Compute whatever expression is shown there, such as a proportion or a mean, and report the result as a percent.]

9. Identify the type of sampling (random, stratified, systematic, convenience, or cluster) that best describes each of the following cases:

  • a. The U.S. Department of Corrections collects data about returning prisoners by randomly selecting five federal prisons and surveying all of the prisoners in each of the prisons. Cluster sampling
  • b. A college teacher surveyed all of her students to obtain sample data consisting of the number of credit cards students possess. Convenience sampling
  • c. A man was an observer at a town's sobriety checkpoint at which every fifth driver was stopped and interviewed. Systematic sampling
  • d. In a Gallup poll, 1005 adults were called after their telephone numbers were randomly generated by a computer, and 38% of them said they get their news from the internet every day. Random sampling
  • e. In a study of college programs, 567 students were randomly selected from those majoring in mathematics, 1236 students were randomly selected from those majoring in business, and 822 students were randomly selected from those majoring in psychology. Stratified sampling

10. Identify the data level (nominal, ordinal, interval, or ratio) that best describes each of the following cases:

  • a. Measured amounts of greenhouse gases emitted by 32 different models of cars sold in the U.S. Ratio
  • b. Critic rating of movies on a scale from 0 to 4 stars. Ordinal
  • c. Types of movies (drama, comedy, adventure, documentary, etc.) Nominal
  • d. Actual high temperatures in degrees Fahrenheit recorded in Hays, Kansas for the month of July, 2015. Interval

11. The number of hours worked by 20 FHSU students over the course of one week is given in the table at the right. Create a reasonable frequency table for this data set with exactly six classes. Include a relative frequency column in the table.

12. Construct an appropriate stem-and-leaf plot for the data given above in #11. Then describe the shape of the distribution of the data set (either skewed left or skewed right or symmetric). (HINT: to keep appropriate spacing, choose Courier font format for cells used to display the plot and begin each entry with an apostrophe to make the values act as text versus a single number.)

13. Using the data in #11, determine the following statistics for the data set: a. mean b. median c. mode d. range e. midrange f. standard deviation g. variance
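As a rough illustration of the statistics requested in #13, the following Python sketch computes each one for a small made-up data set. The five values are hypothetical (the table from #11 is not reproduced here), and the standard deviation and variance use the sample (n - 1) formulas, which is an assumption about what the problem intends. Graded answers must still be computed with Excel formulas.

```python
import statistics

# Hypothetical hours-worked values; NOT the data from problem #11.
hours = [2, 4, 4, 6, 9]

mean = statistics.mean(hours)
median = statistics.median(hours)
mode = statistics.mode(hours)
data_range = max(hours) - min(hours)
midrange = (max(hours) + min(hours)) / 2
variance = statistics.variance(hours)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(hours)      # sample standard deviation

print(mean, median, mode, data_range, midrange, variance, round(std_dev, 3))
```

If the data were treated as a population instead, `statistics.pvariance` and `statistics.pstdev` would be the corresponding calls.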

14. Give the five-number summary for the data set given in problem #11. Determine if there are any outliers based on the 1.5 IQR rule (see page 141 in the text). Work must be shown to the right, and final classification of outliers, if any, must be stated explicitly.
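The 1.5 IQR rule in #14 can be sketched in Python as follows. The data values are hypothetical (the table from #11 is not reproduced here), and the quartiles use the "inclusive" interpolation method of Python's `statistics.quantiles`. Quartile conventions differ between textbooks and software, so this choice is an assumption and may not match the method on page 141 of the text.

```python
import statistics

# Hypothetical data set; NOT the data from problem #11.
data = [1, 3, 5, 7, 9, 11, 13, 15, 40]

# Quartiles via the "inclusive" method (one of several conventions in use).
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))

# 1.5 IQR rule: values beyond either fence are flagged as outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(five_number_summary, iqr, lower_fence, upper_fence, outliers)
```

For this made-up data the fences are -7 and 25, so only the value 40 is classified as an outlier.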

15. Answer the two questions below: a. After first assuming the data represents a population instead of a sample in problem #11, what is the z-score of the last piece of data listed (the data value 10)? b. Briefly interpret the meaning of this z-score in relation to the data set.

16. The number of monthly infections reported at a local hospital averaged 138 with a standard deviation of 8.8. During a recent month, the hospital experienced 159 patients with infections. Is that amount considered unusually high if the distribution of the number infected tends to be bell-shaped and symmetric? Why or why not?
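One way to approach #16 is to convert the observed count to a z-score. The short Python sketch below does this with the values stated in the problem. The cutoff of 2 standard deviations for "unusual" is a common rule of thumb for bell-shaped distributions and is an assumption here, so check it against the definition used in the text.

```python
mean_infections = 138
std_dev = 8.8
observed = 159

# z-score: how many standard deviations the observation lies from the mean.
z = (observed - mean_infections) / std_dev

# Assumed rule of thumb: |z| > 2 counts as unusual.
is_unusual = abs(z) > 2

print(round(z, 2), is_unusual)
```

Under that assumed cutoff, a z-score of about 2.39 would mark 159 infections as unusually high.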

17. Which score has a higher relative position: a score of 55.8 on a test for which x̄ = 37 and s = 8, or a score of 375.4 on a test for which x̄ = 283 and s = 44? (Note: x̄ represents the mean value of the sample of tests collected.)
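Relative position in #17 can be compared by standardizing each score. The following Python sketch uses the values given in the problem; the helper name `z_score` is just an illustrative choice.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / sd

z_first = z_score(55.8, 37, 8)      # first test
z_second = z_score(375.4, 283, 44)  # second test

# The score with the larger z-score has the higher relative position.
higher = "first" if z_first > z_second else "second"

print(round(z_first, 2), round(z_second, 2), higher)
```

The z-scores come out to roughly 2.35 and 2.10, so the score of 55.8 sits relatively higher within its own test.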

18. Give an example of a situation where the median is likely a better average to use to describe the center location of quantitative data as compared to the mean.

19. Explain what the standard deviation measures in quantitative data.

20. The graph at the right is a bar graph of the distribution of hours spent playing video games over the past week from a collection of randomly selected teenage males in the United States. Answer the following: a. How many teenage males were part of this study? b. Estimate the mean and the median length of time this group of selected teenage males spent gaming during the last week. c. Is the distribution symmetric, uniform, skewed left, skewed right, or none of these?

21. The lengths of shoe laces in a newly opened package from various manufacturers are measured and recorded. The boxplot to the right illustrates the summarized data collected from these shoe laces. a. What is the interquartile range of the shoe laces? b. Determine the median length of the laces. c. Does the distribution of the lengths appear to be symmetric? Why or why not?

22. Give examples of two different numbers that can be used to represent a probability value. Then, give examples of two numbers that can never represent a probability value. Explain why the last two values you gave cannot represent a probability value.

23. Consider the situation where you have three game chips in a hat, each labeled with one of the numbers 1, 2, and 5:

  • a. If you draw out 2 chips without replacement between each chip draw, list the entire sample space of possible results that can occur in the draw.

Define two events as follows for answering parts b to h below. Event A: the sum of the 2 drawn numbers is even. Event B: the sum of the 2 drawn numbers is a multiple of 3. Now, using your answer to part a, find the following probability values:

  • b. P(A) = 2/6, or 2 chances out of 6
  • c. P(B) =
  • d. P(A & B) = 2/6
  • e. P(A or B) = 4/6
  • f. P(A given B) =
  • g. P(not B) = 2/6
  • h. Are events A and B mutually exclusive? Why or why not?
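The probabilities in #23 can be checked by listing the sample space directly. The Python sketch below treats each ordered draw as one of six equally likely outcomes, which is one reasonable reading of part a (an unordered listing gives the same probabilities). Note that `Fraction` prints values in lowest terms, so 2/6 appears as 1/3.

```python
from fractions import Fraction
from itertools import permutations

chips = [1, 2, 5]

# Ordered draws of 2 chips without replacement: 6 equally likely outcomes.
sample_space = list(permutations(chips, 2))

def prob(event):
    """Probability of an event given as a set of outcomes."""
    return Fraction(len(event), len(sample_space))

A = {o for o in sample_space if sum(o) % 2 == 0}  # sum is even
B = {o for o in sample_space if sum(o) % 3 == 0}  # sum is a multiple of 3

p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(A & B)
p_a_or_b = prob(A | B)
p_a_given_b = Fraction(len(A & B), len(B))
p_not_b = 1 - p_b
mutually_exclusive = len(A & B) == 0

print(sample_space)
print(p_a, p_b, p_a_and_b, p_a_or_b, p_a_given_b, p_not_b, mutually_exclusive)
```

Because the draws (1, 5) and (5, 1) have a sum of 6, which is both even and a multiple of 3, the two events share outcomes and are not mutually exclusive.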

24. Provide a written description of the complement of each of the following:

  • a. At least twelve of the patients seen today had some infectious disease.
  • b. All of the patients seen today had some infectious disease.

25. A researcher recorded the amount of time each patron at a fast food restaurant spent waiting in line for service during noontime Saturday. The frequency table at the right summarizes the data collected. First, extend the table to include a relative frequency column. Then, if we randomly select one of the patrons represented in the table, what is the probability that the waiting time is at least 12 minutes?

[Note: the frequency table is not reproduced in this text; it is assumed to list class intervals of waiting times with their counts.]

26. Explain the concept of the “Law of Large Numbers.”
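A small simulation can make the Law of Large Numbers concrete. The Python sketch below flips a simulated fair coin and reports the proportion of heads for increasing numbers of flips. The coin, the seed, and the flip counts are all illustrative assumptions; the point is only that the observed proportion tends to settle near the true probability of 0.5 as the number of trials grows.

```python
import random

random.seed(2015)  # fixed seed so the run is repeatable

def heads_proportion(n_flips):
    """Proportion of heads observed in n_flips simulated fair coin flips."""
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

for n in (10, 100, 10_000, 100_000):
    print(n, heads_proportion(n))

large_run = heads_proportion(100_000)
```

Small runs can stray noticeably from 0.5, while the large runs should land very close to it.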

27. The data in the accompanying table compares plea and prison sentencing results from 864 randomly chosen criminal court cases. Use the two-way table shown at the right to answer the following questions.

  • a. If one case is randomly selected, find the probability of selecting a case in which the defendant was sentenced to prison.
  • b. Find the probability of selecting a case resulting in a prison sentence, given that the defendant entered a plea of not guilty.
  • c. If one case is randomly selected, find the probability of selecting a case in which the defendant entered a plea of not guilty or was not sentenced to prison.
  • d. If two cases are randomly selected (without replacement), find the probability that both defendants entered a "not guilty" plea.
  • e. If one case is randomly selected, find the probability of selecting a case in which the defendant entered a plea of guilty and was not sentenced to prison.

28. A membership committee for a local community group consists of twenty-five individuals. a. If a task force of five members of this committee must be formed to investigate the membership rules, how many different task force groups might possibly be formed?

b. If a chair, vice chair, secretary, and treasurer must be elected from the twenty-five members, how many different slates of candidates are possible?
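The two counts in #28 differ in whether order matters: a task force is an unordered group (a combination), while a slate of named offices is an ordered selection (a permutation). A minimal Python sketch, assuming Python 3.8 or later for `math.comb` and `math.perm`:

```python
import math

members = 25

# Part a: choose 5 members where order does not matter (combination).
task_forces = math.comb(members, 5)

# Part b: fill 4 distinct offices where order matters (permutation).
slates = math.perm(members, 4)

print(task_forces, slates)
```

In Excel, the corresponding functions are `COMBIN` and `PERMUT`.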

29. Give at least four different symbols that have been used to represent specific statistical measures. Describe what measure each represents. (Hint: use the insert symbol option in Excel to find most symbols.)