

Reasoning from Sample to Population
Chapter 3
© 2019 McGraw-Hill Education. All rights reserved. Authorized only for instructor use in the classroom. No reproduction or distribution without the prior written consent of McGraw-Hill Education.

Learning Objectives
Calculate standard summary statistics for a given data sample.
Explain the reasoning inherent in a confidence interval.
Construct a confidence interval.
Explain the reasoning inherent in a hypothesis test.
Execute a hypothesis test.
Outline the roles of deductive and inductive reasoning in making active predictions.

Distributions and Sample Statistics
Population parameter: a numerical expression that summarizes some feature of the population
Objective degree of support using inductive and deductive reasoning
Construction of a confidence interval
Hypothesis testing

Distributions of Random Variables
Random variable: a variable that can take on multiple values, with any given realization of the variable being due to chance (or randomness)
Deterministic variable: a variable whose value can be predicted with certainty

Distributions of Random Variables
Discrete: a countable number of values (e.g., 5, 9, 19, 27, ...)
Continuous: an uncountably infinite number of values (all the numbers, to any decimal place, between 0 and 1)

The probabilities of individual outcomes for a discrete random variable are represented by a probability function.
Example: a population of 10 people, where 3 are 25 years old, 4 are 30 years old, 2 are 40 years old, and 1 is 45 years old. The probability that a single draw will be:
25 years old is 3/10 = 0.3
30 years old is 4/10 = 0.4
40 years old is 2/10 = 0.2
45 years old is 1/10 = 0.1
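The probability function for this ten-person example can be sketched in a few lines of Python, counting outcomes and dividing by the population size:

```python
from collections import Counter

# The ten-person population from the example above
ages = [25, 25, 25, 30, 30, 30, 30, 40, 40, 45]

# Probability function: P(Age = x) = count(x) / population size
counts = Counter(ages)
prob = {x: counts[x] / len(ages) for x in sorted(counts)}

print(prob)  # {25: 0.3, 30: 0.4, 40: 0.2, 45: 0.1}
```

The probabilities over all possible outcomes sum to one, as any probability function must.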

Graphical Representation for a Discrete Random Variable (Age)

The probabilities of individual outcomes for a continuous random variable are represented by a probability density function (pdf). A special type of continuous random variable is the normal random variable, which has a "bell-shaped" pdf.

Graphical Representation for a Normal Random Variable

Distributions and Sample Statistics
For a normal random variable, or any other continuous random variable, the pdf allows us to calculate the probability that the random variable falls in various ranges. The probability that a random variable falls between two numbers A and B is the area under the pdf curve between A and B.
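For a normal random variable, that area can be computed as the difference of the cumulative distribution function (CDF) at the two endpoints. A minimal sketch using Python's standard library, with an illustrative mean of 30 and standard deviation of 5 (these numbers are assumptions, not from the text):

```python
from statistics import NormalDist

# P(A <= X <= B) is the area under the pdf between A and B,
# which equals CDF(B) - CDF(A).
X = NormalDist(mu=30, sigma=5)  # assumed illustrative distribution

# Probability X falls within one standard deviation of the mean
p = X.cdf(35) - X.cdf(25)
print(round(p, 4))  # about 0.6827 for any normal distribution
```

The result, roughly 68.3%, is the familiar "one standard deviation" probability for every normal distribution.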

Probability of a Random Variable Falling Between Two Numbers

Expected Value or Population Mean: the summation of each possible realization of Xi multiplied by the probability of that realization.
Variance: a common measure of the spread of the distribution, defined by E[(Xi − E(Xi))²].
Standard Deviation: the square root of the variance.
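These three population quantities can be computed directly for the discrete age distribution from the earlier example. A short sketch:

```python
# Discrete distribution from the ten-person age example
dist = {25: 0.3, 30: 0.4, 40: 0.2, 45: 0.1}

# Expected value: sum of each realization times its probability
mean = sum(x * p for x, p in dist.items())

# Variance: E[(X - E(X))^2], the probability-weighted squared deviations
var = sum(p * (x - mean) ** 2 for x, p in dist.items())

# Standard deviation: square root of the variance
sd = var ** 0.5

print(mean, var, round(sd, 3))  # 32.0 46.0 6.782
```

Note that the standard deviation is in the same units as the variable itself (years), which is why it is usually preferred over the variance for interpretation.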

Data Samples and Sample Statistics
Sample of Size N: a collection of N realizations of Xi: {X1, X2, …, XN}
Sample Statistics: single measures of some feature of a data sample
Sample Mean: a common measure of the center of a sample

Data and Sample Statistics
Sample Variance: a common measure of the spread of a sample. For a sample of size N on random variable Xi, it is the sum of squared deviations from the sample mean, divided by N − 1.
Sample Standard Deviation: the square root of the sample variance.
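The sample statistics above can be written out by hand and checked against the standard library. The sample below reuses the ages from the earlier example, here treated as observed data rather than a known population:

```python
import statistics

# An illustrative sample of N observed ages
sample = [25, 25, 25, 30, 30, 30, 30, 40, 40, 45]
N = len(sample)

x_bar = sum(sample) / N                               # sample mean
s2 = sum((x - x_bar) ** 2 for x in sample) / (N - 1)  # sample variance
s = s2 ** 0.5                                         # sample standard deviation

# The stdlib's sample variance (which also divides by N - 1) agrees
assert abs(s2 - statistics.variance(sample)) < 1e-9

print(x_bar, round(s2, 3), round(s, 3))
```

Dividing by N − 1 rather than N makes the sample variance an unbiased estimator of the population variance, a property the later slides rely on.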

Confidence Interval
Suppose a firm wants to know the average age of its customers. It collects data from 872 of its customers, so its sample size is N = 872.
Agei = a random variable defined as the age of a single customer
agei = the observed age of customer i in the sample

Confidence Interval
Estimator: a calculation using sample data that is used to provide information about a population parameter
Random sample: a sample where every member of the population has an equal chance of being selected

Confidence Interval
Deductive argument: if we have a random sample, the sample mean is a "reasonable guess" for the population mean.
Inductive argument: therefore, the population mean is (approximately) the same as the sample mean.
How sure are we that the population mean in our example is the same as the sample mean?
Confidence interval: a range of values such that there is a specified probability that it contains the population parameter.

Confidence Interval
How do we build confidence intervals and determine their objective degree of support?
Independent: the distribution of one random variable does not depend on the realization of another.
Independent and identically distributed (i.i.d.): the distribution of one random variable does not depend on the realization of another, and each has the identical distribution.

Confidence Interval
Unbiased estimator: an estimator whose mean is equal to the population parameter it is used to estimate
Population standard deviation: the square root of the population variance
Population variance: the variance of a random variable over the entire population

Data and Sample Statistics
In order to construct a confidence interval for the population mean and know its objective degree of support, we must know something about its standard deviation and its type of distribution.
The assumption that a data sample is a random sample implies the standard deviation of the sample mean is σ/√N.
The spread of the sample mean gets smaller as the sample size increases.

Data and Sample Statistics
Assuming a random sample with reasonably large N (> 30) implies that the sample mean is approximately normally distributed with mean µ and standard deviation σ/√N. This can be written as: X̄ ∼ N(µ, σ/√N).
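Given that deduced distribution, a 95% confidence interval for the population mean can be sketched as the sample mean plus or minus 1.96 standard errors. The sample mean and standard deviation below are illustrative assumptions (only N = 872 comes from the running customer-age example):

```python
from statistics import NormalDist

# Assumed sample summary statistics (illustrative values)
N, x_bar, s = 872, 34.1, 9.6

# Standard error of the sample mean: s / sqrt(N)
se = s / N ** 0.5

# Critical value for 95% confidence, approximately 1.96
z = NormalDist().inv_cdf(0.975)

lo, hi = x_bar - z * se, x_bar + z * se
print(round(lo, 2), round(hi, 2))
```

With N = 872 the interval is quite narrow, illustrating the earlier point that the spread of the sample mean shrinks as the sample size grows.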

Probability Sample Mean within 1.96 Standard Deviations of Population Mean

Hypothesis Testing Hypothesis test is the process of using sample data to assess the credibility of a hypothesis about a population Making an assessment Reject the hypothesis Fail to reject the hypothesis

Null hypothesis: the hypothesis to be tested using a data sample.
Written as H0: µ = K, where K is the hypothesized value for the population mean.
The objective is to determine whether the null hypothesis is credible given the data we observe.
If a sample of size N is a random sample, N is "large" (> 30), and µ = K, then X̄ ∼ N(K, σ/√N).

Probability Sample Mean within 1.96 Standard Deviations of Hypothesized Population Mean

Steps in Hypothesis Testing:
State the null hypothesis.
Collect the data sample and calculate the sample mean.
Decide whether or not to reject the deduced distribution for the sample mean.
Degree of support: measure how many standard deviations the sample mean is from the hypothesized population mean, Z = (X̄ − K) / (σ/√N).

To calculate Z, take the difference between the sample mean and the hypothesized population mean (X̄ − K). Then take that difference and divide it by the standard deviation of the sample mean (σ/√N).
The t-stat is the difference between the sample mean and the hypothesized population mean (X̄ − K) divided by the sample-based standard deviation of the sample mean (s/√N), or t = (X̄ − K) / (s/√N).
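The t-stat computation is one line once the sample summaries are in hand. The sample mean and standard deviation here are assumed illustrative values, with K the hypothesized population mean:

```python
# Assumed sample summary statistics (illustrative values)
N, x_bar, s = 872, 34.1, 9.6
K = 33.0  # hypothesized population mean under H0: mu = K

# t-stat: (sample mean - hypothesized mean) / (s / sqrt(N))
t = (x_bar - K) / (s / N ** 0.5)
print(round(t, 3))
```

A t-stat of about 3.4 says the observed sample mean sits more than three standard deviations above the hypothesized population mean, which will translate into a very small p-value.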

Test statistic Any single value derived from a sample that can be used to perform a hypothesis test p-value The probability of attaining a test statistic at least as extreme as the one that was observed


The t-stat is an observed value from a t-distribution, a distribution that resembles a normal distribution and is centered at zero.
In Excel, a two-sided p-value can be calculated using the formula: 2 × (1 − NORM.S.DIST(ABS(t-stat), TRUE)).
If the observed t-stat is very unlikely (has a low p-value), then reject the deduced distribution, and vice versa.
If the p-value is less than the cutoff, reject; fail to reject otherwise.
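The same two-sided p-value can be sketched in Python with the standard normal CDF, which for large N is a close approximation to the t-distribution. The t value of 2.1 is an assumed example:

```python
from statistics import NormalDist

t = 2.1  # assumed observed t-stat

# Two-sided p-value, mirroring Excel's 2 * (1 - NORM.S.DIST(|t|, TRUE));
# valid as an approximation when N is large so t is close to standard normal
p_value = 2 * (1 - NormalDist().cdf(abs(t)))
print(round(p_value, 4))  # 0.0357

# Decision at the 0.05 cutoff (95% degree of support)
print("reject" if p_value < 0.05 else "fail to reject")  # reject
```

Doubling the one-tail probability accounts for sample means that are extreme in either direction from the hypothesized mean.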

Cutoffs using p-values directly correspond to the degree of support you chose for your inductive argument.
If your chosen degree of support is D%, then the cutoff is (100 − D)%, or 1 − D/100.
With a 0.05 cutoff, rejections will be incorrect 5% of the time, because 5% of the time you will observe a p-value less than 0.05 even though the deduced distribution for the sample mean is correct.

Standard degrees of confidence are 90%, 95%, and 99%; the corresponding cutoffs using p-values are 0.10, 0.05, and 0.01.
Reject the distribution if the p-value is less than 0.10; fail to reject otherwise. This generates a degree of support of 90%.
Reject the distribution if the p-value is less than 0.05; fail to reject otherwise. This generates a degree of support of 95%.
Reject the distribution if the p-value is less than 0.01; fail to reject otherwise. This generates a degree of support of 99%.
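The mapping from a chosen degree of support to a decision can be captured in a small helper. The function name and the example p-value of 0.03 are illustrative assumptions:

```python
# Decision rule: for a degree of support of D%, the cutoff is 1 - D/100;
# reject the deduced distribution when the p-value falls below it.
def decide(p_value, support_pct):
    cutoff = 1 - support_pct / 100
    return "reject" if p_value < cutoff else "fail to reject"

p = 0.03  # assumed observed p-value
for d in (90, 95, 99):
    print(d, decide(p, d))
# 90 reject
# 95 reject
# 99 fail to reject
```

The same evidence can be strong enough at 95% support but not at 99%, which is why the degree of support must be chosen before looking at the data.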

The Interplay Between Deductive and Inductive Reasoning in Active Predictions
The underlying reason for active predictions:
Forming the prediction uses deductive reasoning: assume the causal relationship, which then implies the prediction.
Estimating the causal relationship uses deductive and inductive reasoning.
Deductive reasoning: make assumptions that imply causality between X and Y and the distribution of an estimator for the magnitude of this causality in the population.
Inductive reasoning: using an observed data sample, build a confidence interval and/or determine whether to reject a null hypothesis for the magnitude of the population-level causality.

Paper: Reasoning from Sample to Population

The statistical principles underpinning data analysis rely extensively on understanding how to interpret sample data to infer characteristics of the broader population. This process involves key concepts such as sample statistics, probability distributions, confidence intervals, and hypothesis tests, all of which form the foundation of inferential statistics. The goal is to utilize data gathered from a sample to make substantiated claims about a population, leveraging inductive and deductive reasoning in a systematic manner.

One of the fundamental ideas in inferential statistics is the population parameter, which encapsulates features of an entire population, such as the mean or variance. Since measuring an entire population is often impractical, statisticians use samples—a subset of the population—to estimate these parameters. Sample statistics, such as the sample mean and sample variance, serve as estimators for the corresponding population parameters. For example, the sample mean provides an estimate of the population mean, and understanding the variability or spread around this estimate is crucial for making reliable inferences.

Distributions of random variables play a vital role in this inferential process. Random variables can be classified into discrete or continuous types, each with distinct probability functions. Discrete variables, such as age counts in a fixed set of categories, are represented by probability functions assigning specific probabilities to possible outcomes. Conversely, continuous variables like age or height are characterized by probability density functions (pdf), with the normal distribution being a common example due to its bell-shaped curve. These distributions facilitate calculating the likelihood that a variable falls within specified ranges, aiding in the interpretation of data.

Understanding the concepts of expected value (mean) and variance helps quantify the average outcome and spread of a distribution. The population mean, or expected value, is calculated by summing all possible outcomes weighted by their probabilities. Variance and standard deviation measure how dispersed the data are around this mean, providing insight into the data's variability. When analyzing samples, sample mean and sample variance are computed, which serve as estimators of their population counterparts. As the sample size increases, the law of large numbers ensures these estimators tend to be closer to the true population parameters.

Constructing confidence intervals is a key method for estimating population parameters with defined levels of certainty. A confidence interval provides a range of plausible values for parameters like the mean, with a specified confidence level such as 95%. To build these intervals, it is essential to know the standard deviation of the population or to estimate it from the sample, assuming the sample is randomly drawn and sufficiently large (N > 30). When these conditions are met, the sampling distribution of the sample mean approximates a normal distribution (by the Central Limit Theorem).
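The Central Limit Theorem claim in the paragraph above can be checked by simulation: even when the underlying population is skewed, repeated sample means pile up around the population mean with spread close to σ/√N. The exponential population below is an illustrative assumption:

```python
import random
from statistics import mean, stdev

random.seed(42)

# Skewed population: exponential with rate 1, so population mean = sd = 1
N = 100            # size of each sample
num_samples = 2000 # number of repeated samples

# Draw many samples and record each sample mean
sample_means = [mean(random.expovariate(1.0) for _ in range(N))
                for _ in range(num_samples)]

# The sample means center near 1.0 and their spread is near 1/sqrt(100) = 0.1
print(round(mean(sample_means), 3), round(stdev(sample_means), 3))
```

Despite the heavy skew of the individual draws, the distribution of the sample mean is nearly symmetric and tightly concentrated, which is exactly what the N > 30 normal approximation relies on.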

Confidence intervals are constructed using the sample mean, the estimated standard deviation, and a critical value from the normal distribution (such as 1.96 for 95% confidence). The interval's width reflects the degree of uncertainty; larger samples tend to produce narrower intervals, indicating more precise estimates. Additionally, the concepts of unbiased estimators ensure that the expected value of the estimator matches the true population parameter, reinforcing the reliability of the inference.

Hypothesis testing provides a formal framework for assessing claims about a population parameter. The null hypothesis (H0), often positing no effect or a specific value (e.g., population mean equals a certain value), is tested against sample data. By calculating a test statistic—such as a Z-value or t-statistic—and the associated p-value, statisticians determine the likelihood of observing the data if the null hypothesis is true. If this probability (p-value) falls below predetermined significance levels (e.g., 0.05), the null hypothesis is rejected, suggesting evidence against it.

The process involves several steps: stating the null hypothesis, collecting data, computing the test statistic, and interpreting the p-value. The t-distribution is used when the population standard deviation is unknown and the sample size is small, whereas the Z-distribution applies when the population standard deviation is known or the sample size is large. The p-value provides a quantitative measure of evidence, with lower p-values indicating stronger evidence against the null hypothesis.

Choosing appropriate cutoff levels for p-values (such as 0.05 for 95% confidence) controls the error rates in decision-making. Standard confidence levels of 90%, 95%, and 99% correspond to specific p-value thresholds, guiding whether to reject or fail to reject hypotheses. These methods integrate deductive reasoning (formulating hypotheses based on cause-effect assumptions) with inductive reasoning (deriving inferences from observed data), enabling robust predictions and decision-making in statistical analysis.

In conclusion, the interplay of sample data, probability distributions, confidence intervals, and hypothesis testing forms the backbone of inferential statistics. This synergy allows researchers to draw meaningful conclusions about a population from finite samples, balancing inductive insights with deductive frameworks. Mastery of these concepts enhances the reliability of statistical inferences and supports evidence-based decision-making in diverse fields such as economics, health sciences, and engineering.
