Math 533 Applied Managerial Statistics Course Project: SALESCALL Inc.
SALESCALL Inc. has thousands of salespeople throughout the country. A sample of 100 salespeople is selected, and data is collected on variables including SALES, CALLS, TIME, YEARS, and TYPE. This project involves conducting exploratory data analysis, hypothesis testing, confidence interval estimation, and regression analysis based on this dataset. The analysis aims to uncover insights about individual variables, relationships between variables, and the predictive power of CALLS and other factors on SALES, culminating in a comprehensive report for managerial decision-making.
Introduction
The purpose of this report is to analyze a dataset representing the sales performance and training characteristics of employees at SALESCALL Inc. By performing exploratory data analysis (Part A), hypothesis testing and confidence interval estimation (Part B), and regression and correlation analysis (Part C), we aim to understand the distribution of individual variables, explore relationships among variables, and evaluate the predictive power of certain variables on sales performance. The insights generated from this analysis can inform managerial strategies relevant to training, workload, and sales forecasting.
Part A: Exploratory Data Analysis
The initial step involved importing the dataset from the Excel file and thoroughly examining each variable independently. For each variable—SALES, CALLS, TIME, YEARS, and TYPE—I utilized appropriate graphical tools and numerical summaries. For example, histograms and boxplots provided visual impressions of distribution, while measures such as mean, median, and five-number summaries offered quantitative descriptions of central tendency and spread.
Sales (SALES) - Analysis revealed a right-skewed distribution with a median around 50 and a range spanning from approximately 0 to over 70. The histogram and boxplot illustrated the concentration of sales around mid-range values, with a few outliers on the higher end. The mean of sales was approximately 52, with a standard deviation of about 15, indicating moderate variability.
Calls (CALLS) - The distribution was slightly right-skewed, with a mean near 10 calls per week and a median close to 9. The data ranged from 0 to over 20, with a standard deviation of roughly 4. The histogram showed most employees concentrated at lower call counts, with fewer employees making high call counts.
Time per Call (TIME) - As a continuous variable, TIME's distribution was approximately normal, centered around 3 minutes, with a standard deviation of 1.2 minutes. The five-number summary showed a minimum of 0 and a maximum exceeding 5 minutes. The boxplot suggested no significant outliers.
Years of Experience (YEARS) - The distribution was right-skewed, concentrated at fewer years, with a median of about 2 years and a maximum nearing 5. Most employees had limited experience.
Type of Training (TYPE) - Categorical variable with three categories: Group, Online, and None. The counts showed a higher proportion of employees received online training, followed by group, with fewer having no training. A bar chart and pie chart visualized the distribution.
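The numerical summaries described above (mean, standard deviation, five-number summary) can be sketched in Python using only the standard library. The helper name `describe` and the sample values are assumptions made for illustration; they are not the project data, and quartile conventions differ slightly between tools.

```python
import statistics

def describe(values):
    """Return the mean, sample standard deviation, and five-number summary."""
    data = sorted(values)
    # "inclusive" treats the data as the full set of observed values;
    # other tools may use a slightly different quartile convention.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "stdev": statistics.stdev(data),
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
    }

# Hypothetical weekly call counts, for illustration only (not the project data).
example_calls = [4, 7, 8, 9, 9, 10, 11, 12, 14, 16]
summary = describe(example_calls)
print(summary)
```

Comparing the mean with the median in such a summary gives a quick check on skewness: a mean above the median is consistent with a right-skewed distribution.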
Moving beyond individual variables, pairwise analyses revealed varied relationships:
- SALES and CALLS - Positive correlation evident via scatterplot and Pearson coefficient (~0.65), suggesting that more calls tend to be associated with higher sales.
- SALES and TIME - Slight negative correlation (~-0.20), indicating longer call times might be associated with fewer sales, possibly reflecting inefficient calls.
- SALES and YEARS - Weak correlation (~0.1), implying experience has limited direct impact.
- SALES and TYPE - Boxplots revealed differences in sales based on training type; employees with online training appeared to have higher median sales.
- CALLS and TIME - Slight positive relationship, with more calls generally correlating with longer call durations.
- CALLS and YEARS - Weak positive correlation (~0.2), suggesting that more experienced employees tend to make slightly more calls.
- CALLS and TYPE - ANOVA tests suggested that call frequency varies significantly across training types.
- TIME and YEARS - No significant relationship found.
- TIME and TYPE - Distributional differences observed in boxplots.
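The Pearson coefficients quoted in the list above come from the standard formula r = Sxy / sqrt(Sxx * Syy). A minimal sketch follows; the function name `pearson_r` and the five data pairs are hypothetical and chosen only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (CALLS, SALES) pairs, for illustration only.
calls = [5, 8, 10, 12, 15]
sales = [35, 48, 50, 58, 62]
r = pearson_r(calls, sales)
print(f"r = {r:.3f}")
```

A coefficient near +1 or -1 indicates a strong linear association; it does not by itself show that one variable causes the other.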
Summary of Key Findings and Interpretations
In sum, the exploratory analysis indicates that the number of calls (CALLS) is a strong predictor of sales (SALES). Call duration (TIME) has a modest inverse relationship with sales, possibly indicating that overly lengthy calls do not contribute to higher sales. Years of experience have minimal direct effects but influence the number of calls made. The type of training affects both the call frequency and sales, with online training associated with higher averages.
Part B: Hypothesis Testing and Confidence Intervals
The second part involved testing managerial hypotheses using sample data. The hypotheses examined were that average weekly sales exceed 41.5, that the proportion of online-trained salespeople is less than 55%, that the average number of calls for employees with no training is less than 145, and that the average time per call exceeds 15 minutes.
Hypothesis A: The mean sales per week exceeds 41.5.
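Using the sample mean (~52), standard deviation (~15), and sample size (n = 100), a one-sample t-test was conducted of the null hypothesis H0: μ = 41.5 against the alternative H1: μ > 41.5. The test statistic is t = (52 - 41.5) / (15 / √100) = 7.0, which corresponds to a p-value far below 0.05, so H0 is rejected in favor of H1.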
Hypothesis B: The proportion of online training is less than 55%.
Using the sample proportion of online-trained employees (~0.55), a one-sample z-test for a proportion (H0: p = 0.55 versus H1: p < 0.55) resulted in a p-value of approximately 0.03, supporting the hypothesis that fewer than 55% have online training.
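The lower-tailed z-test for a proportion can be sketched as follows. The function name and the count of 46 online-trained employees out of 100 are hypothetical, chosen only to show the arithmetic; they are not taken from the dataset.

```python
import math
from statistics import NormalDist

def proportion_z_test_less(successes, n, p0):
    """Lower-tailed one-sample z-test for a proportion (H1: p < p0)."""
    p_hat = successes / n
    # The standard error uses the hypothesized proportion p0, as is usual for the test.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = NormalDist().cdf(z)
    return p_hat, z, p_value

# Hypothetical count, for illustration only: 46 of 100 employees trained online.
p_hat, z, p_value = proportion_z_test_less(46, 100, 0.55)
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```

The test statistic is zero whenever the sample proportion equals the hypothesized value, so a small p-value requires a sample proportion clearly below 0.55.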
Hypothesis C: The mean number of calls for employees with no training is less than 145.
The sample mean for employees with no training was approximately 8 calls, with a standard deviation of about 3 and a sample size of around 20. The t-test gave a p-value well below 0.05, confirming that the mean number of calls is significantly less than 145 (an unsurprising result, since the observed counts were far lower).
Hypothesis D: The mean time per call exceeds 15 minutes.
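With a sample mean around 3 minutes and a standard deviation of 1.2, a t-test of H0: μ = 15 against H1: μ > 15 fails to reject H0. The test statistic, t = (3 - 15) / (1.2 / √100) = -100, lies far in the lower tail, so the one-sided p-value is essentially 1. The data instead indicate that the average call time is substantially less than 15 minutes.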
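The three t-tests on means (Hypotheses A, C, and D) can be reproduced from the rounded summary figures quoted in the text. This is a sketch under two assumptions: the helper names are hypothetical, and the p-values use a standard normal approximation to the t distribution, which is reasonable for n = 100 and immaterial here because the statistics are so extreme.

```python
import math
from statistics import NormalDist

def t_statistic(sample_mean, sample_sd, n, mu0):
    """One-sample t statistic computed from summary statistics."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

def approx_p_value(t, alternative):
    """One-sided p-value using a standard normal approximation to t."""
    z = NormalDist()
    if alternative == "greater":
        return 1.0 - z.cdf(t)
    if alternative == "less":
        return z.cdf(t)
    raise ValueError("alternative must be 'greater' or 'less'")

# Rounded summary figures quoted in the text.
tests = {
    "A: mean SALES > 41.5": (t_statistic(52, 15, 100, 41.5), "greater"),
    "C: mean CALLS (no training) < 145": (t_statistic(8, 3, 20, 145), "less"),
    "D: mean TIME > 15": (t_statistic(3, 1.2, 100, 15), "greater"),
}
for label, (t, alt) in tests.items():
    print(f"{label}: t = {t:.1f}, approx p = {approx_p_value(t, alt):.3g}")
```

The direction of the alternative matters: for an upper-tailed test, a sample mean below the hypothesized value gives a negative t and a p-value close to 1, so the null hypothesis cannot be rejected in that direction.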
Confidence Intervals:
For each hypothesis, a 95% confidence interval was constructed. For mean sales, the interval was approximately (47, 57), which lies entirely above 41.5. For the proportion of online-trained employees, the interval was approximately (0.48, 0.62). For mean calls among employees with no training, the interval was centered at 8 calls with a margin of error of about ±1, so the true mean is likely well below 145. For mean time per call, the interval ran from about 2.5 to 3.5 minutes, confirming that calls are far shorter than 15 minutes.
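The interval formulas behind these results are estimate ± critical value × standard error. The sketch below applies them to the rounded summary figures quoted in the text, so its output need not match intervals computed from the raw data. The function names are hypothetical, and the critical values 1.984 (99 degrees of freedom) and 2.093 (19 degrees of freedom) are standard t-table values supplied by hand.

```python
import math
from statistics import NormalDist

Z_95 = NormalDist().inv_cdf(0.975)  # about 1.96

def mean_ci(mean, sd, n, crit=Z_95):
    """Confidence interval for a mean: mean +/- crit * sd / sqrt(n)."""
    margin = crit * sd / math.sqrt(n)
    return mean - margin, mean + margin

def proportion_ci(p_hat, n, crit=Z_95):
    """Wald confidence interval for a proportion."""
    margin = crit * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Rounded summary figures from the text; t critical values taken from a t table.
sales_ci = mean_ci(52, 15, 100, crit=1.984)
calls_no_training_ci = mean_ci(8, 3, 20, crit=2.093)
time_ci = mean_ci(3, 1.2, 100, crit=1.984)
online_ci = proportion_ci(0.55, 100)

for name, (lo, hi) in [("SALES", sales_ci), ("CALLS, no training", calls_no_training_ci),
                       ("TIME", time_ci), ("proportion online", online_ci)]:
    print(f"{name}: ({lo:.2f}, {hi:.2f})")
```

For the small no-training group (n around 20), the t critical value is noticeably larger than 1.96, which widens the interval relative to a normal-based one.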
Summary to Manager:
The statistical analysis supports the manager's beliefs that mean weekly sales exceed 41.5, that fewer than 55% of employees are online-trained, and that employees with no training make fewer than 145 calls on average. Conversely, the belief that the average call exceeds 15 minutes is not supported: the data show that the typical call duration is far less than 15 minutes. These findings can guide resource allocation, training programs, and expectations around performance metrics.
Part C: Regression and Correlation Analysis
The third component involved exploring the predictive relationship between SALES and CALLS using regression. A scatterplot indicated a positive linear trend, and the best fit line was calculated as:
SALES = 25 + 2.5 * CALLS
Interpretation: For each additional call, sales increase by approximately 2.5 units, starting from a baseline of 25 when no calls are made.
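A best-fit line of this form is obtained by ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope × mean(x). The sketch below uses a hypothetical helper `fit_line` and synthetic points constructed to lie exactly on SALES = 25 + 2.5 * CALLS, purely to show that the formulas recover the stated coefficients; it is not the project data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Synthetic points placed exactly on the reported line (illustration only).
calls = [0, 4, 8, 12, 16, 20]
sales = [25 + 2.5 * c for c in calls]
intercept, slope = fit_line(calls, sales)
print(f"SALES = {intercept:.1f} + {slope:.1f} * CALLS")
```

With real data the points scatter around the line, and the fitted coefficients are the values that minimize the sum of squared vertical distances.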
The correlation coefficient (r) was about 0.65, implying a moderately strong positive linear relationship. The coefficient of determination (R²) was approximately 0.42, indicating that about 42% of the variation in SALES can be explained by CALLS alone.
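In simple linear regression, R² equals r squared, and the slope can be recovered from the correlation and the two standard deviations as slope = r × s_y / s_x. The sketch below checks these relationships using the rounded summary figures reported earlier (mean CALLS about 10 with standard deviation about 4, mean SALES about 52 with standard deviation about 15); the function name is hypothetical, and because the inputs are rounded the result only approximates the reported line.

```python
def line_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Recover the least-squares line and R^2 from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    r_squared = r ** 2
    return intercept, slope, r_squared

# Rounded summary figures quoted in the text.
intercept, slope, r_squared = line_from_summary(0.65, 10, 4, 52, 15)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}, R^2 = {r_squared:.4f}")
```

The recovered slope of about 2.44 and intercept of about 27.6 are close to the reported 2.5 and 25, which is the level of agreement to expect from rounded inputs.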
Hypothesis testing of the regression slope (β₁) confirmed that it is statistically significant. The confidence interval for β₁ ranged from approximately 1.8 to 3.2; because it excludes zero, it reaffirms that CALLS has a significant positive association with SALES.
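For simple linear regression, the t statistic for the slope can be written in terms of the correlation alone: t = r × sqrt(n - 2) / sqrt(1 - r²), with n - 2 degrees of freedom. The sketch below applies this identity to the rounded values r = 0.65 and n = 100; the function name is hypothetical, and a statistic computed from rounded inputs will not exactly match one computed from the raw data.

```python
import math

def slope_t_from_r(r, n):
    """t statistic for H0: beta1 = 0 in simple linear regression, from r and n."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 3:
        raise ValueError("n must be at least 3")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_slope = slope_t_from_r(0.65, 100)
print(f"t = {t_slope:.2f} on {100 - 2} degrees of freedom")
```

A statistic of this size is far beyond the two-sided 5% critical value of roughly 1.98 for 98 degrees of freedom, which agrees with a confidence interval for the slope that excludes zero.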
Estimates:
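- At 150 calls: estimated sales = 25 + 2.5 * 150 = 400 units. The 95% confidence interval for mean sales at this call level (based on the model) ranged from roughly 350 to 450 units.
- For predicting an individual salesperson's weekly sales at 150 calls, the prediction interval is centered on the same estimate but is wider than the confidence interval for the mean, because it also reflects variation among individuals.
- At 300 calls: predicted sales = 25 + 2.5 * 300 = 775 units, with wider intervals reflecting greater uncertainty farther from the mean of CALLS.
- Both 150 and 300 calls lie far outside the observed range of CALLS (0 to just over 20), so these estimates are extrapolations and should be treated with caution.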
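The point estimates above follow directly from the fitted line. The sketch below evaluates the line and flags any input outside the observed range of CALLS. The upper bound of 25 is an assumption standing in for the "0 to over 20" range described in Part A, and the function name is hypothetical.

```python
INTERCEPT = 25.0  # from the reported line SALES = 25 + 2.5 * CALLS
SLOPE = 2.5
OBSERVED_CALLS = (0, 25)  # assumed bounds; the text reports "0 to over 20"

def predict_sales(calls, observed=OBSERVED_CALLS):
    """Return (estimated sales, True if the input is an extrapolation)."""
    estimate = INTERCEPT + SLOPE * calls
    extrapolated = not (observed[0] <= calls <= observed[1])
    return estimate, extrapolated

for calls in (10, 150, 300):
    estimate, extrapolated = predict_sales(calls)
    note = " (extrapolation)" if extrapolated else ""
    print(f"CALLS = {calls}: estimated SALES = {estimate:.0f}{note}")
```

Flagging extrapolation explicitly is a design choice: the arithmetic works for any input, but the fitted line is only supported by data within the range where calls were actually observed.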
The model's utility was further tested through multiple regression analyses incorporating TIME and YEARS as additional predictors. The global F-test indicated significant joint explanatory power, although some individual predictors (such as YEARS) proved less significant, suggesting that CALLS remains the primary driver.
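The global F statistic can be computed from the model's R² as F = (R² / k) / ((1 - R²) / (n - k - 1)), where k is the number of predictors. The sketch below uses k = 3 (CALLS, TIME, YEARS) and n = 100; the R² of 0.5 is a hypothetical placeholder, since the multiple-regression R² is not reported in the text, and the function name is also an assumption.

```python
def global_f_statistic(r_squared, k, n):
    """Global F statistic for a regression with k predictors and n observations."""
    if not 0 <= r_squared < 1:
        raise ValueError("r_squared must be in [0, 1)")
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return (r_squared / k) / ((1 - r_squared) / (n - k - 1))

# Hypothetical R^2 of 0.5, for illustration only (not reported in the text).
f_stat = global_f_statistic(0.5, k=3, n=100)
print(f"F = {f_stat:.1f} on (3, 96) degrees of freedom")
```

A large F statistic says the predictors are jointly useful; it does not indicate which individual predictors matter, which is why the individual t-tests are examined separately.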
In summary, the analysis recommends focusing on increasing call frequency to improve sales performance, supported by statistical evidence. The multiple regression model enhances the prediction capability but should be interpreted with caution regarding multicollinearity and overfitting.