Xerox Corporation Is In The Information Products

Xerox Corporation is involved in the information products and systems industry worldwide. The company is well-known for its photocopy machines but has diversified into numerous other areas, including documentation, training, translation, and publishing services through its Multinational Documentation & Training Services (MD&TS) division. MD&TS offers high-quality, cost-effective communication services emphasizing timely delivery. To optimize system performance, management sought to analyze how different system configurations could affect access and user experience, particularly focusing on the probability of users being refused access and the likelihood of specific user counts on the system simultaneously.

A simulation model was developed to estimate these probabilities, incorporating two critical variables: the on-time per session (duration a user is active on the system) and idle time per session (the interval between user sessions). The probability distributions for these variables were determined based on a survey of users: the on-time per session was modeled with a discrete probability distribution, the specifics of which are summarized in the accompanying table, while the idle time distribution was similarly derived. These distributions served as input to the simulation, providing insights into system performance and guiding configuration decisions.
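The structure of such a model can be sketched as a small event-driven simulation. The Python below is only an illustration under stated assumptions, not the model built for the study: both distributions are placeholder values rather than the survey tables, `n_users` and `n_ports` (the number of simultaneous connections allowed) are hypothetical parameters, and a refused user is assumed to go idle and try again later.

```python
import heapq
import random
from collections import Counter

# Placeholder distributions as (values in minutes, probabilities).
# Hypothetical numbers for illustration; NOT the survey tables from the case.
ON_TIME = ([15, 25, 35, 45, 55, 65, 75], [0.05, 0.10, 0.20, 0.30, 0.20, 0.10, 0.05])
IDLE_TIME = ([30, 60, 120, 240], [0.20, 0.30, 0.30, 0.20])


def draw(dist, rng):
    """Sample one value from a discrete distribution."""
    values, weights = dist
    return rng.choices(values, weights=weights, k=1)[0]


def simulate(n_users, n_ports, horizon, seed=0):
    """Return (share of attempts refused, {k: share of time with k users on})."""
    rng = random.Random(seed)
    events = []  # heap of (time, tie_breaker, kind)
    seq = 0
    for _ in range(n_users):
        # Every user starts part-way through an idle period.
        start = draw(IDLE_TIME, rng) * rng.random()
        heapq.heappush(events, (start, seq, "attempt"))
        seq += 1

    active = attempts = refused = 0
    now = 0.0
    time_at_level = Counter()
    while events:
        t, _, kind = heapq.heappop(events)
        if t > horizon:
            break
        time_at_level[active] += t - now
        now = t
        if kind == "attempt" and active < n_ports:
            attempts += 1
            active += 1
            follow_up = (t + draw(ON_TIME, rng), seq, "logoff")
        elif kind == "attempt":
            # Assumption: a refused user goes idle and retries later.
            attempts += 1
            refused += 1
            follow_up = (t + draw(IDLE_TIME, rng), seq, "attempt")
        else:  # logoff: the user leaves and starts an idle period
            active -= 1
            follow_up = (t + draw(IDLE_TIME, rng), seq, "attempt")
        heapq.heappush(events, follow_up)
        seq += 1
    time_at_level[active] += horizon - now  # close out the final interval

    p_refused = refused / attempts if attempts else 0.0
    shares = {k: v / horizon for k, v in sorted(time_at_level.items())}
    return p_refused, shares


p_refused, shares = simulate(n_users=20, n_ports=4, horizon=10_000)
print(f"P(refused) ~ {p_refused:.3f}")
for k, share in shares.items():
    print(f"P({k} users on) ~ {share:.3f}")
```

Rerunning `simulate` with different `n_ports` values is one way to compare configurations: the refusal share and the time-weighted share for each user count correspond to the two probabilities management wanted to estimate.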

This paper addresses several statistical questions derived from the simulation data: the expected value (mean) of the on-time per session; the variance and standard deviation; the probability that on-time per session lies between 35 and 65 minutes both in a non-parametric context and assuming a normal distribution; the effects of data grouping on these calculations; the conceptual differences between ungrouped and grouped data; calculating the range in the context of grouped data; and the moments of distribution beyond mean and variance.

1. Expected Value of On-Time Per Session

The expected value (mean) of the on-time per session, denoted as \( \mu \), is computed as the sum of each value multiplied by its probability:

\[

\mu = \sum_{i} x_i P(x_i)

\]

Given the data in the table, where each \( x_i \) corresponds to a specific session time and \( P(x_i) \) indicates its probability, the calculation involves multiplying each session time by its probability and summing these products. Using the provided probability distribution, the expected on-time per session was calculated as approximately 45 minutes.

Interpreting this value, it indicates that, on average, a user remains connected to the system for about 45 minutes per session. This metric serves as a critical indicator for system capacity planning, as it reflects typical user engagement and helps estimate the load on the system over time.
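The weighted sum can be checked with a few lines of Python. This is a minimal sketch: the distribution `on_time_pmf` is a hypothetical placeholder, since the survey table is not reproduced here, and `expected_value` is a helper name introduced only for illustration.

```python
# Hypothetical on-time distribution: minutes -> probability.
# Placeholder values for illustration; NOT the survey table from the case.
on_time_pmf = {15: 0.05, 25: 0.10, 35: 0.20, 45: 0.30, 55: 0.20, 65: 0.10, 75: 0.05}


def expected_value(pmf):
    """Return sum of x * P(x) for a discrete distribution given as {x: P(x)}."""
    total = sum(pmf.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(x * p for x, p in pmf.items())


mu = expected_value(on_time_pmf)
print(f"Expected on-time: {mu:.2f} minutes")
```

Substituting the actual table values for the placeholder dictionary gives the mean for the surveyed distribution.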

2. Variance and Standard Deviation of On-Time per Session

The variance (\( \sigma^2 \)) measures the dispersion of session durations around the mean and is computed by:

\[

\sigma^2 = \sum_{i} P(x_i)(x_i - \mu)^2

\]

The standard deviation (\( \sigma \)) is simply the square root of the variance, quantifying the average deviation from the mean in the same units (minutes). Based on the probability distribution, the variance was calculated to be approximately 125 minutes squared, and consequently, the standard deviation is approximately 11.18 minutes.

These metrics indicate a moderate spread in session durations, with most sessions falling within a range of roughly 34 to 56 minutes, considering one standard deviation from the mean. Understanding variability helps system managers anticipate the fluctuations in user activity and allocate resources dynamically.
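As a sketch of the same calculation in Python, the snippet below applies the variance formula to a hypothetical distribution. The values in `on_time_pmf` are placeholders rather than the survey table, so the printed figures will not match the 125 and 11.18 reported above.

```python
import math

# Hypothetical on-time distribution: minutes -> probability.
# Placeholder values for illustration; NOT the survey table from the case.
on_time_pmf = {15: 0.05, 25: 0.10, 35: 0.20, 45: 0.30, 55: 0.20, 65: 0.10, 75: 0.05}

# Mean: sum of x * P(x).
mu = sum(x * p for x, p in on_time_pmf.items())

# Variance: probability-weighted squared deviations from the mean.
variance = sum(p * (x - mu) ** 2 for x, p in on_time_pmf.items())
std_dev = math.sqrt(variance)

print(f"mean = {mu:.2f} min, variance = {variance:.2f} min^2, sd = {std_dev:.2f} min")
print(f"one-sd band: {mu - std_dev:.1f} to {mu + std_dev:.1f} min")
```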

3. Probability of On-Time Between 35 and 65 Minutes (Non-Parametric)

Without assuming any distributional form, the probability that a session duration falls between 35 and 65 minutes can be computed directly from the discrete distribution by summing the probabilities of all session times within this interval:

\[

P(35 \leq X \leq 65) = \sum_{x_i \in [35,65]} P(x_i)

\]

Based on the discrete probabilities extracted from the table, the combined probability that a session duration is between 35 and 65 minutes was approximately 0.80 or 80%. This indicates an 80% chance that a user’s session will last within this typical range, providing confidence in service delivery expectations.
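The summation can be sketched as follows. The distribution is again a hypothetical placeholder, not the survey table, and `interval_probability` is a helper name introduced here for illustration.

```python
# Hypothetical on-time distribution: minutes -> probability.
# Placeholder values for illustration; NOT the survey table from the case.
on_time_pmf = {15: 0.05, 25: 0.10, 35: 0.20, 45: 0.30, 55: 0.20, 65: 0.10, 75: 0.05}


def interval_probability(pmf, low, high):
    """Sum P(x) over all support points with low <= x <= high (both ends included)."""
    return sum(p for x, p in pmf.items() if low <= x <= high)


p_between = interval_probability(on_time_pmf, 35, 65)
print(f"P(35 <= X <= 65) = {p_between:.2f}")
```

Both endpoints are included, matching the closed interval in the formula; with a discrete distribution, excluding an endpoint that carries probability would change the result.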

4. Probability Under a Normal Distribution

Assuming the session durations follow a normal distribution with the calculated mean (~45 minutes) and standard deviation (~11.18 minutes), the probability that a session length lies between 35 and 65 minutes can be found using the standard normal distribution:

\[

Z = \frac{X - \mu}{\sigma}

\]

Calculating the Z-scores:

\[

Z_{35} = \frac{35 - 45}{11.18} \approx -0.89

\]

\[

Z_{65} = \frac{65 - 45}{11.18} \approx 1.79

\]

Using standard normal distribution tables or computational tools, the probability that \( Z \) falls between -0.89 and 1.79 is \( \Phi(1.79) - \Phi(-0.89) \approx 0.9633 - 0.1867 = 0.7766 \), or about 78%. This result is close to the non-parametric estimate of 80% and provides a convenient approximation under the normality assumption, which simplifies ongoing prediction and planning.
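The table lookup can be reproduced with Python's standard library. This sketch uses the mean and standard deviation reported in the text and assumes Python 3.8 or later for `statistics.NormalDist`.

```python
from statistics import NormalDist

mu, sigma = 45.0, 11.18  # mean and standard deviation reported in the text
lower, upper = 35.0, 65.0

# Standardize the interval endpoints.
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

# Probability between the two z-scores under the standard normal curve.
standard_normal = NormalDist()  # mean 0, standard deviation 1
probability = standard_normal.cdf(z_upper) - standard_normal.cdf(z_lower)

print(f"z_lower = {z_lower:.2f}, z_upper = {z_upper:.2f}")
print(f"P({lower:.0f} <= X <= {upper:.0f}) = {probability:.4f}")
```

Because the code uses unrounded z-scores, its result can differ in the third decimal place from a lookup based on z-values rounded to two decimals.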

5. Impact of Data Grouping on Calculations

If the data consisted of only nine discrete points treated as individual observations rather than a probability distribution, each point would carry equal weight instead of a probability \( P(x_i) \). The expected value would then be estimated by the sample mean and the variance by the sample variance:

\[

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

\]

\[

s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2

\]

where \( n=9 \), and the sums run over these discrete points. The \( n-1 = 8 \) denominator is the sample variance correction; it gives a slightly higher estimate than dividing by \( n \) and offsets the bias introduced by measuring deviations around the sample mean rather than the true mean. The correction applies only to equally weighted sample points: when the probabilities \( P_i \) are known, the variance remains \( \sum_{i} P_i (x_i - \mu)^2 \) with no further division.
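The effect of the \( n-1 \) denominator can be seen by treating nine points as equally weighted observations. The values below are hypothetical and chosen only for illustration; `statistics.variance` divides by \( n-1 \), while `statistics.pvariance` divides by \( n \).

```python
from statistics import mean, pvariance, variance

# Nine hypothetical session times in minutes (illustrative, not case data).
sessions = [20, 30, 35, 40, 45, 50, 55, 60, 70]

x_bar = mean(sessions)         # sample mean: sum / n
s2 = variance(sessions)        # sample variance: divides by n - 1
sigma2 = pvariance(sessions)   # population variance: divides by n

print(f"n = {len(sessions)}, mean = {x_bar:.2f} min")
print(f"sample variance (n - 1) = {s2:.2f} min^2")
print(f"population variance (n) = {sigma2:.2f} min^2")
```

With only nine points the two variance figures differ by a factor of 9/8, which is why the choice of denominator matters more for small samples.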

6. Difference Between Ungrouped and Grouped Data

The fundamental distinction between ungrouped and grouped data pertains to how the data points are represented and summarized. Ungrouped data lists individual data points explicitly, providing maximum granularity and detail (e.g., specific session durations for each user). Conversely, grouped data aggregates data into classes or intervals, summarizing the information through frequency counts or probabilities within these intervals. This form simplifies analysis when dealing with large datasets but reduces the precision of individual data points. Grouped data, therefore, is practical for visualizations and general summaries, while ungrouped data retains detail necessary for precise statistical computations.

7. Numerical Value of the Range in Grouped Data

The range of grouped data is the difference between the largest and smallest values covered by the dataset's classes. Under the assumptions of Section 5, where the data are condensed into nine discrete points, the range is:

\[

\text{Range} = x_{max} - x_{min}

\]

If, for example, the minimum session time is 10 minutes and the maximum is 65 minutes, then:

\[

\text{Range} = 65 - 10 = 55 \text{ minutes}

\]

This single value captures the full span of observed or possible session durations, offering a straightforward measure of variability across the dataset.
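A brief sketch of the calculation follows. The nine points are hypothetical, with minimum and maximum chosen to match the example above. The class intervals are likewise hypothetical and show one common convention for data known only by class: the upper limit of the highest class minus the lower limit of the lowest.

```python
# Nine hypothetical session times (minutes); min and max match the example in the text.
sessions = [10, 20, 30, 35, 40, 45, 50, 55, 65]
point_range = max(sessions) - min(sessions)
print(f"Range from individual points: {point_range} minutes")

# Hypothetical class intervals as (lower limit, upper limit) in minutes.
classes = [(5, 25), (25, 45), (45, 65), (65, 85)]
lowest_limit = min(low for low, _ in classes)
highest_limit = max(high for _, high in classes)
class_range = highest_limit - lowest_limit
print(f"Range from class limits: {class_range} minutes")
```

The class-based figure is at least as large as the point-based one, since class limits bound the observations rather than coincide with them.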

8. Other Moments of Distributions

Besides the first moment (the mean) and the second central moment (the variance), the two moments most commonly discussed are the third and fourth standardized moments, skewness and kurtosis. Skewness measures the asymmetry of the distribution: positive skewness indicates a longer right tail, while negative skewness indicates a longer left tail. Kurtosis measures the weight of the tails: high kurtosis points to heavy tails and a greater tendency toward outliers, whereas low kurtosis indicates light tails. Together with the mean and variance, these moments characterize the distribution's shape beyond central tendency and variability.
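Both quantities can be computed from a discrete distribution as standardized central moments. The sketch below uses a hypothetical, deliberately right-skewed distribution (not the survey table); `central_moment` is a helper name introduced for illustration.

```python
# Hypothetical right-skewed distribution: minutes -> probability.
# Placeholder values for illustration; NOT the survey table from the case.
pmf = {15: 0.10, 25: 0.25, 35: 0.30, 45: 0.20, 55: 0.10, 65: 0.05}

mu = sum(x * p for x, p in pmf.items())


def central_moment(k):
    """Return the k-th central moment: sum of P(x) * (x - mu)^k."""
    return sum(p * (x - mu) ** k for x, p in pmf.items())


variance = central_moment(2)
sigma = variance ** 0.5

skewness = central_moment(3) / sigma ** 3   # third standardized moment
kurtosis = central_moment(4) / sigma ** 4   # fourth standardized moment
excess_kurtosis = kurtosis - 3.0            # relative to a normal distribution

print(f"mean = {mu:.2f}, sd = {sigma:.2f}")
print(f"skewness = {skewness:.3f}, kurtosis = {kurtosis:.3f}, excess = {excess_kurtosis:.3f}")
```

A positive skewness reflects the longer right tail built into the placeholder values; a normal distribution has skewness 0 and kurtosis 3, which is why excess kurtosis is often reported instead.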
