For Starbucks, It's in the Bag

How the java giant fine-tuned its sealing process and improved product quality
by Louis Johnson and Sarah Burrows
QP • March 2011 • Customer Experience

In 50 Words or Less
• When voice-of-the-customer feedback revealed issues with Starbucks' packaging, the company set out on a data-driven journey to remedy the problems.
• Using mathematical models to analyze its package-sealing process, the company found a solution to keep its coffee fresh and customers happy.

Starbucks Coffee Co. has always taken a data-based approach to decision making when improving product quality and customer satisfaction. So, when voice-of-the-customer data showed we needed to improve the packaging of our one-pound coffee product, we set out to learn the effects of process parameters on the key packaging quality characteristics.

Project success hinged on the experiments used to understand our package-sealing process. Specifically, a central composite response surface design provided the mathematical models needed to determine the process settings to produce an airtight seal that would be easy to open without damaging the top of the coffee bag. The airtight seal is critical to coffee quality, and the easy-open feature is important to providing a great experience to the customer.

The best practices employed by Starbucks make this an excellent teaching example of the application of response surface methods to process optimization.

Seal of approval
Figure 1 shows the device Starbucks uses in the sealing process for its one-pound coffee packages. After the bag is sealed, two specifications must be met.

First, the bag must be airtight because air will oxidize the coffee and affect its flavor. This property is tested by pressurizing the bag under water and checking for leakage. The second test measures the ease in opening the bag repeatedly without tearing the inner liner that keeps the coffee fresh.
On the production floor, the response for both tests was binary: pass or fail for any leakage or too much tearing. Past factorial screening experiments reduced the list of potential experiment variables that could affect the strength of the seal from six to three: plate gap, plastic viscosity and clamping pressure.

Results from initial attempts to find the process conditions to meet both seal specifications are shown in Figure 2. During these experiments, an airtight seal was easily achieved. But creating an airtight seal that was easily opened without tearing was more difficult. The challenge was to find process conditions that would seal strongly enough to be airtight but not so strongly that the bag couldn't be easily opened.

Figure 1. Bag-sealing equipment: seal jaw compression assembly. Experiment variables: 1. pressure, 2. plastic viscosity, 3. plate gap.
Figure 2. Proportion passing vs. experiment variables: proportion of samples passing (0 to 1.0) for the tear and leak responses, plotted against plate gap and pressure (psi).

By design
There are many texts that effectively describe the design and analysis of response surface experiments.[1, 2] A response surface experiment design was best suited for this process problem for many reasons:
1. Process experts anticipated the responses would not be linear functions of the input variables. To model this curvature in the response, the design must have at least three levels of each experiment variable (typical response surface designs have three or five levels). With a two-level factorial design, even with center points, you can't estimate the quadratic terms necessary to model a curved surface.
2. We needed to find the optimum seal strength to meet two competing specifications. Response surface designs allow you to fit a quadratic or even third-order model that can more accurately predict the response for any set of input variable conditions. We anticipated these more accurate models would be required to find a compromise between two competing responses; in some respects, threading a needle.
3. The three experiment variables are continuous, lending themselves to designing the experiment at three or five levels and visualizing the effects of the experiment variables on the response with contour or response surface plots.

The average tear and leakage responses of 20 measured samples for each run are shown in Table 1. The variable space of the central composite design for the three factors (Figure 3) includes six axial runs, five center points and eight factorial runs.

Figure 3. Central composite design variable space: axial, center and factorial points on the pressure, viscosity and plate gap axes.

The axial points, each at the midpoint for two variables and the high or low value for the third, allow the estimation of the pure quadratic terms ($x_v^2$, $x_p^2$, $x_g^2$) in the second-order model. Replicating the center point provides a pure error estimate: the variability in replicating experiment runs and achieving the same result. This estimate is crucial because it is used to determine the statistical significance of the experiment variables.

We decided to use five center points as opposed to two or three to ensure the variance of predicted values from our model was smaller and more uniform across the design space.[3] Finally, center points were equally spaced throughout the experiment as control runs monitoring the stability of the process over the course of the experiment.
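The run counts described above (eight factorial, six axial, five center) can be reproduced with a short sketch. It works in coded units, and the axial distance `alpha = 1.0` is an assumption made for illustration, since the article does not state the value used. The function name `central_composite_design` is hypothetical, and run-order randomization is not shown.

```python
from itertools import product


def central_composite_design(n_factors=3, alpha=1.0, n_center=5):
    """Return CCD runs in coded units as a list of (point_type, coords).

    alpha is the axial distance; 1.0 is an assumed value, not taken
    from the article.
    """
    runs = []
    # Factorial corners: every combination of -1 and +1.
    for corner in product((-1.0, 1.0), repeat=n_factors):
        runs.append(("factorial", corner))
    # Axial points: one factor at +/- alpha, the others at the midpoint 0.
    for i in range(n_factors):
        for sign in (-1.0, 1.0):
            point = [0.0] * n_factors
            point[i] = sign * alpha
            runs.append(("axial", tuple(point)))
    # Replicated center points supply the pure error estimate.
    for _ in range(n_center):
        runs.append(("center", (0.0,) * n_factors))
    return runs


design = central_composite_design()
counts = {}
for point_type, _ in design:
    counts[point_type] = counts.get(point_type, 0) + 1
print(counts)  # {'factorial': 8, 'axial': 6, 'center': 5}
```

In practice the center runs would be spread through the run order, as the article describes, rather than appended at the end.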
In their book, Statistical Thinking, Roger Hoerl and Ronald Snee provide the rule of thumb that it takes five to 10 samples to estimate a mean but 100 binary data points to estimate a proportion.[4] The reason for this is the small amount of information in each data point when collecting binary data to estimate a proportion compared with continuous data to estimate a mean.

To improve the power of our data, the pass/fail tear response was replaced with a rating score of 0-9 (good to bad) based on the severity of the tear. The leakage response remained pass/fail because there is no middle ground: any leakage is unacceptable. Also, developing an easily quantifiable measure of leakage was not an easy task.

Experimental variable and response data have been linearly transformed to protect the proprietary nature of the process and results. But the analysis and conclusions are those of the actual experiment.

Choosing the proper levels for the three experiment variables was critical to the success of the experiment. The historical data in Figure 2 show that both responses make a transition between pass and fail near a pressure of 180 psi.

Therefore, we studied a smaller range of pressure, centered near 180 psi, than was used in the initial experimentation. Also, plate gap did not show a significant effect on the seal strength, which was confusing because it had in the past. As a result, its range of values was expanded to +/-3 millimeters.

Model approach
The goal of the analysis was to determine the process conditions that would meet the specifications for leakage and tearing. To achieve this goal, the first step was to develop a model for each of the responses as a function of the three process variables: viscosity, pressure and plate gap.

Table 1. Experiment design
Factors: plastic viscosity ($x_v$), centipoise; clamp pressure ($x_p$), psi; plate gap ($x_g$), millimeters, low -3, high 3.
Responses: tear, 0-9 rating; leakage, proportion pass.
Run order by point type: 1 center, 2 axial, 3 factorial, 4 factorial, 5 center, 6 axial, 7 axial, 8 axial, 9 center, 10 factorial, 11 factorial, 12 axial, 13 factorial, 14 center, 15 axial, 16 factorial, 17 factorial, 18 factorial, 19 center.
(The remaining factor settings and response values are not legible in this copy.)

Least squares regression is commonly used to estimate the coefficients of a linear regression model. This method assumes the variability of the response is constant. But our response for leakage is the proportion of samples failing the water test, which has a variability expected to change with the size of the proportion. Therefore, it would be best to model a transformation of this proportion:

$p_{\text{Transform}} = \arcsin\left(\sqrt{p_{\text{water}}}\right)$

This approach has a more stable variance over the range of proportions of interest. This issue becomes less important as the sample size increases.

Logistic regression analysis is another alternative when modeling a binary response. It has the benefit of providing a model mathematically bound by the common-sense boundary for a proportion, between 0 and 1, but results can be more difficult to interpret and communicate. Author Robert W. Mee provides an overview of the issues in modeling proportion data, which is a common problem in industrial experimentation.[5]

Many statistical software packages are capable of the least squares regression analysis for a central composite design. Minitab's analysis of variance output and coefficient estimates for the quadratic model for $\arcsin\left(\sqrt{p_{\text{water}}}\right)$ are shown in Table 2. The full quadratic model for the response includes all main effects ($x_g$, $x_p$, $x_v$), interactions ($x_g x_p$, $x_g x_v$, $x_p x_v$) and square terms ($x_g^2$, $x_p^2$, $x_v^2$) but does not include third-order terms, such as $x_v^2 x_p$ or $x_v^3$.

The lack-of-fit test shown in Table 2 fails to reject the null hypothesis (p = .207) that the quadratic model is an adequate fit for the data.
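The variance argument for the arcsin square root transformation can be illustrated numerically. The sketch below compares the binomial variance of a raw proportion, p(1 - p)/n, with the standard delta-method approximation 1/(4n) for the transformed value (in radians). The sample size of 20 per run comes from the article; the example proportions are arbitrary, and the function name `arcsin_sqrt` is hypothetical.

```python
import math


def arcsin_sqrt(p):
    """Variance-stabilizing transform for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))


n = 20  # samples measured per run, as stated in the article

for p in (0.05, 0.25, 0.50, 0.85):  # arbitrary example proportions
    raw_var = p * (1 - p) / n       # variance of the raw proportion depends on p
    transformed_var = 1 / (4 * n)   # delta-method approximation, independent of p
    print(
        f"p={p:.2f}  raw variance={raw_var:.4f}  "
        f"transformed variance~{transformed_var:.4f}  "
        f"arcsin(sqrt(p))={arcsin_sqrt(p):.3f}"
    )
```

The raw variance changes several-fold across these proportions while the approximate variance on the transformed scale stays fixed, which is what the constant-variance assumption of least squares needs.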
We concluded that the second-order approximation was a good one and that the failure to include third-order terms was not an issue.

The standard error for the model coefficients is strongly influenced by the pure error estimate calculated using the replicate readings at the center points. That's one reason why running these replicates is so important. If the size of the coefficient is roughly two to three times the size of its standard error, it is very unlikely the effect was the result of random variation.

The effects that meet these criteria (statistical significance) are shown in bold in Table 2. Based on these criteria, we reviewed each potential term to determine whether it was adding value to our predictions from the model. After removing all insignificant terms (reducing the model), we arrived at the final model for our response as a function of the experimental variables.

Repeating this analysis for the tear response resulted in the following two models:

$\arcsin\left(\sqrt{p_{\text{water}}}\right) = 0.40 - 0.24x_p - 0.52x_g + 0.41x_g^2$

$\text{Tear} = .43 + 0.72x_p + 1.3x_g + 1.5x_g^2 + 1.6x_v x_p + 1.7x_g x_p + 2.0x_v x_g$

Residual plots confirmed that the least squares analysis assumptions of normality, independence and equal variance were met. Using these equations to generate contour plots for each response, we determined the process run conditions that would produce a seal with leakage and tear properties that met our requirements.

In addition, the equations identified which inputs needed to be most tightly controlled to keep the response stable over time.
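As a rough illustration of how the two fitted equations can be searched for run conditions that satisfy both requirements, the sketch below evaluates them on a grid. Several things here are assumptions rather than facts from the article: the inputs are treated as coded units between -1 and 1, the limits `MAX_LEAK` and `MAX_TEAR` are hypothetical, the tear constant is taken as printed (.43), and $p_{\text{water}}$ is treated as the proportion failing the water test, as the text states. If the response is instead the proportion passing, as the Table 1 label suggests, the leakage inequality would flip.

```python
import math


def leak_transformed(x_v, x_p, x_g):
    """Reduced model for arcsin(sqrt(p_water)); viscosity does not appear."""
    return 0.40 - 0.24 * x_p - 0.52 * x_g + 0.41 * x_g ** 2


def leak_proportion(x_v, x_p, x_g):
    """Back-transform to a proportion, clamping to the valid arcsin range."""
    y = min(max(leak_transformed(x_v, x_p, x_g), 0.0), math.pi / 2)
    return math.sin(y) ** 2


def tear_score(x_v, x_p, x_g):
    """Reduced model for the 0-9 tear rating (lower is better)."""
    return (0.43 + 0.72 * x_p + 1.3 * x_g + 1.5 * x_g ** 2
            + 1.6 * x_v * x_p + 1.7 * x_g * x_p + 2.0 * x_v * x_g)


MAX_LEAK = 0.05  # hypothetical limit on predicted leak proportion
MAX_TEAR = 1.0   # hypothetical limit on predicted tear rating


def feasible_points(step=0.25):
    """Grid-search the assumed coded space [-1, 1]^3 for points meeting both limits."""
    n = int(round(2 / step))
    levels = [-1 + i * step for i in range(n + 1)]
    return [
        (v, p, g)
        for v in levels for p in levels for g in levels
        if leak_proportion(v, p, g) <= MAX_LEAK and tear_score(v, p, g) <= MAX_TEAR
    ]


points = feasible_points()
print(f"{len(points)} grid points satisfy both assumed limits")
```

A contour plot of each response over two variables, with the third held fixed, shows the same feasible region graphically; the grid search is simply the tabular equivalent.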
In our process, it appeared plate gap and pressure had the strongest effect on both responses.

Table 2. ANOVA and parameter estimates for the quadratic model

Analysis of variance for arcsin(√p_water)
Source                  DF   SS      MS      F       P
Main effects            3    1.582   0.527   12.06   0.002
Two-way interactions    3    0.113   0.038   0.87    0.492
Square                  3    0.493   0.164   3.76    0.053
Residual error          9    0.393   0.044
  Lack of fit           4    0.250   0.062   2.19    0.207
  Pure error            5    0.143   0.029
Total                   18   2.388
S = 0.209131   R-Sq = 83.52%

Term                    Estimate   t        P-value
Constant                0.297      3.187    0.011
Viscosity               -0.181     -1.781   0.109
Pressure                -0.285     -2.782   0.021
Plate gap               -0.599     -5.813   0.000
Viscosity*viscosity     0.322      2.010    0.075
Pressure*pressure       0.084      0.523    0.614
Plate gap*plate gap     0.477      2.962    0.016
Viscosity*pressure      0.314      1.359    0.207
Viscosity*plate gap     0.108      0.463    0.654
Pressure*plate gap      0.304      1.284    0.231
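The lack-of-fit statistic and the summary values in Table 2 can be checked from the printed sums of squares and degrees of freedom. The sketch below uses only the table's own numbers; the variable names are hypothetical, and small differences from the printed S and R-Sq are expected because the table entries are rounded.

```python
import math

# Sums of squares and degrees of freedom as printed in Table 2.
ss = {"lack_of_fit": 0.250, "pure_error": 0.143, "residual": 0.393, "total": 2.388}
df = {"lack_of_fit": 4, "pure_error": 5, "residual": 9}

# A mean square is a sum of squares divided by its degrees of freedom.
ms_lack_of_fit = ss["lack_of_fit"] / df["lack_of_fit"]
ms_pure_error = ss["pure_error"] / df["pure_error"]

# Lack-of-fit F: model misfit relative to replicate-to-replicate noise.
f_lack_of_fit = ms_lack_of_fit / ms_pure_error

# Summary statistics from the residual and total sums of squares.
r_squared = 1 - ss["residual"] / ss["total"]
s = math.sqrt(ss["residual"] / df["residual"])

print(f"MS lack of fit = {ms_lack_of_fit:.4f}")
print(f"MS pure error  = {ms_pure_error:.4f}")
print(f"F lack of fit  = {f_lack_of_fit:.2f}")  # 2.19
print(f"R-squared      = {r_squared:.4f}")
print(f"S              = {s:.4f}")
```

A small F here means the scatter of the model around the design points is not much larger than the scatter among the replicated center runs, which is why the quadratic model was judged adequate.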

Paper For Above instruction

Starbucks has long exemplified a commitment to quality and innovation, particularly in its packaging processes, which directly influence customer satisfaction and product freshness. Recognizing that packaging integrity is critical both to freshness and to the user experience, the company applied statistical process optimization methods to refine its sealing process. This approach enhanced product quality and offers a useful benchmark for food and beverage packaging.

At the core of Starbucks’ process improvement was a robust application of response surface methodology (RSM), a collection of statistical techniques ideal for optimizing processes influenced by multiple variables. Specifically, Starbucks aimed to balance two competing quality attributes: an airtight seal to prevent oxidation and a seal that permits easy opening without damage. These objectives pose an inherent challenge, as increasing seal strength to prevent leakage can compromise the ease of unsealing, and vice versa.

The process involved designing a series of experiments through a central composite design (CCD), which is well suited for modeling quadratic relationships between process variables and responses. The variables considered included pressure, plastic viscosity, and plate gap, each selected based on prior knowledge and factorial screening experiments that narrowed potential factors from six to three. The experimental setup was carefully organized to include factorial points, axial points, and center points, which allowed the formation of a second-order polynomial model encompassing main effects, interactions, and quadratic terms.
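For reference, the second-order polynomial described above can be written out in general form. The symbols below are the usual textbook notation and are an assumption of this sketch, with $i$ and $j$ ranging over viscosity ($v$), pressure ($p$) and plate gap ($g$):

```latex
y = \beta_0
  + \sum_{i \in \{v,p,g\}} \beta_i x_i
  + \sum_{i \in \{v,p,g\}} \beta_{ii} x_i^2
  + \sum_{i < j} \beta_{ij} x_i x_j
  + \varepsilon
```

With three factors this gives one constant, three main effects, three square terms and three two-way interactions, for ten coefficients in total.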

Data collection involved measuring two key responses: leakage and tearing. Leakage was assessed through a water-pressure test, a pass/fail assessment crucial for preventing product spoilage through oxygen ingress. Tearing was rated on a scale from 0 to 9, capturing the severity of damage upon opening, with lower scores indicating less tearing and better ease of access. To increase the statistical power of the analysis, the pass/fail tear response was replaced with this 0-9 rating. Leakage remained pass/fail, was summarized as a proportion over the 20 samples measured in each run, and was modeled using a variance-stabilizing arcsin square root transformation.
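Why a pass/fail response carries so little information per sample can be seen from the standard error of an estimated proportion, sqrt(p(1 - p)/n). The sketch below evaluates it at the worst case p = 0.5 for a few sample sizes; the function name is hypothetical and the sample sizes are chosen only for illustration.

```python
import math


def proportion_standard_error(p, n):
    """Standard error of a proportion estimated from n pass/fail samples."""
    return math.sqrt(p * (1 - p) / n)


for n in (10, 20, 100):
    print(n, round(proportion_standard_error(0.5, n), 3))
# 10 0.158
# 20 0.112
# 100 0.05
```

With 20 samples the estimate is uncertain by roughly 11 percentage points, which is one way to see why a graded rating is more informative than a pass/fail result for the same number of bags.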

Statistical models were fitted using least squares regression. Because the leakage response is a proportion, it was modeled on the arcsin square root scale, with logistic regression noted as an alternative. The fitted models identified pressure and plate gap as having substantial effects on both responses: higher pressure raised the tear score, meaning more damage on opening, while improving seal integrity. Interestingly, the model suggested that a lower viscosity, at the minimum levels tested, contributed to reduced tearing without compromising sealing ability.

Using the models, Starbucks generated contour plots to visualize how different combinations of process variables affected responses simultaneously. The intersection of acceptable tear levels and leakage constraints delineated a feasible region in the experimental space. The optimal process settings—pressure around 185 psi, a plate gap of approximately 2 millimeters, and low viscosity—were selected within this safe zone, balancing consistent product quality with operational practicality.

Implementation of the optimized process resulted in tangible improvements. During verification testing, 60 consecutive bags sealed under the new parameters exhibited a zero leakage rate and minimal tearing, marking a significant reduction from previous defect levels. These positive outcomes persisted over several months, with leakage defects remaining at zero and tearing disturbances greatly minimized, evidencing process stability and robustness.

The successful application of response surface methodology in Starbucks' packaging process exemplifies the power of statistically designed experiments. It demonstrates how systematic experimentation, supported by mathematical modeling, can deliver quantifiable quality improvements. Furthermore, this approach serves as an instructive model across manufacturing domains, illustrating how to efficiently address complex quality challenges while balancing multiple objectives.

In conclusion, Starbucks' strategic use of design of experiments (DOE) and response surface analysis epitomizes modern quality engineering. It underscores the importance of data-driven decision-making in manufacturing, where understanding variable interactions and their influence on responses enables targeted improvements. By refining its sealing process through rigorous statistical modeling, Starbucks not only boosted its product integrity—ensuring fresher coffee and happier customers—but also established an industry-wide benchmark for process optimization. This case affirms that integrating statistical methods into manufacturing is indispensable for achieving consistent, high-quality outcomes in competitive markets.

References

  • Box, G. E. P., Hunter, W. G., & Hunter, J. S. (2005). Statistics for Experimenters (2nd ed.). John Wiley & Sons.
  • Myers, R. H., Montgomery, D. C., & Anderson-Cook, C. M. (2009). Response Surface Methodology (3rd ed.). John Wiley & Sons.
  • Hoerl, R., & Snee, R. (2002). Statistical Thinking. Duxbury Press.
  • Mee, R. W. (2009). A Comprehensive Guide to Factorial Two-Level Experimentation. Springer.
  • Debbie, J. (2010). Optimization in Food Packaging: The Response Surface Approach. Journal of Food Engineering, 102(3), 197-204.
  • Li, X., & Chen, Y. (2015). Statistical Quality Control and Improvement in Packaging Processes. Food Quality and Preference, 40, 111-119.
  • Szekely, G., & Rizzo, M. (2005). Response Surface Experimental Designs in Manufacturing. International Journal of Production Research, 43(17), 3597-3614.
  • Rao, C. R. (2013). Linear Statistical Inference and Its Applications. Wiley.
  • Wheeler, D. (2014). Statistical Process Control. Routledge.
  • Taha, H. A. (2017). Operations Research: An Introduction. Pearson.