In this series we have been looking at the principles of DMAIC (Define, Measure, Analyze, Improve, Control). Last issue we examined Control, which concludes our look at DMAIC. We now turn our focus to the tools that Six Sigma and Lean practitioners use.
Cause and Effect
We have learned that effects (outputs, Y's) are a direct result of causes (inputs, X's). The relationship Y = f(X1, X2, ..., Xk), read as "Y is a function of the X's," is the engine that drives Six Sigma methods. It follows that we cannot change the Y's without manipulating the X's that make the outputs what they are.
Therefore, it is critical to clearly understand the X / Y relationships. To gain a favorable change in Y, we change the X's that are the most powerful drivers. There are many statistical tools available to understand the nature of these relationships.
Regression
When we have a continuous X and a continuous Y, Regression is the tool of choice. Continuous data are variables data; that is to say, they are actual measurements. The Y is the Response and the X is the Predictor. Multiple Regression lets us assess the effect of several X's on a single Y, but in this article we will focus on a single X and a single Y.
We are not going to examine the mathematical model behind Regression in this article; many fine statistics books cover it. Regression begins with a scatter plot of the X / Y pairs on a single graph. A best-fit line is then drawn through the data using the least squares method, which chooses the line that minimizes the sum of the squared vertical distances between the points and the line. If we have a perfect X / Y relationship, all the points will fall on the line. Here is an example: when I step on the gas pedal, the car goes faster, so I suspect a relationship. If so, manipulating the pedal (X) will effect a predictable change in speed (Y).
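For readers who want to see what the least squares line actually computes, here is a minimal Python sketch. The function name `least_squares_line` and the pedal/speed numbers are hypothetical, made up purely for illustration. The formulas are the standard ones for a single X: slope = Sxy / Sxx (the sum of cross-products of deviations from the means, divided by the sum of squared X deviations) and intercept = mean(Y) - slope × mean(X).

```python
from statistics import mean


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of cross-products of deviations, and sum of squared X deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar  # the line always passes through (x_bar, y_bar)
    return slope, intercept


# Hypothetical data: pedal position (% depressed) vs. speed (mph)
pedal = [10, 20, 30, 40, 50]
speed = [12, 21, 33, 41, 52]

b1, b0 = least_squares_line(pedal, speed)
print(f"speed = {b0:.2f} + {b1:.2f} * pedal")
```

With these made-up numbers the fitted line works out to speed = 1.80 + 1.00 × pedal. Any statistics package will report the same two numbers; the sketch only shows where they come from.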
Correlation
In reality, not all the data points will fall on the least squares line. The more widely the data are scattered around the line, the weaker the relationship. Beyond some point, the line no longer adequately describes the scatter, and changes in X can no longer be relied on to predict changes in Y.
There are several reliable methods for assessing correlation. The two most popular are the "r Factor" (the Pearson correlation coefficient) and the "R Factor" (R Squared). The Pearson coefficient ranges from -1 to +1. Its sign gives the direction of the relationship (positive when Y rises as X rises, negative when Y falls as X rises), and zero means no linear relationship. The closer r approaches -1 or +1, the stronger the correlation; as a rule of thumb, values beyond ±0.6 are generally considered correlated. R Squared is the percentage of the variation in Y that is explained by the least squares line; for a single X it equals the square of r.
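Both measures can be sketched in a few lines of Python using the standard formulas. The function names and the small data set below are hypothetical, chosen only for illustration. R Squared is computed here from the residuals (1 minus unexplained variation over total variation), which makes it easy to confirm that, for a single X, it matches r squared.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient r, between -1 and +1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when X or Y does not vary")
    return sxy / sqrt(sxx * syy)


def r_squared(xs, ys):
    """Fraction of the variation in Y explained by the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)  # total variation in Y
    if sxx == 0 or ss_tot == 0:
        raise ValueError("R Squared is undefined when X or Y does not vary")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    fits = [intercept + slope * x for x in xs]           # predicted Y values
    residuals = [y - f for y, f in zip(ys, fits)]        # observed minus predicted
    ss_res = sum(e ** 2 for e in residuals)              # unexplained variation
    return 1 - ss_res / ss_tot


# Hypothetical data with visible scatter around the line
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r = {r:.3f}, r squared = {r * r:.3f}, R Squared = {r_squared(x, y):.3f}")
```

For this made-up data r is about 0.775 and R Squared is 0.600, so about 60% of the variation in Y is explained by the line; by the ±0.6 rule of thumb these data would be considered correlated.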
Conclusion
Regression is a powerful tool for investigating cause-and-effect relationships. The strength of a Regression model can be further understood by examining its "Fits and Residuals" (the predicted Y values, and the differences between the observed and predicted Y values). A full explanation of this technique is beyond the scope of this article.
Regression can be used as a predictive model to estimate what changes in Y to expect as X is manipulated in a controlled fashion. If predictions are made outside the sampled range of X, we must assume that the relationship still holds there and that nothing in the environment has changed; otherwise the prediction can be spurious.
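The caution about predicting outside the sample space can be built into a prediction routine. In this minimal Python sketch, the function names and the pedal/speed numbers are hypothetical. `predict` returns both the estimate and a flag that is set when the requested X lies outside the range of X values actually sampled. The flag does not make the prediction wrong; it marks that the prediction rests on the assumption that nothing has changed.

```python
from statistics import mean


def fit_line(xs, ys):
    """Least squares slope and intercept for a single X."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def predict(xs, ys, x_new):
    """Return (predicted Y, True if x_new lies outside the sampled X range)."""
    slope, intercept = fit_line(xs, ys)
    y_hat = intercept + slope * x_new
    extrapolated = not (min(xs) <= x_new <= max(xs))
    return y_hat, extrapolated


# Hypothetical data: pedal position (% depressed) vs. speed (mph)
pedal = [10, 20, 30, 40, 50]
speed = [12, 21, 33, 41, 52]

for setting in (35, 90):
    estimate, outside = predict(pedal, speed, setting)
    note = " (extrapolated: outside the sampled range)" if outside else ""
    print(f"pedal {setting}%: predicted speed {estimate:.1f} mph{note}")
```

A pedal setting of 35% falls inside the sampled 10% to 50% range, while 90% does not; the straight line happily returns a number for 90%, but nothing in the data tells us the car still behaves that way at that pedal position.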
I hope I have sparked enough interest in this powerful tool to encourage further study. KAVON International, Inc. uses Regression and Correlation to assist clients in making breakthrough improvements.