Having completed a close look at DMAIC, we began a review of some of the Black Belt tools three issues ago. In this issue we look at regression.
The Road Map:
- MSA (Measurement System Analysis)
- Stability (Use a passive control chart)
- Normality (symmetry, use the Anderson-Darling test)
- Equal Variances (use Bartlett's or Levene's test)
- With gates 1 through 4 in order, the data can be used for analysis with confidence in the results.
As we have been discussing, gates 1 through 4 listed above are necessary to confirm that the data is fit for the more complex analysis intended. To harvest meaningful results the data must come from a trustworthy measurement system, be stable, and be normal. If tests about the means are called for, equal variances are also required.
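As a rough illustration of the stability gate, here is a minimal sketch of the limits for an individuals control chart, one common choice that we are assuming here. It applies only the basic rule of flagging points outside the limits and ignores the run rules that software such as Minitab can also apply. The function names and the readings are hypothetical, invented for this example.

```python
from statistics import mean


def individuals_chart_limits(values):
    """Return (center, lower, upper) limits for an individuals control chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    # 2.66 is 3 / d2, where d2 = 1.128 for moving ranges of two points.
    spread = 2.66 * mr_bar
    return center, center - spread, center + spread


def out_of_control_points(values):
    """Return (index, value) pairs that fall outside the control limits."""
    _, lower, upper = individuals_chart_limits(values)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical readings, invented for illustration only.
readings = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2, 15.0, 10.1, 9.7, 10.0]
center, lower, upper = individuals_chart_limits(readings)
print(f"center={center:.2f} lower={lower:.2f} upper={upper:.2f}")
print("points outside the limits:", out_of_control_points(readings))
```

If any points are flagged, the process is not yet stable, and the cause should be understood before moving on to the next gate.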
Last issue we looked at correlation, which is the precursor to regression. If two or more sets of data are shown to be strongly correlated, then a predictive model can be created. That is to say, a change in one factor can be used to predict a change in another, and the stronger the correlation, the better the prediction. This can be most useful.
We call these knob variables: if I turn the knob, I know what will happen. If I step on the gas pedal, the speedometer goes up, so I would say the two are correlated. Now suppose I measure my child's height for the first six years. Can I predict how tall he will be at age 45? If I tried, he would appear to be far taller than any real adult at age 45. The failure here is that the growth rate changed after age 6: it slowed down and then stopped. This is called predicting outside the model, or extrapolation. The model was built from birth to age 6, and predicting outside that range is risky because it rests on the presumption that nothing changes. Prediction within the model, however, can be quite accurate, provided the data is correlated, as we learned in the last issue on correlation.
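The height example can be sketched in a few lines of Python. The ages and heights below are hypothetical numbers invented for illustration, and the helper names `fit_line` and `predict` are our own. The exact projection depends on the measurements used, but the point is the same: the prediction function reports whether the requested value lies inside the range the model was built from.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by least squares; return (slope, intercept)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(x, slope, intercept, x_min, x_max):
    """Return (prediction, inside_model), where inside_model is True only
    when x lies within the range of data used to build the model."""
    inside_model = x_min <= x <= x_max
    return intercept + slope * x, inside_model


# Hypothetical measurements: age in years, height in inches.
ages = [0, 1, 2, 3, 4, 5, 6]
heights_in = [20, 30, 34, 37, 40, 43, 45]

slope, intercept = fit_line(ages, heights_in)
for age in (4, 45):
    height, inside = predict(age, slope, intercept, min(ages), max(ages))
    label = "within the model" if inside else "OUTSIDE the model"
    print(f"age {age}: predicted {height:.0f} in ({label})")
```

The age 4 prediction sits close to the measured value, while the age 45 prediction is flagged as outside the model and should not be trusted.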
This predictive model is called regression. The scope of the calculations and math involved in regression fills chapters in statistics and math books. We will not go there in this article. Software like Minitab allows us to bypass the math and get right to the results. Knowing how to collect the data, pass it through the gates, and determine its acceptability for predictive modeling is very important. Leave the complicated math to Minitab.
Regression begins by plotting the data from two correlated continuous variables on a scatter graph. Then a best-fit line is added. The line is fitted using something called the "least squares method", which chooses the line that makes the sum of the squared vertical distances from the points to the line as small as possible. Minitab will give you the R-squared value, which indicates the percentage of the variation in the response variable that is explained by the line. This is another good indication of the strength of the model.
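For readers curious about what the software does behind the scenes, here is a minimal sketch of a least squares fit and the R-squared calculation for two variables. It is not Minitab's implementation, only the textbook formulas. The function names and the gas pedal data are hypothetical, invented for this example.

```python
def least_squares(xs, ys):
    """Return (slope, intercept) of the best-fit line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in ys that is explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    # Unexplained variation: squared distances from the points to the line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared distances from the points to their mean.
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


# Hypothetical data: pedal position (arbitrary units) and speed (mph).
pedal = [1, 2, 3, 4, 5]
speed = [12, 19, 31, 42, 48]

slope, intercept = least_squares(pedal, speed)
r2 = r_squared(pedal, speed, slope, intercept)
print(f"speed = {intercept:.2f} + {slope:.2f} * pedal, R-squared = {r2:.3f}")
```

An R-squared near 1 means the line accounts for almost all of the variation in the response. A value near 0 means the line explains very little.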
If the model is strong, then a change in one of the variables can be used to predict the anticipated change in the other with good accuracy. If more than two variables are being analyzed, then something called "multiple regression" is used.
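To give a feel for multiple regression, here is a minimal sketch that fits a response to two predictors by solving the least squares normal equations. This is one textbook approach and not necessarily how Minitab computes it. The function names are our own, and the data is constructed from y = 1 + 2*x1 + 3*x2 so that the expected coefficients are known in advance.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique fit")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_regression(rows, ys):
    """rows holds one tuple of predictor values per observation.
    Returns [b0, b1, ..., bk] for y = b0 + b1*x1 + ... + bk*xk."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Constructed data, for illustration only: y = 1 + 2*x1 + 3*x2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coeffs = multiple_regression(rows, ys)
print("b0, b1, b2 =", [round(c, 6) for c in coeffs])
```

Real data will not fall exactly on the model, so the fitted coefficients will be estimates, and the same gates and R-squared checks still apply.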
The regression described here requires "continuous data" for all the data sets involved. This short article does not provide enough information to give you a full understanding of regression; my intent was only to build a basic familiarity with the concept. Predictive modeling is very important, and we spend a lot of time instructing our Six Sigma Black Belt students in the tool.