LEAN SIX SIGMA WORLD CONFERENCE
Easing into Big Data: From Logistic Regression to CART
Presenter: Kristine Nissen Bradley, Principal, Firefly Consulting, Austin, Texas, USA
Keywords: Big Data, Logistic Regression, Data Analysis
Industry: Career Development
From science to sports, from manufacturing to healthcare, companies in all industries are discovering that their data can be used to better understand their customers, improve quality outcomes, and build more efficient internal processes. Those of us in the process improvement world have a long history with data, process, and business understanding. This presentation will build upon that foundation by exploring machine learning approaches that have become prevalent as larger data sets have become available. First, we’ll look at the basics of logistic regression, a classical statistical tool, including how to run it and interpret the results. We will then explore how to set up and run a machine learning tool, Classification and Regression Tree (CART) analysis, using the same data set so we can compare the two methods.
If you haven’t historically used it, logistic regression should be the first tool you add to your kit, but it is not for the faint of heart. It is typically covered at the Master Black Belt level of Lean Six Sigma training because, although it is easy to run, its results are somewhat difficult to interpret. However, it can be very useful when trying to predict an outcome with two categories (pass or fail, buy or not buy). Some example applications include predicting the likelihood that a consumer will accept or reject a credit card offer, quantifying the odds of hospital readmission upon discharge, and predicting product failure based on upstream sensor readings. Most statistical software includes logistic regression, so using it does not require a new application.
A machine learning technique called Classification and Regression Trees (often generically called Decision Trees) can be easier to interpret, assuming you have software to help you do the work. Most data sets that can be analyzed with logistic regression can also be modeled using CART analysis.
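To give a feel for what a tree-based analysis does when it splits a data set into two outcome categories, here is a minimal Python sketch that searches for the single best split on one predictor. It assumes Gini impurity as the split criterion, which is a common choice for classification trees; the abstract does not say which criterion or software the presentation uses. The function names, sensor readings, and pass/fail outcomes are all hypothetical and made up for illustration. Real CART software repeats this search across many predictors, grows the tree recursively, and prunes it.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best 'value <= threshold' split.

    If no split improves on the unsplit node, the threshold is None.
    """
    best = (None, gini(labels))
    distinct = sorted(set(values))
    # Candidate thresholds are midpoints between neighbouring distinct values.
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        # Weight each child's impurity by the share of rows it receives.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Hypothetical sensor readings and pass/fail outcomes (illustrative only).
readings = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
outcomes = ["pass", "pass", "pass", "fail", "fail", "fail"]

threshold, impurity = best_split(readings, outcomes)
print(f"Split at reading <= {threshold}, weighted Gini impurity {impurity}")
```

With this toy data, the search settles on a threshold of 5.0, which separates the two outcome groups completely. A tree diagram is essentially a stack of such splits, which is why it tends to be easy to read.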
The output of logistic regression is typically a table of estimates, standard errors, z statistics, p-values, odds ratios, and similar statistics. The output of a CART analysis is a visualization of how the data set splits into the binary output categories. Both methods have strengths and weaknesses, which we will briefly cover, but the focus of the presentation is to demonstrate how to use the tools on an anonymized, publicly available data set.
More and more, industries are looking to their data to better understand their customers, improve quality outcomes, and build more efficient internal processes. The good news is that process improvement practitioners already have foundational tools to work in this space. With a long history of working with data, a strong business understanding with a heavy process focus, and analytic skills, continuous improvement professionals are well positioned to take advantage of the evolving big data landscape.
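As a small illustration of how two columns of a logistic regression output table relate to each other, the Python sketch below converts a coefficient estimate into an odds ratio and into predicted probabilities. The intercept and coefficient values are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from the presentation's data set, and the function names are likewise illustrative.

```python
import math


def predicted_probability(intercept, coefficient, x):
    """Probability of the 'event' category for predictor value x."""
    log_odds = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(probability):
    """Convert a probability into odds (event vs. non-event)."""
    return probability / (1.0 - probability)


def odds_ratio(coefficient):
    """Multiplicative change in odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


# Hypothetical estimates, as they might appear in an output table.
intercept = -2.0
coefficient = 0.5

ratio = odds_ratio(coefficient)
p_at_3 = predicted_probability(intercept, coefficient, 3)
p_at_4 = predicted_probability(intercept, coefficient, 4)

print(f"Odds ratio per unit increase: {ratio:.3f}")
print(f"Predicted probability at x = 3: {p_at_3:.3f}")
print(f"Predicted probability at x = 4: {p_at_4:.3f}")
print(f"Ratio of odds, x = 4 vs. x = 3: {odds(p_at_4) / odds(p_at_3):.3f}")
```

The last line reproduces the odds ratio from the predicted probabilities. The odds are multiplied by the same factor for every one-unit increase in the predictor, but the predicted probability does not change by a constant amount, which is a large part of why the output takes some practice to interpret.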