Contemporary Analysis

Data Science

Grant Stanley

Predictive Analytics is Not a Crystal Ball

It's common to see predictive analytics portrayed as a sort of "crystal ball" for your business. This crystal ball image makes for great marketing. Unfortunately, predictive analytics is not a crystal ball.

It will not provide the correct prediction every time. Its primary purpose is to help you make better decisions by unlocking the patterns inside your data. When performed correctly, this simplifies your decisions. When performed incorrectly, it can spell disaster for your company.

Predictive analytics is both an art and a science. It requires a combination of empirical and subjective experience to verify that models reflect reality. This is why CAN considers three main aspects when building predictive models: Data, Theory, and Math. In our experience, your predictive models will not reflect reality unless all three of these aspects hold up.


Data is the underlying foundation of predictive analytics. If your data is bad, the patterns you uncover and the generalizations you make are unlikely to hold up in reality. Your data also has a limited scope, which means there will be times when you cannot predict certain outcomes.

In 2007, Nassim Nicholas Taleb released a book entitled The Black Swan, which describes how previously unobserved events are often inappropriately rationalized in hindsight and deemed predictable. A Black Swan is synonymous with an outlier. The metaphor comes from nature: if you had never observed a black swan, you would put the probability of one appearing at 0%. In his book, Taleb gives several examples of events he considers Black Swans, including the personal computer, World War I, and the September 11th attacks.
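To make the 0% point concrete, here is a minimal sketch of a frequency-based estimate. The sighting log and the function name are hypothetical and are only meant to illustrate why an event absent from your data gets no weight from the data alone.

```python
from collections import Counter


def empirical_probability(observations, event):
    """Share of past observations equal to `event` (a plain frequency estimate)."""
    counts = Counter(observations)
    return counts[event] / len(observations)


# Hypothetical sighting log: every swan recorded so far has been white.
sightings = ["white"] * 1000

p_black = empirical_probability(sightings, "black")
print(f"Estimated P(black swan) = {p_black:.0%}")
```

The estimate is zero no matter how many white swans you add to the log, which is why the scope of the data matters as much as its volume.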

In business, there will always be outliers or situations you've experienced infrequently. To account for them, you must update your assumptions and models frequently so that you continue to capture changes in your competitive environment, customers, and data. If the scope of your data doesn't allow you to generalize reliable patterns, predictive analytics will do you little good. In these cases, it's best to use a combination of empirical and subjective experience to rationalize your decision.


Theory is the experience you can append to the patterns in the data. Without solid experience, you cannot understand the patterns in your data. This is why we engage our clients' experts to understand which patterns and variables are important to analyze. This also allows the experts to validate our predictive models.

For example, knowing that young males in college are usually a bank's least loyal customers tells us that age, gender, and educational attainment may be important predictors when building a predictive loyalty model. If we look at trends in average balances in non-interest-bearing accounts, we might also conclude that young males in college are bad candidates for CD sales. There is also the chance that this empirical evidence may refute what some experts think, which is why it's important to facilitate discussions that go beyond typical "rules of thumb." It's not uncommon to find a few surprises in your data.


Math is the engine that allows you to scale the patterns in your data and theory so you can reduce decisions to calculations. This is also the process by which we choose the best predictive models. Data and theory provide the context for the Math. If you focus on the Math alone, it will lead you astray.

Let's return to the previous example. If you build a predictive model using age, gender, and educational attainment as predictors, you may find that the mathematical results indicate that age and gender are not significant predictors of loyalty. Normally at this point you would remove these variables from the model and work with what remains. However, if you knew beforehand that a specific relationship may exist, such as the trend with young males, you could try creating a variable that takes into account both age and gender (e.g., 19-25 year old males). Knowing this little piece of information allows you to build a better predictive model, but it also requires having the context to back it up.
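As a rough sketch of what such a combined variable might look like, the snippet below flags 19-25 year old males in a small, made-up customer list. The field names, the records, and the `is_young_male` helper are assumptions for illustration only, not part of any particular model.

```python
def is_young_male(age, gender, min_age=19, max_age=25):
    """Return 1 if the customer is a male within the age band, else 0."""
    return int(gender == "M" and min_age <= age <= max_age)


# Hypothetical customer records; real data would come from the bank's systems.
customers = [
    {"age": 21, "gender": "M", "education": "some college"},
    {"age": 23, "gender": "F", "education": "some college"},
    {"age": 47, "gender": "M", "education": "bachelor's"},
    {"age": 19, "gender": "M", "education": "some college"},
]

# Add the combined age-and-gender flag as a new candidate predictor.
for customer in customers:
    customer["young_male"] = is_young_male(customer["age"], customer["gender"])

flags = [customer["young_male"] for customer in customers]
print(flags)
```

The new flag can then be tested as a predictor alongside, or in place of, the separate age and gender variables. Whether it earns a spot in the final model is still a question for both the math and the experts.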

In summary, predictive analytics requires a combination of empirical and subjective experience. This is why CAN focuses on having the Data, Theory, and Math aligned before operationalizing a model. Data, Theory, and Math on their own will do you little good. Together they allow you to capture the most robust results. So if you're looking to engage in predictive analytics, make sure to leverage your experts on all sides to attain the best results.


