In 1897, eight-year-old Virginia O’Hanlon posed a question to the New York Sun – is there a Santa Claus? The editors of the Sun could easily have dismissed the child’s simple question. Instead, they took the opportunity to answer it metaphorically and inspirationally, in a way that has shaped the American view of Christmas for over a century.
In the 13 years that I have been involved with Predictive Analytics, Machine Learning and Data Mining, I have been told countless times that there is no Santa Claus in this field. Industry experts, leaders in the field, scientists and practitioners alike have all told me that this technology is simply out of reach for mere mortals. I’ve had a general manager of a prominent data mining product tell me that people simply can’t make good models. I’ve seen an analytics chief at a major American corporation tell his colleagues that they simply weren’t smart enough. I’ve seen a top-level scientist at a software company claim that machine learning is just too hard for any but the most highly trained and sophisticated to comprehend.
The reasons given for why predictive analytics is out of reach are generally hard to argue with. People will make mistakes. They won’t build good models. They don’t understand the complexity involved. They don’t understand the algorithms. They can’t see how data problems impact accuracy. Their models won’t be accurate. These are all good reasons why most people cannot perform predictive analytics.
But so what?
The reasons may even be valid. But again, so what? Mistakes will be made. Every day millions of men, women and barely grown children manipulate tons of steel at high speeds in congested, complex environments, risking their own lives and those of everyone around them, and yet we all do it. We drive every day, knowing the risks and taking them. Mistakes are made, and people even die, but we still do it every day. Hypothetical “sky is falling” scenarios can be imagined where using predictive technology causes mass destruction, but how is that different from transporting hazardous waste or other toxic materials? If the risk is higher, you add more diligence. The fact that some individual mistakes can be costly doesn’t negate the net commercial value of dispersing the technology to a wide audience.
The concern about not building “good” models isn’t even about building good models – it’s about building the “best” model. To return to the automobile metaphor, current industry practice is that every model must be the equivalent of an F1 race car. It’s true that there are cases where the small improvement in accuracy of a highly tuned model is worth millions of dollars in results. However, it’s even more true that most enterprises aren’t taking advantage of predictive analysis at all, and the initial large improvement over nothing is far more valuable than the incremental value of “best”. Most people don’t have $500,000 cars that perform well at 200 mph on a special track – we’re satisfied with $30,000 cars that average 35 mph in a variety of environments. If you built a model that improved your business by 50% by showing you information you never knew before, does it matter that there could have been a model that improved it by 55%, or even 200%? It doesn’t if you could never have built a model in the first place.
All the arguments about complexity and understanding – they’re right. Predictive Analytics and Machine Learning are complex. But, as I said earlier, so what. Many of the issues that make this technology overly complex are simply design and implementation choices made by scientists for scientists. Many of the most beloved wonky technical features in products in this space exist only to overcome specific limitations of those very same products. These limitations became so entrenched that users simply accept them as rites of passage on the way to gaining the value of the technology. If all of our cars had fog-free windshields, we wouldn’t need defoggers. Similarly, if predictive analytics products consumed data in the formats in which it was provided, they wouldn’t need complex manual data transformation operations to make up for their lack of functionality. If there are specific data problems that impact the behavior or performance of individual algorithms, why aren’t these problems handled by the system? Most of them can be, and handling them there would eliminate that complexity for the user.
The naysayers themselves fall into two camps: the defeatists and the protectionists. The defeatists believe that the innovation required to make predictive analytics accessible is too difficult, and they can’t imagine what changes could be made to extend the reach of the technology to individuals – either that, or they are preoccupied with applying predictive analytics in other ways and aren’t interested. The protectionists, on the other hand, are threatened by the widespread adoption of predictive technology. They view the historic complexity of predictive analytics as a barrier to entry that gives them power and makes them more marketable. The overly sophisticated tricks of the trade give them a leg up, providing leverage for position and price. However, the protectionists are dinosaurs. They are the guys who knew how to run jobs on the mainframe when the PC came along. They are happy with and protective of their little piece of territory, without realizing how much more land they could grab by inviting more people to play in their playgrounds.
My vision for Predixion Software is to open up the playground and invite everybody in. I believe that anyone who has data – and who doesn’t these days – can use predictive analytics to transform and improve their business. If you can’t use predictive analytics because it’s inaccessible or too difficult, it’s not your fault – it’s my fault for not making it approachable and convenient enough for you. Already we have users who are business and marketing analysts, IT folks, etc. – not scientists – who are using Predixion Insight to change how they see their business, their customers, and their users and are making day-to-day decisions that have material impact on how they do things. It’s my mission to create the software platform that will enable the creation, deployment, and dissemination of predictive analytics that will fuel the next big data generation. We’re not there yet, but we have the plan, the desire, the imagination, the determination and the investors to make it happen.
So, in short, I want to say – Yes, Virginia! You can do Predictive Analytics!