The AI Race for Big Data Sets
Mobileye and Tesla continue their war of words over why their relationship ended, with Mobileye recently claiming that Tesla released its Autopilot feature too soon and put people at risk. I personally don’t have an answer as to what constitutes “too soon,” but I find a much more interesting question to be this: if it was too early, why did Tesla release it?
Elon Musk is no dummy, that’s for sure, so he must have had a reason. Of course he realized that by releasing the software early he ran the risk of someone dying, the resulting bad press, and potential lawsuits and tougher regulations. Or, in other words, exactly what happened. It’s hard to believe that Elon Musk would do that simply to juice sales.
So I ask the question again: why did he do it? In my humble opinion, he did it in a bold move to catch up to, and potentially pass, Google and others who had been working on self-driving cars for years. He did it because he needed to exponentially increase the data sets his AI team trains its software against, and he saw an opportunity to release the feature to thousands of drivers and leverage them for that much-needed data. Google has been building maps for years and has long had drivers on the road gathering data. The only way to beat that would be to have thousands of drivers driving hundreds of thousands of miles every month, with every little piece of data being sent back to Tesla.
The undergirding technologies of artificial intelligence require data sets to train their algorithms against. The bigger the data sets and the more robust the data, the better the algorithms. One of the reasons AI is becoming so capable so quickly is the ever-increasing amount of data that AI scientists have access to. Facial recognition, historically a very tough AI problem, is now almost solved because of the sheer number of photos available on the web thanks to social media. Facial and image recognition algorithms have gotten better because they have such huge data sets to train against.
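The "more data, better algorithms" point can be sketched with a toy learning curve. Everything below is an illustrative assumption, not anything Tesla or Google actually runs: two made-up Gaussian clusters stand in for real-world data, and a simple nearest-centroid rule stands in for a real model. The only point is that accuracy stabilizes and improves as the training set grows.

```python
import random

random.seed(0)  # make the sketch repeatable

def sample(n, cx, cy):
    # Draw n 2-D points from a Gaussian blob centred at (cx, cy).
    return [(random.gauss(cx, 1.0), random.gauss(cy, 1.0)) for _ in range(n)]

def centroid(points):
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def nearest_centroid_accuracy(n_train):
    # "Train" a nearest-centroid classifier on n_train points per class,
    # then score it on a held-out test set of 400 points.
    train_a = sample(n_train, 0, 0)
    train_b = sample(n_train, 3, 3)
    ca, cb = centroid(train_a), centroid(train_b)
    test = [(p, 0) for p in sample(200, 0, 0)] + [(p, 1) for p in sample(200, 3, 3)]

    def predict(p):
        da = (p[0] - ca[0]) ** 2 + (p[1] - ca[1]) ** 2
        db = (p[0] - cb[0]) ** 2 + (p[1] - cb[1]) ** 2
        return 0 if da < db else 1

    return sum(predict(p) == label for p, label in test) / len(test)

# More training data -> a steadier estimate of each class centre,
# and so a steadier, generally better classifier.
for n in (5, 50, 500):
    print(n, round(nearest_centroid_accuracy(n), 3))
```

The same dynamic, at vastly larger scale, is what a fleet of data-reporting cars buys you.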
Here is how it used to work: years ago I was at an AI start-up focused on the enterprise. In order to test our algorithms we first had to land a corporate client and then obtain access to the data in their databases – emails, legal documents, and so on. We then had to build a taxonomy, or knowledge tree, and manually tag each of those documents to a node on that tree. After tagging some documents we would test the algorithms, then tag more documents and test again, and so on and so forth.
Laborious work, and it only yielded megabytes of data to test against!
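That tag-a-batch, test, tag-more loop can be sketched roughly as follows. The six-document "corpus," the two taxonomy nodes, and the naive word-vote classifier are all invented stand-ins for the real knowledge tree and client documents; the point is only the workflow of tagging in batches and re-testing after each batch.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in corpus: (document text, taxonomy node it was tagged to).
CORPUS = [
    ("termination clause in the employment contract", "legal"),
    ("quarterly revenue forecast spreadsheet", "finance"),
    ("non-disclosure agreement draft for review", "legal"),
    ("invoice and expense report for March", "finance"),
    ("patent licensing agreement terms", "legal"),
    ("annual budget and expense analysis memo", "finance"),
]

def train(tagged_docs):
    # Count which taxonomy node each word has been tagged under.
    model = defaultdict(Counter)
    for text, node in tagged_docs:
        for word in text.split():
            model[word][node] += 1
    return model

def classify(model, text, default="legal"):
    # Each known word votes for the nodes it was tagged under.
    votes = Counter()
    for word in text.split():
        votes.update(model[word])
    return votes.most_common(1)[0][0] if votes else default

def accuracy(model, held_out):
    return sum(classify(model, t) == n for t, n in held_out) / len(held_out)

# Tag a batch, test against held-out documents, tag more, test again.
held_out = CORPUS[4:]
for batch_end in (2, 4):
    model = train(CORPUS[:batch_end])
    print(f"tagged {batch_end} docs -> accuracy {accuracy(model, held_out):.2f}")
```

Even in this toy version you can see why the process was so laborious: every gain in accuracy had to be paid for with another round of hand-tagging.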
That proved to be extremely difficult, so the next wave of AI focused on analyzing and training against “unstructured data,” or rather web-based data, because it was easily accessible and offered huge amounts of data to test against. We are now entering a new wave, one that circles back to the enterprise and its data, made possible of course by things like Hadoop.
And in this space, companies like IBM (with Watson) and, more recently, Salesforce are pursuing the same strategy as Tesla, except focused on the enterprise. The trick is that by entering the market early, Salesforce will be able to leverage the vast amounts of its customers’ past, present, and ongoing data to build ever-smarter AI software, which it will then be able to charge those customers more money for.