I recently attended a pitch competition at MobilityX in Austin, TX, to hear eight companies share their ideas on how to use Artificial Intelligence (AI) and immersive media for urban mobility. In my last blog, I talked about the companies using Natural Language Processing (NLP) to tackle mobility problems. In this piece, I am going to talk about the companies that took on Machine Learning.
Machine Learning: Decision Tree Learning Models and Neural Networks
If you want to learn more about the present state of Machine Learning (ML) and in particular its mobile implementation, you can do no better than this article about how HBO created the Not Hot Dog app from Silicon Valley.
As the article describes, Machine Learning is about systematically improving the output a machine produces for a known input or set of inputs. Put another way, if a computer process outputs hot dog/not hot dog at 50% accuracy, a successful machine learning endeavor will increase that accuracy to, say, 60%. 60% isn’t all that much bang for your buck though, so I’ll plug the above article again for more insight on the journey to 90%+ accuracy.
So let’s take a simple use case and see how we might build it out using machine learning. For our use case we’ll think about a machine that can recognize fruit through image processing. For an idea of the level of complexity inherent in this problem, please check out this relevant xkcd.
To start with we’ll consider the case of trying to determine the difference between a banana and a strawberry. At our disposal is any characteristic we can think of with our fruit. In this simple example we might write some code that says if our fruit is yellow then it is a banana, and if it is red then it is a strawberry (for now we’ll ignore the non-trivial problems of how our computer knows what is fruit and not fruit in the image, and the fact that both unripe bananas and unripe strawberries can be green).
Next let’s bump up the complexity and add in watermelons. Watermelons are multi-colored, but they’re mostly green, so we can use our color logic from the first phase and add in a bit that says if the fruit is green then it is a watermelon (we’ll ignore the non-trivial problem of watermelons being red when sliced for now). But what happens if we then want to add in lemons? Now our color logic is insufficient. To that end let’s add in a second layer to our process that checks the roundness of our fruit. Lemons are pretty round, rounder than bananas at least, so now we can say if a fruit is yellow AND round then it’s a lemon, and if it’s yellow and not round it’s a banana.
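The hand-written rules so far can be sketched in a few lines of Python. This is only an illustration: the function name classify_fruit and the inputs color and is_round are made up for this example, and it assumes some earlier step has already pulled the color and shape out of the image.

```python
def classify_fruit(color: str, is_round: bool) -> str:
    """Hand-written rules from the walkthrough: check color, then roundness."""
    if color == "red":
        return "strawberry"
    if color == "green":
        return "watermelon"
    if color == "yellow":
        # Color alone cannot separate lemons from bananas, so check shape too.
        return "lemon" if is_round else "banana"
    # Anything the rules do not cover (oranges, kiwis, ...) falls through.
    return "unknown"


print(classify_fruit("yellow", True))   # lemon
print(classify_fruit("yellow", False))  # banana
print(classify_fruit("green", True))    # watermelon
```

Every new fruit means another hand-written branch, which is exactly the part we would like the machine to figure out for itself.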
Super! We keep adding fruits to our model and after some period of time we have, let’s say, five layers that check for the color, roundness, size, hardness, and furriness (kiwis and coconuts are weird). We want to generalize these for all fruits and so we give our layers different weights based on what is selected. For example, color might matter a lot for an orange because there aren’t many orange fruits that aren’t oranges, but red doesn’t help narrow the field all that much (cherries, apples, raspberries, strawberries, etc).
In order to figure out what these weights should be we finally get into the machine learning piece. What we do is take a bunch of pictures of fruit, each labeled with the right answer, let’s say n of them, and run them through our process, which will give us some accuracy, let’s call it A. We tweak the weights and run our n pictures through again. In fact we run them through a bunch of times, let’s say m times, tweaking the weights every time to try and improve A. Ideally we can make n and m large enough that we can get to the A we want (97%?). Our labeled images are called a training set, and this repeatable process is called supervised learning. If our process is a straightforward series of checks like the ones above, we’ve made a decision tree learning model, and if instead it’s some fancy pants non-linear process built from layers of weighted functions, then we’ve made a neural network.
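Here is a rough sketch of that tweak-and-rerun loop in Python. Everything in it is an assumption made for illustration: the fruit "prototypes" and their feature numbers are invented, the training set is synthetic noise around those prototypes rather than real pictures, and the weights are tweaked by simple random trial and error (keep a tweak only if A goes up). Real neural network training typically uses gradient-based updates instead, so treat this as a toy, not as how any production system works.

```python
import random

# Hypothetical feature values per fruit: (color hue, roundness, size), each 0..1.
PROTOTYPES = {
    "banana": (0.15, 0.1, 0.4),
    "lemon": (0.15, 0.8, 0.2),
    "strawberry": (0.0, 0.6, 0.1),
    "watermelon": (0.35, 0.9, 1.0),
}


def predict(features, weights):
    """Pick the fruit whose prototype is closest under a weighted distance."""
    def distance(proto):
        return sum(w * (f - p) ** 2 for w, f, p in zip(weights, features, proto))
    return min(PROTOTYPES, key=lambda name: distance(PROTOTYPES[name]))


def accuracy(samples, weights):
    """A: the fraction of labeled samples that are classified correctly."""
    correct = sum(predict(features, weights) == label for features, label in samples)
    return correct / len(samples)


def make_training_set(n, rng):
    """Build n labeled samples by adding noise to a randomly chosen prototype."""
    samples = []
    for _ in range(n):
        label = rng.choice(list(PROTOTYPES))
        noisy = tuple(p + rng.gauss(0, 0.15) for p in PROTOTYPES[label])
        samples.append((noisy, label))
    return samples


def train(samples, m, rng):
    """Tweak the weights m times, keeping a tweak only when A improves."""
    weights = [1.0, 1.0, 1.0]
    best = accuracy(samples, weights)
    for _ in range(m):
        candidate = [max(0.0, w + rng.gauss(0, 0.2)) for w in weights]
        score = accuracy(samples, candidate)
        if score > best:
            weights, best = candidate, score
    return weights, best


rng = random.Random(0)  # fixed seed so runs are repeatable
training_set = make_training_set(200, rng)  # n = 200
start = accuracy(training_set, [1.0, 1.0, 1.0])
weights, final = train(training_set, 100, rng)  # m = 100
print(f"A before: {start:.2f}, A after: {final:.2f}, weights: {weights}")
```

Because a tweak is only kept when it helps, A on the training set never gets worse from one round to the next. Whether it gets anywhere near 97% depends on how noisy the samples are and how large n and m are.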
My Top for ML
At the MobilityX pitch competition, one startup stood out for its use of Machine Learning.
Cerebri – While identifying fruit may be a fun academic thought experiment, here in the US we’re in the business of making money, and that’s where Cerebri comes in. Cerebri is working to develop the neural networks necessary to process that big data your CTO told you to start collecting five years ago and turn it into useful, customer-centric, and actionable business intelligence. Despite their demo having unlabeled axes (please see this second relevant xkcd), these guys are bringing ML to a compelling space and building a real organization around serious ML development.
It was a pleasure to attend the MobilityX pitch competition. If you want to know what is happening on the cutting edge of the mobility space, keep an eye on my picks, and MobilityX as a whole in the months to come.
Immersive Media is an umbrella term for three technologies which, despite being in the minds of Sci-Fi authors for a while, are just now starting to get the enabling hardware necessary to catch up. The first is 360º Video, itself a misnomer since 360º would be 2-dimensional whereas 360º Video can be spherical and thus also includes the 180º of the polar angle in spherical coordinates, but 64,800º Video just doesn’t have the same ring to it. 360º Video is pre-recorded content displayed to a user who can choose any spherical angle as their viewing direction.
Virtual Reality is the next tech, and it offers the same degree of freedom for a viewer, but with a persistent virtual world. Persistent virtual worlds have been the purview of the gaming industry for years now (Tron was made in 1982!), but only recently have displays, motion tracking, and processing power gotten to the point where people will start to tolerate a couple of screens inches from their eyes. As the technology matures, we will start to see VR applications outside of gaming: teleconferencing, educational simulations, and if we’re being honest Demolition Man-style sex scenes.
The third is Augmented Reality, which is the rendering of virtual content within the real world. Of the three technologies I think AR is the most interesting, broadest, and least mature. The applications range from Terminator-style information overlain on real-world objects, to immersive AR-gaming experiences, to volumetric motion capture-enabled holography, and probably a whole bunch of things no one’s thought of yet. Just like with AI, AR has to deal with the fact that the real world is very big, and has a lot of stuff in it. Unlike AI, however, in AR the machine is enhancing human engagement rather than trying to replace it. That’s a much richer space to play in.
Even the people I’ve met who are bullish on Immersive Media are underestimating just how large the space will become. We’re definitely early, but the ability to virtualize and augment reality opens up the possibility of multiple universes limited only by the collective ingenuity of human beings. After all, why build a movie theater in Duluth when you can build a virtual theater in virtual space that anyone can get to? If for some reason you doubt that we’ll be spending the next 30-50 years recreating the physical world virtually, please keep in mind that after 15,000 years of trying to figure out how to not all be farmers, we now have multiple cross-platform farming simulation franchises. We’re fascinating creatures.
So what’s our use case in this gigantic immersive world? The one we’ve been thinking about here at moovel is making an AR display that we can hover above bus stops that will show arrival times, alerts, and other useful information to AR users. While that might not be all that compelling right now when all we have are our phones, I guarantee you it’ll be cool once we get some Black Mirror-inspired augmentolens grafted onto our eyes. This eye to the future of AR content delivery is what gives this use case such an exciting twist. Let’s dive on in.