I’m in the middle of taking the Coursera Machine Learning class -- which has been amazingly good -- and it recently covered how one could implement a machine-learning algorithm to power driverless cars.
Here’s how you might do it. You put a camera on the front of your car, and you set it to capture frequent images of the road ahead of you while you drive. At the same time, you capture all the data about your driving -- steering-wheel movement, acceleration and braking, speed -- plus things like weather conditions and the number and distance of cars near you.
Putting that together, you’ve got some nice training data: a mapping between “situation the car is in” and “how the human driver responded.” For example, one thing the system might learn is: “when the car’s camera sees the road curve in this particular direction, then turn the steering wheel 15 degrees to the left and decelerate to 35 mph.”
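To make that "situation in, human response out" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the single `curvature` feature stands in for whatever a real system would extract from camera images, the logged values are made up, and a straight-line fit is just the simplest possible learner, not what any real driverless car uses.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    curvature: float     # hypothetical feature from the camera image (negative = road bends left)
    steering_deg: float  # what the human driver did (negative = wheel turned left)


def fit_line(samples):
    """Ordinary least-squares fit of steering angle against curvature."""
    n = len(samples)
    mean_x = sum(s.curvature for s in samples) / n
    mean_y = sum(s.steering_deg for s in samples) / n
    sxy = sum((s.curvature - mean_x) * (s.steering_deg - mean_y) for s in samples)
    sxx = sum((s.curvature - mean_x) ** 2 for s in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, curvature):
    """Predict a steering angle for a situation the car has not seen before."""
    slope, intercept = model
    return slope * curvature + intercept


# Made-up training log: pairs of (situation, human response).
LOG = [
    Sample(-0.5, -15.0),
    Sample(-0.25, -7.5),
    Sample(0.0, 0.0),
    Sample(0.25, 7.5),
    Sample(0.5, 15.0),
]

MODEL = fit_line(LOG)
print(predict(MODEL, -0.5))  # steering suggestion for a sharp left bend
```

A real system would have many more inputs (the whole image, speed, weather) and more outputs (braking, acceleration), but the shape of the problem is the same: learn a function from logged situations to logged driver responses.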
Of course, the world is a messy, complicated place, so if you want your self-driving car to be able to handle any road you throw at it, you need to train it with a lot of data. You need to drive it through as many potential situations as possible: gravel roads, narrow alleys, mountain switchbacks, traffic-heavy city expressways. Many, many times.
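One rough way to think about "as many potential situations as possible" is as a coverage check on your driving log. The sketch below is purely illustrative: the road-type labels, the `min_samples` threshold, and the idea that each logged drive carries a single label are all assumptions I made up for the example.

```python
from collections import Counter

# Hypothetical list of situations we want the training data to cover.
REQUIRED = ["gravel road", "narrow alley", "mountain switchback", "city expressway"]


def coverage_gaps(drive_log, required=REQUIRED, min_samples=2):
    """Return the required road types that appear fewer than min_samples times."""
    counts = Counter(drive_log)
    return [kind for kind in required if counts[kind] < min_samples]


# Made-up log: one label per recorded drive.
DRIVE_LOG = [
    "city expressway",
    "city expressway",
    "gravel road",
    "city expressway",
    "narrow alley",
    "gravel road",
]

GAPS = coverage_gaps(DRIVE_LOG)
print(GAPS)  # road types that still need more driving
```

In practice "many, many times" means a threshold far higher than two, and situations don't fall into neat categories, but the point stands: the gaps in your log are the roads your car hasn't learned yet.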
Which brings us to Google Street View.
For years now, Google has been sending Street View cars around the world, collecting rich data about streets and the things alongside them. At first, this resulted in Street View imagery. Then it was the underlying street geodata (i.e., the precise longitude/latitude paths of streets), enabling Google to ditch Tele Atlas and make its maps out of data it obtained from Street View vehicles.
Now, I’m realizing the biggest Street View data coup of all: those vehicles are gathering the ultimate training set for driverless cars.
I’m sure this is obvious to people who have followed it more closely, but the realization has really blown my mind. With the goal of photographing and mapping every street in the world, Street View cars must encounter just about every possible road situation, sort of by definition. The more situations the driverless car knows about, the better the training data, the better the machine-learning algorithms can perform, and the more likely it is that the driverless car will work. Brilliant.
When I originally heard about Google’s driverless car experiment, I assumed one big reason it was being developed was to make Street View data collection more efficient. No need to pay humans to drive those cars around the world if you can automate it, right? But it’s likely the other way around -- Street View cars inform the driverless cars.
The next question: as Street View data improves the driverless cars, will the driverless cars get good (and legal) enough to eventually gather Street View data without humans? That would lead to more driving experience, which would lead to smarter driverless cars, which would lead to more efficient Street View data gathering, in a virtuous cycle of driving and learning.
I’m curious about the direction of Google’s strategic thinking. Did it start with “Let’s take photos of every street in the world” and lead to “Hey, we might as well collect data for self-driving cars while we’re making all this effort”? Or was getting data for driverless cars a goal from the get-go, with Street View imagery being a convenient diversion from the real plan (and a way to justify the effort to shareholders skeptical of such expensive R&D)? If the latter, it’s especially brilliant.
More broadly, this inspires me to think more “meta” about data collection. If you have an opportunity to collect data, there’s the value of the data itself, but there’s also the value of the data about the collection of the data. What data does a journalist tangentially (and maybe even unknowingly) collect as she goes about her reporting business? It might be more valuable than the stuff she set out to collect in the first place.
UPDATE: I changed the title of this post from “wolf in sheep’s clothing,” as that was a lame metaphor that didn’t actually make sense.