Numerous reports have emerged of multiple robotaxis from GM’s Cruise unit stalling on San Francisco streets, sometimes blocking intersections and on a few occasions boxing cars in. A report from an anonymous person said to be a Cruise employee, along with other reports, suggests the problem involves Cruise cloud servers failing or not communicating with the vehicles, making them stop in a “safe state.” Obviously this should not happen and Cruise should do better. On the one hand, hiccups are to be expected in any prototype project and must be tolerated to some degree. On the other hand, Cruise recently tried to graduate from prototype status by charging customers for rides and expanding operations.
Why is it happening?
Cruise has made few public statements on these events, other than indicating that in the event of problems, their vehicles are programmed to go to a safe state, turning on flashers if needed. While this is safe, it’s bad road citizenship to do it a lot, or to have multiple cars do it at the same location.
The alleged cause is failure in cloud servers. While all robocars need to talk to the cloud to some small degree for updates on maps and traffic, and to be given orders like new destinations, Cruise may have neglected a design principle that other teams have followed: minimize communications, and keep the cars highly functional when communications are out. Central failures will happen but should not be allowed to cause a problem like this, and that is something Cruise will need to fix in their design. While a long outage of the central servers may cause cars to go out of service, they should do so gracefully, finding a parking spot to wait. Cruise vehicles today are in prototype stage and don’t really support functions like pulling over or parking well, for which they have been criticized.
While there are those who tout the idea of the “connected car” as the future, the reality is a car should be able to complete almost any short to medium mission without ever talking to anybody, and be able to give itself a default mission (head to storage) in a longer outage. Normally, parking will involve talking to the cloud, to pick a parking destination, and to negotiate with the parking location and pay for the parking, but a fallback plan should also exist to get cars off the road.
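To make that fallback idea concrete, here is a minimal sketch of how a car might choose what to do while it cannot reach the cloud. Nothing here reflects Cruise's or anyone else's actual software: the `Action` names, the `choose_action` function and the 15-minute threshold are all assumptions made up for illustration.

```python
from enum import Enum


class Action(Enum):
    CONTINUE_MISSION = "finish the current trip without the cloud"
    PARK_AND_WAIT = "find a parking spot and wait"
    HEAD_TO_STORAGE = "default mission: head to storage"


# Hypothetical cutoff between a short outage and a "longer outage".
LONG_OUTAGE_SECONDS = 15 * 60


def choose_action(seconds_offline: float, mission_in_progress: bool) -> Action:
    """Pick a behavior for a car that has lost contact with the servers."""
    # A trip already under way should be completable without talking to anybody.
    if mission_in_progress:
        return Action.CONTINUE_MISSION
    # With no trip and a long outage, the car gives itself a default mission.
    if seconds_offline >= LONG_OUTAGE_SECONDS:
        return Action.HEAD_TO_STORAGE
    # Otherwise get off the road and wait for the servers to come back.
    return Action.PARK_AND_WAIT
```

The point of the sketch is that none of the three branches is "stop in the lane with flashers on." That remains a last resort for real faults in the car, not the response to a quiet network.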
Reports suggest Cruise has actually had to dispatch humans to recover the cars, possibly by driving them away. This suggests a poor failure response: even if some central server can’t be reached, cars should have other ways to receive remote commands from HQ to stop them from blocking traffic. That should include redundant communications channels, such as multiple cellular radios and accounts with multiple cellular companies.
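As a rough illustration of what alternate channels could mean in practice, the sketch below tries each communication path in turn and takes the first command that gets through. It is an assumption-laden toy, not a description of any real fleet system: the channel names, the `fetch_command` function and the `"pull_over"` command are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# A channel is a name plus a function that returns a pending command,
# returns None if there is nothing to do, or raises ConnectionError if down.
Channel = Tuple[str, Callable[[], Optional[str]]]


def fetch_command(channels: Sequence[Channel]) -> Tuple[Optional[str], Optional[str]]:
    """Return (channel_name, command) from the first channel that delivers one."""
    for name, poll in channels:
        try:
            command = poll()
        except ConnectionError:
            continue  # this path is down, so try the next one
        if command is not None:
            return name, command
    return None, None  # nothing reachable: fall back to on-board behavior


# Stand-in channels for the example (hypothetical carriers).
def carrier_a_down() -> Optional[str]:
    raise ConnectionError("carrier A unreachable")


def carrier_b_ok() -> Optional[str]:
    return "pull_over"


result = fetch_command([("carrier_a", carrier_a_down), ("carrier_b", carrier_b_ok)])
```

Here `result` is `("carrier_b", "pull_over")`: the first carrier is down, so the command arrives over the second. When every channel fails, the function returns `(None, None)` and the car has to rely on whatever fallback plan it carries on board.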
Is it a big deal?
While Cruise has done some things wrong here (bad design that’s not sufficiently robust, not being open with the public, and of course building servers that fail), the truth is that with something as complex and revolutionary as robotaxis, nobody is going to do it perfectly, and only the super-competent and super-funded will do it very well. If we’re talking about people getting hurt, that’s a different story, but if hiccups cause a little bit of traffic disruption, that’s hardly something to get more than modestly frustrated at. The huge future benefits of robotaxis are among the greatest that technology has to offer, and that’s quite a dramatic statement, but not an overstatement.
Perhaps a bigger deal is the recent injury accident involving a Cruise car. Like almost all accident reports we see on self-driving cars, the driver of another car was entirely or mostly at fault, and in this case, certainly for the injuries. In fact, based on the description, the Cruise vehicle acted correctly and with caution. The only thing it could have done better would be to proactively avoid the Prius that hit it, rather than passively stop and be hit. Earlier, Waymo published a study where they examined every fatal accident in their service territory, and simulated each one to see how it would have gone if they had been the at-fault car, and also if they had been the car that was hit. They found they prevented all the accidents if they were driving in the at-fault car, but also a large number of the ones where they weren’t. The “extra credit” bar is to prevent accidents even when you do nothing wrong.
Even so, Cruise does seem to have had its share of problems. Some stem from their deliberate decision to try a harder environment like San Francisco first. Others, like this fleet shutdown problem, could happen anywhere but cause more trouble in a city.
Cruise needs to put even more focus on designing “fail operational” systems which still do something reasonable even when their components fail. At the same time, we should not demand they be perfect at that on day one, but rather that they work towards it when they go into full production. Cruise has the natural fear most companies have of being open about what is going on because they know the public won’t understand, and won’t want to give them the slack they should. It’s not clear how to change that.