I came across a rather interesting blog post[1] on cognitive semiotics, hosted by Georgetown University, that discusses the philosophical difference between an artificial general intelligence and human agency. In a nutshell, the most important point it makes is, and I quote,
“Contrary to dangerously simplistic popular notions, the semiotic method tells us that human-like AGI will never be able to internalize meaning the same way humans do, and as a result of which, will not have the capacity to exhibit enough independent agency…”
The reason I found this post so interesting is that it makes a reasoned argument that a machine can never have ‘enough independent agency’. This lies at the heart of driverless cars in the truest sense of the term – robots that can drive anywhere, anytime, and contend with all the scenarios we humans deal with as drivers on any given day. Such Level 5 robotic drivers are not possible because these theoretical machines would have to exercise agency beyond their fixed rules engine to deal with emergent scenarios and make judgements about future actions, and no rules-based system can enumerate every scenario a driver might have to deal with over a driving lifetime. I am sure this realisation was present when the increasing levels of automation were being crafted into the now famous SAE Levels. The answer for some was the magic bullet of neural networks, because neural networks are not based on explicit rules: the machine learns implicit rules from enough examples. But neural networks are optimisation routines that pursue a goal by being shown enough examples.
One could ask: why can’t we teach a machine to drive with enough examples? The problem is that teaching a machine to drive, in a generic sense, by example is an impossible task, whatever flavour of network and methodology appeals to you. What you are trying to abstract is an infinitely complex and large black box. Yes, you can whip up a demo or two to show the idea in action for one or two specific drives set as a learning goal (defined by start and end points, speed and limited manoeuvres), but to build a machine that can competently drive from camera/laser input directly to control commands, in a generic manner, is an absurd proposition. Why? Because it implies that all possible road actor behaviours, traffic patterns, road layouts, speed profiles and driving manoeuvres can be taught by example in all possible permutations – and I emphasize the words “all possible”.
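To make the “all possible permutations” point tangible, here is a back-of-the-envelope count in Python. The category sizes are entirely made up for illustration (and almost certainly undercount reality); the point is only how quickly even a coarse discretisation of the driving problem explodes.

```python
# Back-of-the-envelope illustration (hypothetical numbers, not a real taxonomy):
# even a coarse discretisation of the driving problem explodes combinatorially.

road_layouts     = 50    # intersections, roundabouts, merges, ...
traffic_patterns = 20    # densities, flow regimes
actor_behaviours = 100   # behaviour archetypes per road actor
actors_in_scene  = 5     # interacting road users
speed_profiles   = 10
manoeuvres       = 15

scenarios = (road_layouts * traffic_patterns *
             (actor_behaviours ** actors_in_scene) *
             speed_profiles * manoeuvres)

print(f"{scenarios:.2e} coarse scenario permutations")  # ~1.50e+15
```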
The limits of neural networks are becoming apparent, and there is an increasing realisation that the industry is hitting a wall. Jerome Pesenti, who heads up AI at Facebook, admitted as much in a 2019 Wired article (https://www.wired.com/story/facebooks-ai-says-field-hit-wall/): we should not expect to keep making progress in AI just by building bigger deep learning systems with more computing power and data. “At some point we're going to hit the wall,” he said. “In many ways we already have.”
If we have to build a driving automation technology, it has to be bounded within an operating domain, and therein lies the conundrum of conditional automation. Under what conditions is the automated driving function safely operable, and at what point must the human take over? Is it up to the system, or to the human driver, to constantly monitor that the operating domain is not being violated? The former presents a tremendous challenge: the system would have to know in advance when it will fail, and then facilitate the handover, which certainly has to happen in a safe manner; given that humans react much more slowly than machines, this adds further complexity. I would argue that most automated driving, at ever increasing levels of automation, is likely to remain a cooperative hybrid technology that interchanges command and control between software and human driver, unless the operating domain of the robot is tightly fixed, controlled and limited, with no externality able to affect the domain boundaries. I can imagine a robot plying a fixed route in a warehouse ad infinitum, with other users of the warehouse following explicit rules not to interfere and so not to cause any change in the robot's operational domain. The moment you extend the domain so that other actors can behave unpredictably, the paradigm fails.
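To make the monitoring question concrete, here is a minimal sketch in Python of the supervisory loop such a cooperative hybrid system implies. The names, states and thresholds are hypothetical placeholders of my own, not any vendor's interface, and the hard part – reliably estimating `time_to_boundary_s`, i.e. the system knowing in advance when it is about to fail – is exactly what the sketch takes as given.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Mode(Enum):
    AUTOMATED = auto()
    HANDOVER_REQUESTED = auto()   # human warned, grace period running
    MANUAL = auto()
    MINIMAL_RISK = auto()         # e.g. controlled stop if nobody takes over

@dataclass
class OddStatus:
    within_geofence: bool
    weather_ok: bool
    sensors_healthy: bool
    time_to_boundary_s: float     # estimated time until the ODD is exited

HUMAN_REACTION_BUDGET_S = 10.0    # assumed handover lead time; humans are slow

def supervise(mode: Mode, odd: OddStatus, driver_has_taken_over: bool) -> Mode:
    """One tick of a (highly simplified) hybrid command-and-control supervisor."""
    if mode is Mode.AUTOMATED:
        violated = not (odd.within_geofence and odd.weather_ok and odd.sensors_healthy)
        if violated or odd.time_to_boundary_s < HUMAN_REACTION_BUDGET_S:
            return Mode.HANDOVER_REQUESTED
        return Mode.AUTOMATED
    if mode is Mode.HANDOVER_REQUESTED:
        if driver_has_taken_over:
            return Mode.MANUAL
        if odd.time_to_boundary_s <= 0.0:
            return Mode.MINIMAL_RISK   # no safe handover happened in time
        return Mode.HANDOVER_REQUESTED
    return mode
```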
This segues into the next big problem in automated driving – behaviour prediction. In dynamics, predicting a system's behaviour is predicated on being able to explicitly model the system in the first place. If the system cannot be explicitly modelled, one can only make probabilistic guesses, and again we are back to AI as the tool of choice for behaviour prediction, with all the same limitations. How do we get enough data about the behaviour of everything expected to be dynamic in a driving scene to be confident that predictions about its behaviour will actually be useful or usable? The answer is: never. So in dealing with behaviour prediction, we start with simple explicit variables: if a thing is moving in a scene at a certain speed, what is a good distance to maintain; if a thing starts nudging into your path, it is likely going to end up in your path; and so on. There are good examples out there from Nvidia and Mobileye of such explicit, rules-based frameworks for safe automated driving.
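As an illustration of what such explicit rules look like, here is a simplified, RSS-style worst-case longitudinal gap check in Python. The parameter values are illustrative assumptions of mine, not figures from Mobileye's or Nvidia's published frameworks.

```python
def min_safe_gap_m(v_rear_mps: float,
                   v_front_mps: float,
                   reaction_time_s: float = 1.0,
                   a_accel_max: float = 2.0,    # worst-case rear acceleration during reaction
                   a_brake_min: float = 4.0,    # guaranteed rear braking
                   a_brake_max: float = 8.0):   # worst-case front braking
    """Explicit worst-case longitudinal gap rule (RSS-style, simplified).

    Assumes the rear vehicle may keep accelerating for `reaction_time_s`
    before braking at `a_brake_min`, while the front vehicle may brake
    at `a_brake_max` at any moment.
    """
    v_after_reaction = v_rear_mps + reaction_time_s * a_accel_max
    rear_stopping = (v_rear_mps * reaction_time_s
                     + 0.5 * a_accel_max * reaction_time_s ** 2
                     + v_after_reaction ** 2 / (2.0 * a_brake_min))
    front_stopping = v_front_mps ** 2 / (2.0 * a_brake_max)
    return max(0.0, rear_stopping - front_stopping)

# Following a car doing 25 m/s (90 km/h) while doing 25 m/s ourselves:
print(round(min_safe_gap_m(25.0, 25.0), 1), "m")   # ~78.1 m
```

The appeal of this kind of rule is that it can be inspected, argued about and validated directly, with no training data in the loop.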
If automation is likely to be conditional for the foreseeable future, it is imperative that the system performs safely within its circumscribed operational domain. For an automated system to be useful and safe at the same time, the minimum and necessary condition is its ability to correctly perceive the driving scene. The largest gap in the performance of automated driving systems is still perception, which fails suddenly and frequently. Such brittle perception makes conditionally automated systems unsafe by design: how can the user trust the system to perform within its operating domain when there is no way to validate its scene-understanding performance? We can go in circles, as nobody seems to have a clear answer. There are good starting points, like ISO/PAS 21448, which attempts to define the safety of the ‘intended functionality’ and encourages minimising the ‘unknown unknowns’ related to perception. But no matter how you slice it, unless perception performance can be validated within an operational domain beyond doubt, there is no really safe way out. The answer does not lie in wrapping the problem up in an ‘acceptable failure risk’ package. There have been mathematical attempts at defining an acceptable failure risk, such as one safety-critical error in 10,000 hours of driving; if any commercially deployed system today could come close to even a tenth of that goal, it would be awesome. The real issue is that failure in a mission-critical capability where lives are at risk is not acceptable. We have to stop hedging and develop explicitly engineered core perception systems (read: geometrical rather than appearance-based) that can be validated for their perception performance within an operational design domain. The current SAE taxonomy of automation levels, though useful, is not really usable in a practical sense. I recently wrote about an alternative, more practical classification regime for automated driving systems; you can read about it here: https://selfdriving.substack.com/p/top-tier-hierarchy-for-driving-automation
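Returning to the ‘one safety-critical error in 10,000 hours’ figure above: the standard statistical argument (the ‘rule of three’ – my own aside, not something from the SAE or ISO documents) already shows how punishing it is to validate such a target empirically, even before a single failure is observed.

```python
import math

def hours_needed(max_failure_rate_per_hour: float, confidence: float = 0.95) -> float:
    """Failure-free test hours needed to bound the failure rate at `confidence`.

    With zero failures observed in n hours, the one-sided upper bound on the
    rate is -ln(1 - confidence) / n, so n >= -ln(1 - confidence) / rate
    (the classic 'rule of three': roughly 3 / rate at 95% confidence).
    """
    return -math.log(1.0 - confidence) / max_failure_rate_per_hour

target = 1.0 / 10_000          # one safety-critical error per 10,000 hours
print(f"{hours_needed(target):,.0f} failure-free hours")   # ~29,957
```

And that is for a single bound on a single metric; observe one failure, or demand higher confidence, and the required failure-free hours grow substantially.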
We knew these issues when we started out in 2016, and took on the perception challenge by climbing the steep slope of geometric, explicitly engineered core perception rather than going down the neural network route (more data and more compute will hopefully get us there someday). Borrowing from Gordon Sullivan's famous book title, hope is not a method – the sooner the industry recognises this the better. More and more practitioners engaged in the development of automated driving systems recognise the limits of neural networks, and my wish is that they heed this call and approach conditional automation from the right side, not the easy side.
[1] https://blogs.commons.georgetown.edu/cctp-711-fall2017/2017/12/16/you-cant-just-make-up-a-word-the-symbolic-cognitive-approach-to-philosophizing-artificial-general-intelligence/