“Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child?”
One of the consequences of that, which is not so obvious, is thinking about children not just as immature forms who learn and grow into an adult intelligence, but as a separate kind of intelligence.
Life history is the developmental trajectory of a species: how long a childhood it has, how long it lives, how much parental investment there is, how many young it produces. … The strategy of producing just a few younger organisms, giving them a long period where they’re incapable of taking care of themselves, and then having a lot of resources dedicated to keeping them alive turns out to be a strategy that over and over again is associated with higher levels of intelligence. … It turns out to even be true for plants and for immune systems.
Creatures that have more complex immune systems also have this longer developmental trajectory. It looks as if there’s a general relationship between the very fact of childhood and the fact of intelligence. That might be informative if one of the things that we’re trying to do is create artificial intelligences or understand artificial intelligences. In neuroscience, you see this pattern of development where you start out with this very plastic system with lots of local connections, and then you have a tipping point where that turns into a system that has fewer connections but much stronger, more long-distance connections. It isn’t just a continuous process of development.
An interesting consequence of this picture of what intelligence is like is that many things that seem to be bugs in childhood turn out to be features. Literally and metaphorically, one of the things about children is that they’re noisy. They produce a lot of random variability. … That randomness, variability, and noise—things that we often think of as bugs—could be features from the perspective of this exploratory space. Things like executive function or frontal control, which we typically think of as being a feature of adult intelligence—our ability to do things like inhibit, do long-term planning, keep our impulses down, have attentional focus—are features from the exploit perspective, but they could be bugs from the perspective of just trying to get as much information as you possibly can about the world around you.
Being impulsive and acting on the world a lot are good ways of getting more data. They’re not very good ways of acting on the world in a planned, effective way. This gives you a different picture of the kinds of things you should be looking for in intelligence.
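The explore/exploit trade-off Gopnik is describing is the same one that shows up in reinforcement learning. A minimal sketch of the idea, using a two-armed bandit with epsilon-greedy agents (all names, payoffs, and parameters here are illustrative, not from the interview): a high-epsilon “child” wastes reward on random actions but learns an accurate picture of the whole environment, while a low-epsilon “adult” earns more reward but only really knows the arm it exploits.

```python
import random

def run_bandit(epsilon, steps=10_000, seed=0):
    """Epsilon-greedy agent on a two-armed bandit.

    Arm 1 pays off more often than arm 0. A higher epsilon means
    more random exploration (the 'child-like' regime); a lower
    epsilon means more exploitation (the 'adult' regime).
    """
    rng = random.Random(seed)
    payoff = [0.3, 0.7]      # true win probabilities, hidden from the agent
    counts = [0, 0]          # pulls per arm
    values = [0.0, 0.0]      # running estimate of each arm's payoff
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(2)           # explore: pick an arm at random
        else:
            arm = values.index(max(values))  # exploit: pick the current best
        reward = 1.0 if rng.random() < payoff[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return values, total / steps

# The 'child' explores half the time; the 'adult' almost never does.
child_estimates, child_reward = run_bandit(epsilon=0.5)
adult_estimates, adult_reward = run_bandit(epsilon=0.05)
```

Run it and the adult earns more reward per step, but the child ends up with far better estimates of *both* arms, including the one that doesn’t pay: exactly the sense in which noisy, impulsive behaviour is a feature from the information-gathering perspective and a bug from the planning perspective.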
I’m going to try and summarise both their positions in a few sentences, but you should definitely read both essays, especially as they are so short.
Rich Sutton (approx.): Learning and search always outperform hand-crafted solutions, given enough compute.
Rodney Brooks (approx.): No, human ingenuity is actually responsible for progress in AI. We can’t just solve problems by throwing more compute at them.
I think both positions are interesting, important, and well supported by evidence. But if you read both essays, you’ll see that these positions are also not mutually exclusive; in fact, they can be synthesised. To accept this interpretation, though, you need to take your view one level ‘up’, so to speak.
Rich Sutton isn’t arguing for wasteful learning and search; he’s calling on us to improve them. He is saying we’ll never be able to go back to hand-written StarCraft bots.
The meta-lesson is that the most important thing to improve with search and learning is learning itself.
[Thinking about the invention of relational databases] is a good grounding way to think about machine learning today – it’s a step change in what we can do with computers, and that will be part of many different products for many different companies. Eventually, pretty much everything will have ML somewhere inside and no-one will care.
An important parallel here is that though relational databases had economy of scale effects, there were limited network or ‘winner takes all’ effects.
with each wave of automation, we imagine we’re creating something anthropomorphic or something with general intelligence. In the 1920s and 30s we imagined steel men walking around factories holding hammers, and in the 1950s we imagined humanoid robots walking around the kitchen doing the housework. We didn’t get robot servants – we got washing machines.
Washing machines are robots, but they’re not ‘intelligent’. They don’t know what water or clothes are. Moreover, they’re not general purpose even in the narrow domain of washing … Equally, machine learning lets us solve classes of problem that computers could not usefully address before, but each of those problems will require a different implementation, and different data, a different route to market, and often a different company. Each of them is a piece of automation. Each of them is a washing machine.
one of my colleagues suggested that machine learning will be able to do anything you could train a dog to do, which is also a useful way to think about AI bias (What exactly has the dog learnt? What was in the training data? Are you sure? How do you ask?), but also limited because dogs do have general intelligence and common sense, unlike any neural network we know how to build. Andrew Ng has suggested that ML will be able to do anything you could do in less than one second. Talking about ML does tend to be a hunt for metaphors, but I prefer the metaphor that this gives you infinite interns, or, perhaps, infinite ten year olds.
In a sense, this is what automation always does; Excel didn’t give us artificial accountants, Photoshop and InDesign didn’t give us artificial graphic designers, and indeed steam engines didn’t give us artificial horses. (In an earlier wave of ‘AI’, chess computers didn’t give us a grumpy middle-aged Russian in a box.) Rather, we automated one discrete task, at massive scale.
OK, so we can now train AlexNet in minutes rather than days, but can we train a 1000x bigger AlexNet in days and get qualitatively better results? Apparently not…
So in fact, this graph, which was meant to show how well deep learning scales, indicates the exact opposite. We can’t just scale up AlexNet and get correspondingly better results: we have to fiddle with specific architectures, and additional compute does not buy much without an order of magnitude more data samples, which are in practice only available in simulated game environments.