The Age of AI (Part 2/3)
“AI is one of the most important things humanity is working on. It is more profound than, I dunno, electricity or fire.” — Sundar Pichai, CEO of Alphabet
At the birth of digital computing in the 1940s, there were two competing visions: an industrial one (the computer as a machine) championed by John von Neumann, and a biological one (computers inspired by brains) championed by McCulloch, Pitts, Minsky, and Papert (and later, Geoffrey Hinton). In the industrial vision, the computer was a digital machine, strictly following precise commands to crunch numbers and read and write data to memory. In the biological vision, scientists created mathematical models of biological neurons and trained networks of them to produce a desired output when presented with various inputs.
For much of the 20th century, the von Neumann approach won out: thanks to Moore’s law, it gave us the ever-faster microprocessors we use today in our PCs, tablets, and mobile phones. Now, in 2023, Moore’s law is hitting physical limits, and it has also become clear that true AI systems cannot be built without resorting to neural nets. Much of the recent progress in AI has been due to breakthroughs in neural networks.
Inspired by biology — but not the same
The neocortex represents three-quarters of the volume of the human brain. It’s responsible for “higher-order” brain functions: sensory perception, spatial reasoning, conscious thought, language, motor commands, planning, decision-making, memory, and learning.
It’s amazingly compact: in On Intelligence, Jeff Hawkins describes it (if spread out flat) as “about the size of a dinner napkin”. Neurons are stacked vertically in layers, with dendrites at the bottom (taking inputs) and axons firing electrical impulses upwards (where they can stimulate the dendrites of neurons in the next layer). Surprisingly, the neocortex is only six layers deep. This is incredibly profound: it means that a human brain can convert impulses of light fed in from the optic nerve into “tiger!” in only six “steps” (where each step is a neuron assessing its input stimuli and deciding whether to fire an impulse upwards). Of course, there is massive parallelism at work: billions of neurons are active at each layer at once.
In an artificial neural network, the sensitivity of dendrites to stimuli is represented by weight values (w1, w2, … wn): each input is multiplied by its weight, and the results are summed to determine the output sent to the inputs of digital neurons in the next layer up. Artificial neural networks often have many more layers than the neocortex (70+ is not uncommon); these deep neural nets are what enable machines to outperform humans on various image and audio recognition tasks. During the training phase, the weights are adjusted to reduce the error between the network’s output and the correct answer. A trained neural net is simply a (small) program that carries out these computations plus a (large) dataset of final weights. Training is extremely compute-intensive and requires large datasets to achieve low error rates.
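To make this concrete, here is a minimal sketch of a single artificial neuron in Python (an illustration only, not any particular framework’s implementation; the function name and the sigmoid activation are choices made for this example):

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs, then a nonlinearity.

    `inputs` plays the role of the stimuli arriving at the dendrites,
    `weights` (w1, w2, ... wn) their sensitivity, and the return value is the
    signal passed up to neurons in the next layer.
    """
    weighted_sum = np.dot(weights, inputs) + bias
    return 1.0 / (1.0 + np.exp(-weighted_sum))  # sigmoid "firing strength"

# Example: a neuron with three inputs and three weights
print(artificial_neuron(np.array([0.5, 0.1, 0.9]),
                        np.array([0.8, -0.2, 0.4]),
                        bias=0.1))
```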
Once the network has been trained, it is presented with new data on its input layer and the calculations described above are performed at each neuron (implemented as a series of vector dot products and matrix multiplications). This is called inference. During inference, calculation proceeds from the bottom layer to the top. During training, the adjustment of the weights goes in the other direction: from the top layer back to the bottom. This is called back-propagation (there are other training algorithms, but this is the most commonly used).
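Both directions can be seen in a toy example. The sketch below (a didactic NumPy illustration, not how production systems are implemented) trains a tiny two-layer network on the XOR problem: the forward pass is the dot products described above, and the backward pass pushes the error from the top layer back down to adjust the weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset: XOR
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# A tiny network: 2 inputs -> 4 hidden neurons -> 1 output
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

learning_rate = 1.0
for step in range(10_000):
    # Forward pass (inference): bottom layer to top
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)

    # Backward pass (training): propagate the error from the top layer back down
    error = output - y                                    # how wrong the top layer is
    d_output = error * output * (1 - output)              # gradient at the output layer
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)  # gradient pushed back one layer

    # Adjust the weights to reduce the error
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0)

print(output.round(2))  # should be close to [0, 1, 1, 0]
```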
Why we’re still early in AI
As impressive as AI has become in the last few years, it is humbling to consider that evolution (admittedly with a 4 billion year head start) has made biological brains dramatically more efficient than our current artificial neural nets.
Our current implementations are inefficient
Leave aside for a moment that humans are still more capable than AIs in many tasks. Consider the energy efficiency of a human brain vs. GPT4. GPT4 has roughly the same number of weights as the human brain has synapses. In theory, both can store the same amount of information.
For GPT4, the details of the computing environment used to train the model have not been published, but leaked information online suggests that 10,000 central processing units (CPUs) and 25,000 graphics processing units (GPUs) were used. Using rough approximations of 100 watts per CPU and 400 watts per GPU, that yields a power draw of about 11 megawatts during training (and this number doesn’t include the electricity required for air conditioning and networking in the datacenter where these massive computers reside). In contrast, the human brain consumes about 20 watts (and it’s not only thinking thoughts, it’s also keeping your body functioning as you do so). So GPT4 is 5+ orders of magnitude less efficient!
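For readers who want to check the arithmetic, here is the back-of-envelope calculation behind those figures (using the leaked, unconfirmed hardware counts quoted above):

```python
# Back-of-envelope estimate; the CPU/GPU counts are leaked, unconfirmed figures.
cpus, gpus = 10_000, 25_000
watts_per_cpu, watts_per_gpu = 100, 400

training_power_watts = cpus * watts_per_cpu + gpus * watts_per_gpu
brain_power_watts = 20

print(training_power_watts / 1e6)                # ~11.0 megawatts
print(training_power_watts / brain_power_watts)  # ~550,000x, i.e. 5+ orders of magnitude
```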
AI companies are making their systems larger to make them smarter. Larger models work better, but they require ever more data to train on; this relationship is known as the Chinchilla scaling law. GPT3 was trained on a dataset 570GB in size. Compare that to how we transfer information to human brains (via speech or by reading):
570GB is about 114 billion words. So GPT3 was trained on ~2,500 times more words than a human being would hear, or ~400 times more than they would read, by the time they turn 18. GPT4 most likely used a training set an order of magnitude larger. Even if we assume that GPT4 has the equivalent intelligence of an 18-year-old (it doesn’t), combining the energy and data inefficiency, our current AI models are 240 million to 1.4 billion times less efficient than biology. To put that in perspective, this is like AI running on the output of a nuclear power plant while biology runs on an AA battery. Worse, biology achieves this with neurons that can only fire at about 200 Hz, compared to modern microprocessors running at 5 GHz, or 25 million times faster!
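The same kind of back-of-envelope arithmetic reproduces the combined figure (the ~5 bytes per word assumption is implied by the numbers above; the precise 240 million figure in the text reflects slightly different rounding of the word ratios):

```python
# Rough reproduction of the data and combined-efficiency comparison above.
bytes_per_word = 5                       # implied by 570 GB ~ 114 billion words
gpt3_words = 570e9 / bytes_per_word      # ~114 billion words

words_heard_by_18 = gpt3_words / 2_500   # ~46 million words heard
words_read_by_18 = gpt3_words / 400      # ~285 million words read

energy_ratio = 11e6 / 20                 # ~550,000x from the previous section
print(energy_ratio * 400)                # ~220 million (vs. ~240 million quoted; rounding)
print(energy_ratio * 2_500)              # ~1.4 billion
```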
This clearly shows that despite several decades of research, our current models of how neurons work and how the brain learns are woefully imperfect: we’ve gotten some good results largely through immense brute force (data and computing). We’re at the equivalent stage of Charles Babbage designing the first computer (the Analytical Engine) with 1830s Victorian-era technology.
How is AI becoming better and more efficient?
- Smaller (but almost equally effective) models: AIs take a lot of time and resources to train (training GPT4 reportedly took 6 months and $100M in computing resources). Smaller models are faster to train and let researchers innovate more quickly. In February 2023, Meta released its model LLaMA (Large Language Model Meta AI), and the open-source community has since used it to build even more capable models (suitably named after other members of the Camelidae family): Alpaca and Vicuña. These AIs are orders of magnitude smaller than GPT4.
- Additional domain-specific training: researchers at Microsoft invented an approach called Low-Rank Adaptation (LoRA), which keeps an existing LLM’s weights frozen and trains small additional weight matrices that are added on top of them, giving the model better domain-specific capabilities while greatly reducing memory and computing requirements (a minimal sketch of the idea follows this list). Similarly, Google developed an LLM specifically trained and tuned for answering medical questions, called Med-PaLM 2.
- Faster execution: AI applications are written in a combination of languages: low-level machine code, C/C++, and Python. Most of the high-level code is written in Python, which is easy to learn but very inefficient. New compiler technology is being developed (e.g. Mojo, Codon) that promises to speed up Python code by a factor of 10 to 10,000 by better exploiting the capabilities of the underlying computing hardware (e.g. GPUs that can execute many operations in parallel).
- Better architectures: based on a deeper understanding of how neurons and the brain really work, we can create vastly more efficient and effective AIs (see the next section).
- Neuromorphic computing: once we really understand how neurons and the brain work, we will no longer need to emulate them on a von Neumann computing architecture. Biology doesn’t do matrix multiplication. We can design new circuits directly in silicon that (more) efficiently perform the operations that biological neurons actually do. This is called neuromorphic computing. Since silicon operates at much higher speeds than biology, these circuits should outperform biology.
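To make the LoRA item above concrete, here is a minimal NumPy sketch of the idea (an illustration of the technique under simplified assumptions, not the published implementation; the dimensions, rank, and alpha value are arbitrary choices for this example):

```python
import numpy as np

rng = np.random.default_rng(0)

# The LoRA idea: keep the big pre-trained weight matrix W frozen and learn only
# a small low-rank update (B @ A) that is added on top of it.
d_out, d_in, rank = 1024, 1024, 8

W = rng.normal(size=(d_out, d_in))        # frozen pre-trained weights
A = rng.normal(size=(rank, d_in)) * 0.01  # trainable, small
B = np.zeros((d_out, rank))               # trainable, zero so the update starts as a no-op
alpha = 16                                # scaling hyperparameter

def lora_forward(x):
    """y = W x + (alpha / rank) * B (A x); only A and B are updated during fine-tuning."""
    return W @ x + (alpha / rank) * (B @ (A @ x))

x = rng.normal(size=d_in)
y = lora_forward(x)

# Parameter savings: the low-rank update has 2 * rank * d_in parameters instead of d_out * d_in
print(W.size, A.size + B.size)            # 1048576 vs 16384
```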
Will LLMs lead to human-level intelligence?
“On the highway towards Human-Level AI, Large Language Model is an off-ramp” — Yann LeCun, Chief AI Scientist at Meta
Once we truly understand the way biological neurons work and how human brains learn, we will be able to emulate that digitally and build neural nets that can match human intelligence. This is Artificial General Intelligence (AGI). LLMs excel at producing text, but they are not built like human brains. Some AI researchers (e.g. Yann LeCun) believe that we’ll reach a “dead end” with LLMs, and we may be there soon. Sam Altman of OpenAI believes we’ve reached the limits of making LLMs bigger: it’s become increasingly difficult to find ever-larger datasets for training and computers large enough to train them. It’s also clear that, given the inefficiencies in our current models, we need to think about ways to make current AI models smarter. There are two ways forward (and researchers are pursuing both):
Incremental tinkering: the “Frankenstein” approach
While excellent at producing coherent text, early LLMs were notoriously bad at math and logic problems. This is not surprising: LLMs work by using probabilities to predict “the next word” in a sequence. Math follows the rules of equations, not statistics. Solutions to logic problems require evaluating assertions and implications.
By adding plug-ins to LLMs, these AIs can “subcontract” math and logic problems to specialized computing engines. OpenAI’s Code Interpreter, for example, enables GPT4 to write and run computer programs (in Python) to solve math and logic problems.
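The pattern itself is simple enough to sketch. The toy example below illustrates only the idea (it is not OpenAI’s plug-in interface; query_llm is a hypothetical stand-in for whatever model API is being used): the model writes a program, and a conventional interpreter computes the exact answer.

```python
# Toy sketch of the "subcontracting" pattern. `query_llm` is a hypothetical stub;
# a real system would call an actual LLM API here.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call that returns Python code as text."""
    # For the question below, imagine the model responds with this program:
    return "result = sum(n * n for n in range(1, 101))"

def answer_with_code(question: str) -> str:
    code = query_llm(f"Write Python that computes {question}. Store the answer in `result`.")
    namespace = {}
    exec(code, namespace)  # the host executes the generated program, giving an exact answer
    return f"{question} = {namespace['result']}"

print(answer_with_code("the sum of the squares of the first 100 integers"))
# -> the sum of the squares of the first 100 integers = 338350
```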
Other extensions include the ability to access the Internet (so that the LLM can look up new information not in its original training set), import and recognize images (so they can work with visual inputs), query travel sites (Kayak.com), or find homes (Zillow.com). More plug-ins are being built daily.
In this way, we can overcome some of the limitations of LLMs by bolting on additional capabilities. This creates a Frankenstein (hopefully not a monster) of capabilities that will let LLMs become ever more useful. However, LLMs remain inherently limited as a form of AI.
Disruption: start from a new foundation
The alternative is to “go back to basics” and use the latest improvements in brain-scanning technologies (e.g. MRI) to learn how neurons and the brain really work. Human brains don’t use back-propagation: there is no “top-level, centralized and global evaluation” of whether an answer is correct. Human brains don’t operate in “batch mode” (train first, then do inference); we learn continuously. A human learns from a single experience that placing a hand on a hot stove is a bad idea, while today’s neural nets can’t learn in “one shot” from a single instance. To overcome the limits of LLMs and achieve AGI, we need a new foundation (digital simulations of what neurons really do) to build upon.
How can AI exceed human intelligence?
As we get better at understanding how human brains work, the capabilities of AI will approximate (and then exceed) that of biology, because:
- We can build bigger brains (human brains need to fit through the birth canal).
- We can build more efficient brains (because artificial neurons can be smaller as they don’t have to contain a nucleus, cytoplasm, mitochondria, and all the other baggage that living cells have to carry along).
- We can build faster brains (silicon operates much faster than biology).
- We can build smarter brains by adding more layers of neurons to enable more pattern matching and conceptual thinking.
- We can have more reliable and larger memories to store information.
- We can interconnect brains and link them to solve more complex problems.
- We can improve on evolution by building neural network models that are more effective than what biology can do (because they’re freed from the physical constraints that biology imposes). For example, even though biology doesn’t use back-propagation, back-propagation appears to be much more efficient at encoding information: GPT4 stores the equivalent knowledge of 1,000 human brains.
- We can make brains that live forever.