AI now has business and innovation momentum not seen since at least the internet bubble and, given the greater speed at which technology now diffuses, perhaps ever. The question is: will it last and lead to a Kurzweilian singularity, or will it soon die out?
Future: Positives
1) AI is sucking in talent.
With the ever-present call for more STEM majors, it may be somewhat surprising to hear that the US has, since the 1970s, annually produced far more Ph.D.s in physics, math, and engineering than there are jobs available in those fields. Over the years, industry (especially, at times, aerospace, oil exploration, and Wall Street) has absorbed many such graduates, and the U.S. has benefited.
This oversupply of talent has not gone unnoticed by graduate students and post-docs, who know their odds of an academic position are low. Consequently, many of them, physicists especially, do a considerable amount of research in fields ostensibly far outside their official areas of expertise. This is possible because the mathematics is similar across many fields.
Such cross-fertilization is similar to what has occurred between math and physics over the last century. In successive waves, classical mechanics, relativity, quantum mechanics, quantum field theory, supersymmetry, and string theory have led to collaborations between the two fields that have benefited both.
AI will undoubtedly be the beneficiary of this trend. Not only are the salaries substantial, but (in contrast to, say, Wall Street) AI is a field such people could actually enjoy working in. So it seems likely that future breakthroughs are coming. And this wave is so big that it attracts not just physicists but workers in statistics, computer science, biology --- practically any STEM field.
2) Researchers are starting to understand the magic
As mentioned in my article on Transformers, there is a lot of magic and lore involved in neural network training and success. By constructing “toy models”, i.e., simpler neural networks that involve only a single aspect of the full network at a time (e.g., encoding, attention), researchers are finally starting to understand what is happening.
Neural networks involve many design choices that are currently made by trial and error. How many layers in the encoder? In the attention head? In the decoder? How many attention heads? How fast should learning proceed when training the network? What measure of similarity should the attention components use? How many words should be processed together?
Without any guiding theory, all these choices multiply, and you could be faced with testing millions of combinations to approximate the best one. That would be tolerable if networks were not so sensitive to these choices, but they are. Thus, understanding what is going on can not only greatly reduce training time but also increase confidence in the result.
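As a rough illustration of how quickly this multiplies (a hypothetical sketch; the parameter names and candidate values below are my own placeholders, not taken from any real model), even a coarse grid over a handful of these choices already implies thousands of full training runs:

```python
from math import prod

# Hypothetical, coarse grid over a few of the design choices listed above.
# Names and candidate values are illustrative only.
search_space = {
    "encoder_layers":  [2, 6, 12, 24],
    "decoder_layers":  [2, 6, 12, 24],
    "attention_heads": [4, 8, 16, 32],
    "learning_rate":   [1e-3, 3e-4, 1e-4, 3e-5],
    "similarity":      ["dot_product", "scaled_dot_product", "cosine"],
    "context_length":  [512, 1024, 2048, 4096],
}

# Number of configurations an exhaustive grid search would have to train.
n_configs = prod(len(values) for values in search_space.values())
print(n_configs)  # 4 * 4 * 4 * 4 * 3 * 4 = 3072
```

Each grid point is a training run; add a few more choices, or finer grids, and the count reaches the millions mentioned above.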
Future: Negatives
1) No free lunch.
No free lunch refers to the no free lunch theorem from the mid-1990s. Essentially, given any algorithm used to learn a function, if no restrictions are placed on the function being learned, one can always find a true function and data generated by it such that the function the algorithm learns from that data is wrong on new data at least half the time.
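To make this concrete, here is a toy check (my own illustration, not from the original NFL papers): enumerate every possible target function on a tiny domain and verify that a fixed learner, averaged over all of those targets, gets unseen inputs right only half the time.

```python
from itertools import product

# Domain of 8 possible inputs; the learner sees labels for half of them.
inputs = list(range(8))
train, test = inputs[:4], inputs[4:]

def learner(train_labels, x):
    """An arbitrary but fixed learner: memorize the training set and
    predict the training-set majority label on unseen inputs."""
    if x in train:
        return train_labels[x]
    majority = sum(train_labels.values()) >= len(train_labels) / 2
    return int(majority)

total, correct = 0, 0
# Enumerate every possible 0/1 labeling of the domain, i.e., every "true" function.
for labels in product([0, 1], repeat=len(inputs)):
    truth = dict(zip(inputs, labels))
    train_labels = {x: truth[x] for x in train}
    for x in test:
        total += 1
        correct += (learner(train_labels, x) == truth[x])

print(correct / total)  # exactly 0.5: no better than chance, averaged over all targets
```

Swapping in any other learner gives the same 0.5; only restricting which target functions are allowed changes the picture, which is the point of the next paragraph.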
In order to ‘beat’ this, you need to bias the algorithm to favor certain classes of functions and also have the actual data-generating function fall in that class. Indeed, if we look at some of the major successes of AI, vision and language processing, we see two areas that humans have already solved. Animals too, in the case of vision and, to a lesser degree, in the case of language.
Google’s AlphaFold and the more recent AlphaProteo (predicting protein folding and protein binding, respectively) are exceptions to this rule. But even here, nature has solved many protein design problems through evolution, so we know these problems are solvable.
But, even if a problem is solvable, it may happen that the phenomenon you wish to understand is not matched to the AI structure you are using. Since the biases in AI models are highly implicit, it is not always clear how to change them. It may be that life sometimes evolves systems or processes that fall into this category.
Worse, the problem you want to solve may be “computationally irreducible”, an idea discussed heavily in the works of Stephen Wolfram. A cartoon version of this is as follows. Consider computing a function such as cosine. There are a large number of (relatively) simple expressions that let you compute cos(x) for any x using simple arithmetic (i.e., addition and multiplication) in reasonable time. In fact, for pretty much any function you can name, a relatively simple algorithm exists --- if not, it would probably not have a name! But for the vast majority of functions, this is not the case. There may be no simple algorithm to define them (in which case, they are probably unknown to us), or running the algorithm may take too long.
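For the cosine example, a minimal sketch (my own; the truncation depth is an arbitrary choice) shows how a short loop of elementary arithmetic already reproduces cos(x) to high accuracy:

```python
import math

def cos_approx(x, terms=12):
    """Truncated Taylor series for cos(x), built from elementary arithmetic.
    (Reducing x toward 0 first would help for large x; omitted for simplicity.)"""
    result, term = 0.0, 1.0
    for n in range(terms):
        result += term
        # Each term is the previous one times -x^2 / ((2n+1)(2n+2)).
        term *= -x * x / ((2 * n + 1) * (2 * n + 2))
    return result

print(cos_approx(1.0), math.cos(1.0))  # both ~0.5403
```

The irreducible case is the opposite situation: no such short loop exists, and the only way to get the answer is a computation as long as the phenomenon itself.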
A simpler example of “unknown-ness” is that the set of algebraic numbers (e.g., 2, ½, Sqrt(2), …) is vastly smaller than the set of non-algebraic numbers (called transcendental numbers). But, off the top of my head, I can name only two transcendental numbers (pi and e, Euler’s number). While I can write down algebraic numbers all day, almost all transcendental numbers are unknown to us.
In short, we have an extreme bias for simple things that we can understand. But most things are not simple and not understandable. It may be that some problems we wish to solve, e.g., life extension, fall into this category, and while AI may happily provide advice, the advice may be garbage. Of course, this is not a new problem.
2) Data Issues
The data problem may become quite important, depending on how things go. The current crop of LLMs (Large Language Models)[1] was trained by scraping the web for hundreds of billions of text examples. This happened before most people knew it was being done and was possible only because the web pages were public.
But now people are more aware that data has value, and much of the specialized data needed to go beyond captioning cat photos resides on private networks. For example, Baidu has just blocked Google and Microsoft from scraping its websites for AI training. And with governments everywhere chomping at the bit to regulate AI, will new AIs be able to get the data they need?
Researchers have recently raised an additional concern about data. As AI starts to dominate web-content creation, the data that future AIs are trained on will increasingly have been generated by prior AIs. In simple models with strict assumptions, this can lead to some bizarre results (e.g., first the AI can only generate a single cat picture, and then that picture no longer resembles a cat). However, I am skeptical of this conclusion provided humans are still involved in the selection process. At the simplest level, if humans decline to post AI-generated cat pictures that no longer resemble what they want, the problem will not occur. But what about more sophisticated content, e.g., AI-written blog posts (BTW, as may be obvious, I am old, and so far these posts are written by me, not AI)? If people post AI content without proofreading it, we could have trouble. But if that happens, we already have trouble.
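The flavor of the “simple models with strict assumptions” result can be seen in a toy simulation (my own sketch, a one-dimensional Gaussian rather than an actual language model): repeatedly fit a distribution to a small sample drawn from the previous generation’s fit, and the spread of the model collapses toward nothing.

```python
import random
import statistics

# Toy "training on your own output" loop: each generation fits a Gaussian
# to a small sample drawn from the previous generation's fitted Gaussian.
random.seed(0)
mu, sigma = 0.0, 1.0   # the original, human-generated "data distribution"
n = 20                 # samples available to each generation

for generation in range(100):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    mu, sigma = statistics.fmean(sample), statistics.pstdev(sample)

# The fitted spread has collapsed to a small fraction of the original 1.0:
# the model has "forgotten" most of the variety in the original data.
print(f"after 100 generations: mean={mu:.3f}, std={sigma:.4f}")
```

A human in the loop plays the role of a filter on what gets posted, which is exactly the step this toy loop is missing.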
3) Magic
As discussed earlier, to avoid overfitting, deep networks must be implementing some form(s) of regularization, but precisely what form is not known. Consequently, AI trainers have adopted a large number of ad hoc regularization-like procedures that ‘seem to work’. But these methods require tuning, essentially redoing the training multiple times with different parameters. Needless to say, this can take a large amount of time.
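For a sense of what those knobs look like in practice, here is a minimal sketch (PyTorch-style, with arbitrary placeholder values, not recommendations) of a few common “seems to work” regularization settings a trainer ends up re-tuning:

```python
import torch
from torch import nn

# Illustrative regularization knobs; the specific values are placeholders.
dropout_p    = 0.1    # randomly zero 10% of activations during training
weight_decay = 0.01   # penalize large weights
patience     = 5      # early stopping: give up after 5 epochs without improvement

model = nn.Sequential(
    nn.Linear(128, 256),
    nn.ReLU(),
    nn.Dropout(p=dropout_p),
    nn.Linear(256, 10),
)
optimizer = torch.optim.AdamW(
    model.parameters(), lr=3e-4, weight_decay=weight_decay
)

# Changing any of dropout_p, weight_decay, lr, or patience generally means
# training again from scratch to see whether the network still converges.
```

None of these choices follows from theory; each is an empirical patch whose right value is found by repeating the run.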
Finally, while the “positives” section of this article described the use of toy models to understand various aspects of this magic, it is not a given that the effort will succeed. As economists have demonstrated time and time again, putting together the results of grossly simplified toy models may not lead to understanding of the whole.
[1] Meaning large models of language, not models of large languages.