AI’s Holy War
Religion is a funny thing. It is entirely unprovable in either direction and perhaps the canonical example of a favorite phrase of mine: “You can’t bring facts to a feelings fight.”
The thing about religious beliefs is that on the way up, they accelerate at such an incredible rate that it becomes nearly impossible to doubt God. How can you doubt a divine entity when the rest of your people increasingly believe in it? What place is there for heresy when the world reorders itself around a doctrine? When temples and cathedrals, laws and norms, arrange themselves to fit a new, implacable gospel?
When the Abrahamic religions first emerged and spread across continents, or when Buddhism expanded from India throughout Asia, the sheer momentum of belief created a self-reinforcing cycle. As more people converted and built elaborate systems of theology and ritual around these beliefs, questioning the fundamental premises became progressively more difficult. It is not easy to be a heretic in an ocean of credulousness. Grand basilicas, intricate religious texts, and thriving monasteries all served as physical proof of the divine.
But the history of religion also shows us how quickly such structures can crumble. The collapse of the Old Norse creed as Christianity spread through Scandinavia happened over just a few generations. The Ancient Egyptian religious system lasted millennia, then vanished as newer, lasting beliefs took hold and grander power structures emerged. Even within religions, we’ve seen dramatic fractures – the Protestant Reformation splintered Western Christianity, while the Great Schism divided the Eastern and Western churches. These splits often began with seemingly minor disagreements about doctrine, cascading into completely separate belief systems.
The holy text
Simplistically, to believe in God is religion. Perhaps to create God is no different.
Since its inception, optimistic AI researchers have imagined their work as an act of theogenesis – the creation of a God. The last few years, defined by the explosive progression of large language models (LLMs), have only bolstered the belief among adherents that we are on a holy path.
It has also vindicated a blog post written in 2019. Though unknown to those outside of AI until recent years, Canadian computer scientist Richard Sutton’s “The Bitter Lesson” has become an increasingly important text in the community, evolving from hidden gnosis to the basis of a new, encompassing religion.
In 1,113 words (every religion needs sacred numbers), Sutton outlines a technical observation: “The biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective, and by a large margin.” AI models improve because computation becomes exponentially more available, surfing the great wave of Moore’s Law. Meanwhile, Sutton remarks that much of AI research focuses on optimizing performance through specialized techniques – adding human knowledge or narrow tooling. Though these optimizations may help in the short term, they are ultimately a waste of time and resources in Sutton’s view, akin to fiddling with the fins on your surfboard or trying out a new wax as a terrific surge gathers.
This is the basis of what we might call “The Bitter Religion.” It has one and only one commandment, usually referred to in the community as the “scaling laws”: exponentially growing computation drives performance; the rest is folly.
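Sutton’s essay contains no equations, but the scaling laws the faithful invoke do have a quantitative shape: empirical power-law fits of loss against model size, data, and compute. One commonly cited form is the Chinchilla-style parameterization below; the constants are fit per model family, so treat it as an illustration of the commandment rather than the commandment itself.

$$
L(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
$$

Here $N$ is parameter count, $D$ is training tokens, $E$ is the irreducible loss, and $A$, $B$, $\alpha$, $\beta$ are fitted constants. Loss falls smoothly and predictably as compute – roughly proportional to $N \cdot D$ for transformers – grows.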
The Bitter Religion has spread from LLMs to world models and is now proliferating through the unconverted bethels of biology, chemistry, and embodied intelligence (robotics and AVs). (I covered this progression in depth in this post.)
However, as Sutton’s doctrine has spread, the definitions have begun to morph. This is the sign of all living and lively religions – the quibbling, the stretching, the exegesis. “Scaling laws” no longer means just scaling computation (the Ark is not just a boat) but refers to a range of approaches for improving transformer performance, with scaled compute at the core and a few tricks along for the ride.
The canon now encapsulates attempts to optimize every part of the AI stack, ranging from tricks applied to the core models themselves (merged models, mixture of experts (MoE), and knowledge distillation) all the way to generating synthetic data to feed these ever-hungry Gods, with a lot of experimentation in between.
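To make one of these tricks concrete, here is a toy sketch of the routing step inside a mixture-of-experts layer: each token is sent to only its top-k experts, so parameter count can grow without a proportional increase in per-token compute. This is an illustrative NumPy toy, not the implementation any particular lab uses; the experts here are plain linear maps for brevity.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Toy mixture-of-experts forward pass for a batch of token vectors.

    x:              (n_tokens, d_model) input activations
    expert_weights: (n_experts, d_model, d_model) one weight matrix per expert
    router_weights: (d_model, n_experts) linear router
    """
    logits = x @ router_weights                           # (n_tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)            # softmax over experts

    out = np.zeros_like(x)
    top_experts = np.argsort(-probs, axis=-1)[:, :top_k]  # chosen experts per token
    for t in range(x.shape[0]):
        for e in top_experts[t]:
            # Each token only pays for the compute of its top-k experts.
            out[t] += probs[t, e] * (x[t] @ expert_weights[e])
    return out

# Toy usage: 4 tokens, 8-dim model, 4 experts, route each token to its top 2.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
experts = rng.normal(size=(4, 8, 8)) * 0.1
router = rng.normal(size=(8, 4)) * 0.1
print(moe_layer(x, experts, router).shape)  # (4, 8)
```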
The warring sects
The question roiling through the AI community recently with the tenor of a holy war is whether The Bitter Religion is still true.
A new paper out of Harvard, Stanford, and MIT titled “Scaling Laws for Precision” stoked the conflict this week. The paper discussed the end of efficiency gains via quantization, a family of techniques that have improved the performance of AI models and been of great use to the open-source ecosystem. Tim Dettmers, a research scientist at the Allen Institute for Artificial Intelligence, outlined its significance in a thread, calling it “the most important paper in a long time.” It represents the continuation of a conversation that has bubbled up over the past few weeks and reveals a notable trend line: the growing solidification of two religions.
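For readers outside the weeds: quantization stores weights (and sometimes activations) in lower-precision formats – 8-bit or 4-bit integers instead of 16- or 32-bit floats – trading a little accuracy for large savings in memory and throughput. A minimal sketch of symmetric int8 weight quantization, written in NumPy rather than any production library, looks roughly like this:

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor int8 quantization: w ~ scale * q, with q in [-127, 127]."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Toy check: quantization introduces a small reconstruction error. How such
# errors interact with model and data scale is the question papers like
# "Scaling Laws for Precision" try to formalize.
w = np.random.default_rng(0).normal(size=(1024, 1024)).astype(np.float32)
q, scale = quantize_int8(w)
print("mean abs error:", np.abs(w - dequantize(q, scale)).mean())
```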
OpenAI CEO Sam Altman and Anthropic CEO Dario Amodei stand in one sect. Both have stated with a level of confidence, marketing, and perhaps trolling that we will have Artificial General Intelligence (AGI) in approximately the next 2-3 years. Both Altman and Amodei are arguably most reliant on the divinity of The Bitter Religion. All of their incentives are to overpromise and create maximum hype to accumulate capital in a game that is quite literally dominated by economies of scale. If scaling is not the Alpha and the Omega, the First and the Last, the Beginning and the End, then what do you need 22 billion dollars for?
Former OpenAI Chief Scientist Ilya Sutskever adheres to a different set of tenets. He is joined by other researchers (including many from within OpenAI, per recent leaks) who believe that scaling is approaching a ceiling. This group believes that novel science and research will be required to maintain progress and bring AGI to the real world.
The Sutskeverians reasonably reference the financial infeasibility of the Altman sect’s belief in continuous scaling. As AI researcher Noam Brown asked, “After all, are we really going to train models that cost hundreds of billions of dollars or trillions of dollars?” And that figure does not account for the additional billions in spending required for inference compute if we push scaling from training to inference.
But a true devotee is well acquainted with their opponents’ arguments. The missionary at your door can wrangle their way through your Epicurean trilemma. To Brown and Sutskever, the Suttonites point to the possibilities of scaling “test-time compute.” Rather than relying on greater compute to improve training, as has been the case thus far, “test-time compute” dedicates greater resources to execution. When it comes time for an AI model to come up with an answer to your question or generate a piece of code or text, it allows for greater time and compute. It’s the equivalent of shifting your focus from studying for a math exam to convincing your teacher to give you an extra hour and let you bring a calculator. For many in the ecosystem, this is the new frontier for The Bitter Religion, as teams retreat from the orthodoxy of pre-training and move towards post-training/inference.
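One simple way to picture test-time compute is best-of-n sampling: instead of accepting a model’s first answer, generate several candidates and keep the one a verifier scores highest, paying more at inference time for a better result. The sketch below uses stand-in generate and score functions – hypothetical placeholders, not any lab’s actual API – purely to show the shape of the idea.

```python
import random

def generate(prompt: str) -> str:
    """Stand-in for a model call; in practice this would sample from an LLM."""
    return f"candidate answer #{random.randint(0, 9999)} to: {prompt}"

def score(prompt: str, answer: str) -> float:
    """Stand-in for a verifier or reward model; here it is just random."""
    return random.random()

def best_of_n(prompt: str, n: int = 16) -> str:
    """Spend n model calls at inference time, keep the highest-scoring answer."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: score(prompt, ans))

print(best_of_n("Prove that the square root of 2 is irrational.", n=4))
```

Real systems are more sophisticated – longer reasoning traces, search, learned verifiers – but the trade they make is the same: more compute per answer rather than more compute per training run.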
It is all very well to point out the holes in other credos, to duff up other dogmas without revealing your own. What are my own beliefs? First, that the current crop of models will produce a very high return on investment over time. As people learn how to engineer around the constraints and exploit the available APIs, we will see truly novel product experiences emerge and succeed. We will move past the skeuomorphic and incremental stage of AI products. We shouldn’t think of this as “AGI,” per se (a term that suffers from flawed framing), but as a “minimum viable intelligence” capable of being tailored to different products and use cases.
As for achieving Artificial Superintelligence (ASI), more structure is needed. Clearer definitions and separation would help us more productively discuss the trade-offs in economic value versus economic cost each would entail. AGI, for example, may provide economic value to a subset of users (a mere local religion), whereas ASI could show unstoppable compound effects and change the world, our belief systems, and our social structures. I don’t foresee us reaching ASI by scaling transformers alone, but alas, that is just my atheistic belief, as some might say.
Losing religion
The AI community cannot resolve this holy war any time soon; there are no facts to be brought to this feelings fight. Instead, we should turn our attention to what it would mean for AI to question its devotion to scaling laws. A loss of faith could have cascading effects beyond LLMs, impacting all industries and markets.
It should be said that we haven’t yet exhausted scaling laws in most areas of AI/ML; there are more miracles to come. However, if doubt does creep in, it will become much harder for investors and builders alike to have similarly high conviction in the terminal state of performance for “earlier in the curve” categories like biotech and robotics. Put another way, if we see LLMs begin to slow down and stray from the anointed path, the belief systems of many founders and investors will collapse in adjacent areas.
Whether or not this is fair is a different question.
One could argue that “general intelligence” naturally requires more scale, and thus the “quality” of specialized models should show up at a smaller size, making them less susceptible to hitting a ceiling before providing material value. If a bio-specific model ingests a fraction of the data and thus requires a fraction of the compute to reach viability, shouldn’t it have plenty of headroom left to improve? This makes intuitive sense, but we’ve repeatedly learned that the magic often lies elsewhere: including adjacent or seemingly irrelevant data frequently boosts the performance of models in unrelated domains. Including coding data, for example, seems to improve broader reasoning.
In the long run, debates around specialized models may be moot. The ultimate goal for anyone building ASI is likely a self-replicating, self-improving entity capable of unbounded ingenuity across every field. Holden Karnofsky, a former OpenAI board member and a co-founder of Open Philanthropy, dubbed this creation “PASTA” (Process for Automating Scientific and Technological Advancement). Sam Altman’s original monetization plan seemed to rely on a similar principle: “build AGI, then ask it how to make a return.” This is eschatological AI, a final destiny.
The success of large-scale AI labs like OpenAI and Anthropic has created a hunger in capital markets to back similar “OpenAI for X” labs with long-duration goals built around “AGI” for a given vertical or industry. If scaling breaks down, that extrapolation fails, and we would see a paradigm change away from OpenAI simulacra and toward product-centric companies – a possibility I raised at Compound’s 2023 Annual Meeting.
Unlike the eschatological models, these companies would have to show a sequence of progress. They would be companies built on engineering problems of scale, rather than scientific organizations conducting applied research, with the end goal of building products.
The believers are unlikely to lose their divine certitude anytime soon. As mentioned earlier, as religions proliferate, they codify a playbook and a set of heuristics for living life and worshipping. They construct physical monuments and infrastructure, reinforcing their power and wisdom and showing that they “know what they are doing.”
In a recent interview, Sam Altman said this about AGI (emphasis ours):
This is the first time ever where I felt like we actually know what to do. From here to building an AGI will still take a huge amount of work. There are some known unknowns but I think we basically know what to do and it’ll take a while; it'll be hard but that’s tremendously exciting.
It’s hard to maintain disbelief in the face of statements like that. Amen.
The reckoning
In questioning The Bitter Religion, scaling skeptics are reckoning with one of the most profound discussions of the past few years. We have all had it, in one form or another. What happens if we invent God, and how quickly might it arrive? What happens if AGI truly, irreversibly takes off?
As with all unknown, complex topics, we each quickly cached our particular response in our brains: a subset despaired over their impending irrelevance, a majority expected a mix of destruction and prosperity, and a final chunk anticipated pure abundance as humans do what we do best – continue to look for problems to solve and solve the problems we create for ourselves.
Anyone with a lot at stake has hopefully anticipated what the world looks like for them should the scaling laws hold and AGI arrives within a couple of years. How will you serve this new God, and how will this new God serve you?
But what if the gospel of stagnation chases out the Panglossian optimists? What if we begin to think that, perhaps, even God decays? In a previous piece, “Robotics FOMO, Scaling Laws, & Technology Forecasting,” I wrote:
I sometimes wonder what will happen if scaling laws don’t hold and if that will look similar to what revenue churn, slowed growth, and higher interest rates did to many parts of tech. I also sometimes wonder if scaling laws perfectly hold and if that will look similar to what commoditization curves in many other areas have looked like for first movers and their value capture.
The nice part about capitalism is either way we’re going to light a ton of money on fire finding out.
For founders and investors, the question becomes: what comes next? The candidates to become the great product builders in each vertical are becoming known. There will be more, across sectors, but this story has already started to play out. Where does the new one begin?
If scaling stagnates, I expect to see a wave of shutdowns and consolidations. The remaining firms will increasingly shift their focus toward engineering, an evolution we should foresee by following talent flows. Already, we’re seeing indications that OpenAI is moving in this direction as it increasingly productizes itself. This shift will open up space for the next generation of startups to “leapfrog” incumbents in a path-creating attempt that leans heavily on novel applied research and science, not engineering.
Lessons of religion
A viewpoint I hold on technology is that anything that looks obviously compounding usually isn’t, at least not for long; meanwhile, a viewpoint everyone seems to hold is that any business that looks obviously compounding strangely does so at a far-underestimated rate and scale.
Early signs of religious fracturing often follow predictable patterns that one could use as a framework to continue to trace the evolution of The Bitter Religion.
It often begins with the emergence of competing interpretations, both for capitalistic and ideological reasons. In early Christianity, divergent views on Christ’s divinity and the nature of the Trinity forced a split, resulting in radically different scriptural interpretations. In addition to the AI schisms we’ve already cited, there are other emerging fractures. For example, we see a faction of AI researchers rejecting the core orthodoxy of transformers and instead embracing alternative architectures like State Space Models, Mamba, RWKV, Liquid Models, and more. Admittedly, these are soft signals for now, but they show a hint of bubbling heterodoxy and a willingness to rethink the field from its base tenets.
As time passes, prophets’ impatient pronouncements can also lead to disbelief. When religious leaders’ predictions don’t materialize or when divine intervention doesn’t arrive as promised, it can plant the first seeds of doubt.
The Millerite movement, which predicted Christ’s return in 1844, collapsed when Jesus didn’t arrive on their schedule. In tech, we generally bury the dead quietly and allow our prophets to continue to paint optimistic, long-duration versions of the future despite repeated missed deadlines (hi, Elon). Still, the belief in scaling laws could see a similar collapse if it is not bolstered by continuously improving raw model performance, regardless of cost.
A corrupt, bloated, or unstable religion is susceptible to apostates. The Protestant Reformation gained momentum not just because of Luther’s theological arguments but because it emerged during a period of decadence and volatility in the Catholic Church. When the dominant institution showed cracks, “heretical” approaches that had existed for years suddenly found fertile ground.
In AI, we might watch for smaller models or alternative approaches achieving comparable results with far less compute or data, such as the work coming from various Chinese corporate labs and open-source groups like Nous Research. Those pushing the limits of biological intelligence, overcoming hurdles long considered insurmountable, could also instantiate a new narrative.
The most topical and directly observable way to spot the beginning of a shift is by tracking the drift of practitioners. Before any formal split, religious scholars and priests often privately maintained heterodox views while conforming in public. Today’s equivalent might be AI researchers who outwardly genuflect to scaling laws while secretly pursuing radically different approaches, waiting for the right moment to challenge the consensus or leave their labs for theoretically greener pastures.
The tricky part about both religious and technical orthodoxies is that they’re often partially right, just not as universally right as their strongest adherents believe. Just as religions captured fundamental human truths within their metaphysical frameworks, scaling laws clearly describe something real about how neural networks learn. The question is whether that reality is as total and unchanging as the current fervor suggests, and whether these religious institutions (AI labs) can be malleable enough, and strategic enough, to bring their zealots with them – while, at the same time, building the printing press (chat interfaces and APIs) that allows their knowledge to proliferate.
The end game
A perhaps jaded view on religious institutions is that once they reach a certain scale, they, like many human-run organizations, are susceptible to bending to the incentives of survival in the hopes of outlasting competition. In the process, they neglect the incentives of truth and greatness (these aren’t mutually exclusive).
I’ve written extensively about how capital markets can become narrative-driven echo chambers, and incentives often align to perpetuate these narratives. The scaling laws consensus feels eerily similar – a deeply held belief system that’s both mathematically elegant and incredibly useful for coordinating massive capital deployment. Like many religious frameworks, it may be more valuable as a coordination mechanism than a fundamental truth.
The Generalist’s work is provided for informational purposes only and should not be construed as legal, business, investment, or tax advice. You should always do your own research and consult advisors on these subjects. Our work may feature entities in which Generalist Capital, LLC or the author has invested.