Comment by 🛞 MaAkThRsYoOySrHtKaAm

Re: "Energy Requierment for Large Language Models and What is…"

In: s/AI

The energy issue might be a much bigger topic than most people give it credit for. A lot of international politics right now revolves around the resources required to run these LLMs. It does seem that a plateau in their capabilities has already arrived, and that the only real solution for advancement is to feed more computing power to the beast.

The US wants to prevent Nvidia from selling products to China for 30 months to gain a head start in building out larger infrastructure. International deals are being made with countries that have unique access to rare earth materials. You've got companies running proprietary LLMs purchasing old nuclear power plants, and Micron shutting down retail production lines for RAM to shift toward catering to these AI companies. It seems this tool has already become indispensable and has been deemed so important that it is steering policy for large corporations and nations.

WHY DOES AI CONSUME SO MANY RESOURCES?

The context window, the amount of input and output a model can handle at once, might be the most significant bottleneck in LLM design. More tokens in means disproportionately more computation and memory. Twice as much input does not merely require twice the compute: self-attention scales quadratically, O(n²), with the number of tokens, not linearly. So expanding the context window requires much greater hardware capability, and the further you expand it, the faster the resource demand grows.
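
Here's a back-of-the-envelope sketch of that scaling. The d_model=4096 is my own assumption for illustration, not any specific model's config, and real stacks tile this with tricks like FlashAttention so the full matrix never sits in memory at once, but the FLOP count still grows with n²:

```python
# Toy cost model for one self-attention layer (illustrative numbers only).
def attention_cost(n_tokens: int, d_model: int = 4096) -> tuple[float, float]:
    flops = 2 * n_tokens**2 * d_model   # QK^T matmul: ~2 * n^2 * d FLOPs
    score_bytes = n_tokens**2 * 2       # n x n score matrix in fp16
    return flops, score_bytes

for n in (8_192, 16_384, 32_768, 65_536):
    flops, mem = attention_cost(n)
    print(f"{n:>6} tokens: ~{flops / 1e12:6.1f} TFLOPs/layer, "
          f"score matrix ~{mem / 2**30:5.2f} GiB")
# Each doubling of context roughly quadruples both numbers.
```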

It's funny, just looking it up right now I find:

Scaling Transformer Context: The O(n²) Bottleneck (https://shreyassk.substack.com/p/scaling-transformer-context-the-on)

That is a good, quick rundown of the formula for scaling to handle larger prompts. Definitely go read that real quick and realize how fucked things are. I'm a bit disgusted by the focus on interviewing, but that's not the point.

I'm watching the industry implement solutions similar to what I've had to do as a user: record persistent memory outside of the context window so it can be referred to later, offload compute to an MCP server where possible and reduce the input sent to the LLM, cache it in RAM for faster retrieval. Plenty of other "tricks", and they make a difference. However, it all seems to be a matter of efficiently organizing and reducing the tokens going in so as to avoid hitting the limit of the context window.
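
For a sense of what I mean by recording memory outside the window, here's a toy sketch. Every name in it (notes.jsonl, save_note, recall) is mine, just illustrating the pattern; real setups use embeddings instead of keyword matching, but the goal is identical: send a few short notes instead of the whole history.

```python
import json
from pathlib import Path

NOTES = Path("notes.jsonl")

def save_note(topic: str, summary: str) -> None:
    # Persist a compact summary so the full source never re-enters the prompt.
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"topic": topic, "summary": summary}) + "\n")

def recall(query: str, limit: int = 3) -> list[str]:
    # Cheap keyword matching; real systems use embeddings, but the goal is
    # the same: pull back a handful of short notes, not the whole history.
    if not NOTES.exists():
        return []
    hits = []
    for line in NOTES.read_text(encoding="utf-8").splitlines():
        note = json.loads(line)
        if query.lower() in (note["topic"] + " " + note["summary"]).lower():
            hits.append(note["summary"])
    return hits[:limit]

save_note("context scaling", "Self-attention cost grows with the square of context length.")
print(recall("scaling"))
```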

So how do you expand the context window? For now, it seems, you throw more compute resources at it. I am sure they are working on a myriad of methods, algorithms, compression schemes, and whatever else to stretch the context window. Can they scale computing resources by any means to come anywhere near the O(n²) rate of demand? No. That Substack even describes how sliding-window attention works. It's just more witchcraft.
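
For reference, the sliding-window part of the witchcraft is about as simple as it gets. Here's a minimal sketch of the mask with my own toy sizes: each token only attends to the previous `window` tokens, so the kept entries form a band instead of a full n×n square.

```python
import numpy as np

def sliding_window_mask(n: int, window: int) -> np.ndarray:
    # True where attention is allowed: causal, and at most `window` back.
    i = np.arange(n)[:, None]  # query positions
    j = np.arange(n)[None, :]  # key positions
    return (j <= i) & (i - j < window)

print(sliding_window_mask(n=8, window=3).astype(int))
# Kept entries per row cap at `window`, so cost is O(n * window):
# linear in n once the window is fixed. That's the whole trick.
```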

If you have a large PDF, hell, even an average PDF, that you want the LLM to work with, you can't just feed the whole PDF into the prompt and expect the LLM to handle it. You're going to have to feed a few pages in at best, get the output you need from the LLM in as concise and brief a format as you can, and record that. Then move on to the next few pages. As far as I can tell, that's the approach they are taking. They are performing the equivalent of moving the "attention" around to whichever few pages of the PDF are needed for the CURRENT context. They're just doing it automatically, inside the model, instead of making the user do it by hand.
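
Sketched as code, that chunk-and-record loop looks something like this. `summarize` is a stub standing in for whatever LLM call you actually use (I'm not claiming any particular API); the page reading uses pypdf, which does exist with this interface.

```python
from pypdf import PdfReader  # pip install pypdf

def summarize(chunk: str, notes_so_far: str) -> str:
    # Placeholder for an LLM call along the lines of: "given these notes
    # so far, condense this chunk into a few bullets." Stubbed out here.
    return f"[summary of a {len(chunk)}-char chunk]"

def digest_pdf(path: str, pages_per_chunk: int = 3) -> str:
    reader = PdfReader(path)
    pages = [page.extract_text() or "" for page in reader.pages]
    notes = ""
    for start in range(0, len(pages), pages_per_chunk):
        chunk = "\n".join(pages[start:start + pages_per_chunk])
        # Only the current chunk plus the compact running notes ever go
        # into the prompt -- never the whole document at once.
        notes += summarize(chunk, notes) + "\n"
    return notes
```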

I don't know everything, of course. I'd be glad to be proven wrong when I say the only thing we can do at the moment to actually expand the context window is to throw more resources at it. Every solution I've seen so far comes down to making prompts as efficient as possible by some means or another. They are avoiding hitting the wall, but we still need to move that boundary further out. The only solutions that directly expand the context window involve scaling hardware, and the problem with that is obvious when you consider that hardware resources are limited. We are going to hit those limits. If this tool does not give researchers and developers a big enough boost toward discovering real solutions to these limitations, it will mean a whole lot of resources spent on nothing.

When I say "A whole lot" I mean like:

Listen, if you wanted to join the PFJ you'd have to really hate the Romans.
I do!
Oh yeah? How much?
A lot!

All current solutions are energy-intensive, to say nothing of the other resources they require. The more I examine this, the more convinced I am that the bulk of the capacity of current tools should be spent on overcoming these limitations now, before it is too late.

If humans can be intelligent, then there must be a way to mirror that. The brain has many different systems, each functioning to perform well at certain things. Perhaps a cluster of models, each dedicated to its own scope of function and working in conjunction with the others, is the most efficient use of hardware resources; see the sketch below. More efficient hardware, or hardware tailored to the function it will serve, would also help. As that sort of research develops, the entirety of the infrastructure must be examined and improved upon.
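
A crude sketch of that cluster idea, with everything in it (model names, routing rule) made up for illustration: a cheap router picks a small specialized model when it can, and only falls back to the big expensive one when it has to.

```python
from typing import Callable

# Each "model" is just a stub function here; in practice these would be
# separately hosted models of very different sizes and power draws.
def math_model(prompt: str) -> str: return "[small math-tuned model answers]"
def code_model(prompt: str) -> str: return "[small code-tuned model answers]"
def general_model(prompt: str) -> str: return "[large general model answers]"

EXPERTS: dict[str, Callable[[str], str]] = {
    "math": math_model,
    "code": code_model,
}

def route(prompt: str) -> str:
    # Crude keyword router; a real system would use a small classifier model.
    lowered = prompt.lower()
    for tag, expert in EXPERTS.items():
        if tag in lowered:
            return expert(prompt)
    return general_model(prompt)

print(route("write code to parse a CSV file"))
```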

Energy Generation

- Expanded nuclear fission

- Fusion development

- Hardened + expanded electrical grids

- Co-sited power for data centers

Semiconductor Materials & Fabrication

- Post-silicon substrates (GaN, SiC, graphene, CNTs)

- Advanced HBM scaling

- New interposers and advanced packaging

- Critical-mineral availability (gallium, germanium, indium, rare earths)

Cooling & Thermal Management

- Direct liquid cooling

- Immersion cooling

- Cryogenic computing

- High-efficiency thermal interfaces

Specialized Compute Architectures

- Optical/photonic compute

- AI-specific ASICs

- In-memory and near-memory compute

- Hybrid architectures (SSM + attention, etc.)

Raw Materials & Supply Chains

- Mining and refining expansion

- Diversified rare-earth and metal processing

- Semiconductor-grade material sourcing

- Recycling pipelines for critical elements

Fundamental Physics & Next-Gen Compute

- Superconducting logic

- Neuromorphic computing

- Quantum-assisted components

- Ultra-low-energy switching materials

Oh, and don't forget the pollution that comes with refining ore for rare earth materials. Where is that going to take place, if not China? Can we get around the waste? We're looking at some big problems, and we need to get on it.

🛞 MaAkThRsYoOySrHtKaAm

2025-12-06 · 5 months ago

Original Post

🌒 s/AI

β˜•οΈ Aptor-theHobbit: [mod]

From Words to Watts

Energy Requirement for Large Language Models and What is the Next Step for the LLMs? Growth in the LLMs is very rapid. Training such large models is only possible for the big corporations. The amount of energy required for such training is so huge that it can be equivalent to the electricity consumption of a major city for a month. In order to advance these models further, even larger infrastructure...

💬 5 comments · 1 like · 2024-09-05 · 2 years ago · #energy #LLMs #training