CPO

Why AI Is Slowing Down — And It’s Not a Computing Problem

The AI Bottleneck: Computation Has Far Outpaced Connectivity

Modern AI models, such as ChatGPT or Gemini, have grown so massive that they cannot fit into the memory of a single GPU. To train or run these models, we must fragment them, splitting the workload across thousands of GPUs within AI data centers.
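To make the scale concrete, here is a rough back-of-the-envelope sketch. The parameter count, bytes per parameter, and GPU memory size are illustrative assumptions, not figures for any specific model or chip, and the function name is hypothetical.

```python
import math

def min_gpus_for_weights(num_params: float, bytes_per_param: int, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights alone."""
    weight_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_memory_gb * 1e9  # decimal gigabytes, for simplicity
    return math.ceil(weight_bytes / gpu_bytes)

# Assumed values: 1 trillion parameters, 2 bytes each (16-bit), 80 GB per GPU.
print(min_gpus_for_weights(1e12, 2, 80))  # 25
```

This counts only the weights. Training also has to hold gradients, optimizer state, and activations, so the real number of GPUs is considerably higher.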

This turns AI into a network problem. These thousands of chips must act as a single "super-brain." If the performance of the connections between them is compromised, the entire supercomputer slows down.

Consider the two main use cases of an AI data center:

Training: Where We Need a Powerful Brain

Training massive AI models is really a memory-sharing challenge at its core. The datasets and model weights are just way too big to fit into the memory (VRAM) of a single chip, so we have to pull together the memory from thousands of GPUs into one big, shared resource.

To keep data moving between all these GPUs without stalls, we need very high-bandwidth connections. But here’s the problem: we’re running out of physical space on the server faceplate to plug in all the cables needed to make that happen. And if the connections aren’t fast enough, the GPUs can’t get the data they need in time, and training stalls while they wait.
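The link between bandwidth and waiting time is simple arithmetic. The sketch below assumes an idealized link with no protocol overhead or congestion; the data size and link speeds are made-up illustrative values, and the function name is hypothetical.

```python
def transfer_time_s(data_gb: float, link_gbps: float) -> float:
    """Seconds to move data_gb gigabytes over a link of link_gbps gigabits per second."""
    return data_gb * 8 / link_gbps  # 8 bits per byte

# Assumed values: 10 GB exchanged between GPUs at two different link speeds.
for gbps in (400, 1600):
    print(f"{gbps} Gb/s: {transfer_time_s(10, gbps) * 1000:.0f} ms")
```

Quadrupling the link speed cuts the wait to a quarter, which is why the number of high-speed ports that fit on the faceplate matters so much.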

Inference: The Key is to Efficiently Distribute Workloads

Once a model is trained, the focus shifts to cost efficiency. Running these models is extremely expensive, so the goal is to maximize the utilization of every GPU. You want the processor working 100% of the time, rather than waiting for data.

Workloads need to be distributed dynamically across chips with near-zero delay. Even a nanosecond of latency in the interconnect can cause costly processors to sit idle. This “dead time” consumes electricity without creating value, ultimately increasing the cost of every query.
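A rough way to see what idle time costs is to treat each unit of work as compute time plus time spent waiting on the interconnect. The timings, GPU count, and hourly price below are hypothetical values chosen only for illustration, as are the function names.

```python
def utilization(compute_ms: float, wait_ms: float) -> float:
    """Fraction of time a GPU spends computing rather than waiting for data."""
    return compute_ms / (compute_ms + wait_ms)

def idle_cost_per_hour(gpu_cost_per_hour: float, num_gpus: int, util: float) -> float:
    """Hourly spend on GPUs that are powered on but waiting."""
    return gpu_cost_per_hour * num_gpus * (1 - util)

# Assumed values: 8 ms of compute and 2 ms of waiting per step,
# 1000 GPUs at a hypothetical $2 per GPU-hour.
u = utilization(8, 2)
print(f"utilization: {u:.0%}")
print(f"idle cost: ${idle_cost_per_hour(2.0, 1000, u):.0f} per hour")
```

Under these assumptions a fifth of the cluster's cost buys no useful work, and that share grows as interconnect wait time grows relative to compute time.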

Why Not Use a Cheaper Copper Cable Solution?

While copper works well for short distances, it hits a physical wall at next-generation speeds:

1. The Distance Limit: As speed goes up, signals in copper degrade rapidly. At high speeds, the effective reach can be limited to less than 2 meters. This traps your AI cluster within a single rack. To scale beyond a rack, signal compensation ICs must be used.

2. The Bulk Problem: To carry high-speed signals, copper cables must be thick and shielded. Connecting thousands of DACs (direct attach copper cables) is hard to manage, and the cable mass also creates a physical wall that blocks airflow, causing servers to overheat.