Why AI Is Slowing Down — And It’s Not a Computing Problem

The AI Bottleneck: Computation Has Far Outpaced Connection

Modern AI models, such as ChatGPT or Gemini, have grown so massive that they cannot fit into the memory of a single GPU. To train or run these models, we must fragment them, splitting the workload across thousands of GPUs within AI data centers.
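To make the memory gap concrete, here is a minimal back-of-the-envelope sketch. The parameter count, bytes per parameter, and GPU memory size are illustrative assumptions, not figures from any specific model or chip, and the estimate covers weights only (activations, optimizer state, and caches would push the number higher).

```python
import math

def min_gpus_for_weights(num_params: float, bytes_per_param: int, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory can hold the weights alone."""
    weight_bytes = num_params * bytes_per_param
    gpu_bytes = gpu_memory_gb * 1e9  # treat 1 GB as 10^9 bytes
    return math.ceil(weight_bytes / gpu_bytes)

# Hypothetical inputs: 1 trillion parameters, 2 bytes each (16-bit), 80 GB per GPU
print(min_gpus_for_weights(1e12, 2, 80))  # 25
```

Even under these generous assumptions, the weights alone span dozens of chips, and every one of them has to exchange data with the others.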

This turns AI into a network problem. These thousands of chips must act as a single "super-brain." If the connection performance between them is compromised, the entire supercomputer slows down.

 

Consider the two main use cases of an AI data center:

Training: Where We Need a Powerful Brain 

Training massive AI models is really a memory sharing challenge at its core. The datasets and model weights are just way too big to fit into the memory (VRAM) of a single chip, so we have to pull together the memory from thousands of GPUs into one big, shared resource.

To keep data moving between all these GPUs instantly, we need super high-bandwidth connections. But here’s the problem — we’re literally running out of physical space on the server faceplate to plug in all the cables needed to make that happen. And if the connections aren’t fast enough, the GPUs can’t grab the data they need in time, which brings training to a halt.
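The effect of link speed on training can be sketched with simple arithmetic. This is a simplified model, assuming each training step does a fixed amount of compute and then waits for a fixed payload to cross one link with no overlap between the two; the function names and all numbers are hypothetical.

```python
def transfer_seconds(payload_gb: float, link_gbps: float) -> float:
    """Time to move a payload over a link (payload in gigabytes, link in gigabits/s)."""
    return payload_gb * 8 / link_gbps

def step_utilization(compute_s: float, payload_gb: float, link_gbps: float) -> float:
    """Fraction of a step the GPU spends computing, assuming compute and transfer do not overlap."""
    wait_s = transfer_seconds(payload_gb, link_gbps)
    return compute_s / (compute_s + wait_s)

# Hypothetical step: 0.1 s of compute, 10 GB to exchange
for gbps in (400, 800, 1600):
    print(gbps, "Gbps ->", round(step_utilization(0.1, 10, gbps), 2))
```

In this toy model the GPU spends most of each step waiting at the lowest link speed, and each doubling of bandwidth hands a large share of that time back to computation.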

Inference: The Key is to Efficiently Distribute Workloads 

 

Once a model is trained, the focus shifts to cost efficiency. Running these models is extremely expensive, so the goal is to maximize the utilization of every GPU. You want the processor working 100% of the time, rather than waiting for data.

Workloads need to be distributed dynamically across chips with near-zero delay. Even a nanosecond of latency in the interconnect can cause costly processors to sit idle. This “dead time” consumes electricity without creating value, ultimately increasing the cost of every query.
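A rough sketch of how idle time feeds into cost per query follows. It assumes the GPU is billed per hour whether or not it is busy; the hourly price and throughput figures are made-up placeholders.

```python
def cost_per_query(gpu_cost_per_hour: float,
                   queries_per_hour_at_full_load: float,
                   utilization: float) -> float:
    """Cost of one query when the GPU is busy only `utilization` of the time."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    served_per_hour = queries_per_hour_at_full_load * utilization
    return gpu_cost_per_hour / served_per_hour

# Hypothetical GPU: 2.00 per hour, 10,000 queries per hour when never waiting
for u in (1.0, 0.75, 0.5):
    print(f"utilization {u:.0%}: {cost_per_query(2.0, 10_000, u):.6f} per query")
```

Under this model, a GPU that waits on the interconnect half the time doubles the cost of every query it serves.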
 

Why Not Use a Cheaper Copper Cable Solution?

While copper works well for short distances, it hits a physical wall at next-generation speeds:

    1. The Distance Limit: As speed goes up, signals in copper degrade rapidly. At high speeds, the effective distance can be limited to less than 2 meters. This traps your AI cluster within a single rack. To scale beyond a rack, signal compensation ICs must be used.
    2. The Bulkiness Trouble: To carry high-speed signals, copper cables must be thick and shielded. Connecting thousands of direct attach copper (DAC) cables is hard to manage and also creates a physical wall that blocks airflow, causing servers to overheat. 1-3