The data centre industry is grappling with how to meet demand for the training and deployment of generative AI and large language models (LLMs), as these services go mainstream and big tech invests billions of dollars in the space.
While dedicated AI data centres are being deployed, non-traditional data centre markets are increasingly seen as a good option for hosting these workloads.
LLMs such as OpenAI’s ChatGPT require a huge amount of energy and computing resources to train but, generally speaking, training is less latency-sensitive than the workloads better suited to traditional data centre markets.
The availability and cost of land and power in more remote areas are driving this decision-making, but that may not be the case for much longer.
Inference (running a trained model on new data) tends to be faster and less compute-intensive, meaning that situating inference workloads in a more typical data centre hub could be beneficial.
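As a rough illustration of why the two workloads differ (a minimal sketch using a toy PyTorch model rather than an LLM; the layer sizes and step counts are purely illustrative):

```python
# Minimal sketch: why training is heavier than inference (toy PyTorch model, illustrative only).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))
optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training: repeated forward AND backward passes over large datasets,
# updating every parameter at each step -- compute- and memory-hungry,
# but not latency-sensitive, since no user is waiting on an individual step.
for step in range(1000):                  # a real LLM runs vastly more steps on vastly more data
    x = torch.randn(64, 128)              # stand-in for a training batch
    y = torch.randint(0, 10, (64,))       # stand-in for labels
    loss = loss_fn(model(x), y)
    optimiser.zero_grad()
    loss.backward()                       # gradient computation dominates the cost
    optimiser.step()

# Inference: a single forward pass per request with no gradients kept --
# far cheaper per call, but a user is waiting, so network latency matters.
model.eval()
with torch.no_grad():
    prediction = model(torch.randn(1, 128)).argmax(dim=1)
```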
“Right now, inference workloads have to be closely connected to the training module,” Phill Lawson-Shanks, chief innovation and technology officer at Aligned Data Centers, tells Capacity.
This is because the chips running the models are connected via a technology known as InfiniBand, which offers high-performance links between chips but limits the physical distance between them to just 50 metres.
But a recent breakthrough announced by Nvidia could facilitate the decoupling of training and inference workloads, which would boost AI deployments at the edge.
Nvidia’s BlueField-3 SuperNIC is a novel network accelerator that works with Nvidia’s Spectrum-4 Ethernet switch to form the foundation of an accelerated computing fabric for AI, one that can enable functional LLM-to-LLM and LLM-to-inference-node connections over much greater distances.
“Extending the physical distance from just 50 meters to potentially cross-country connections using network infrastructure technologies could be the enabler to edge-based AI/ML deployments, or at least remote LLMs and regional inference engines,” Lawson-Shanks says.
There are benefits to deploying smaller LLMs and AI tools at the edge of networks, as this could lead to quicker decision-making for solutions that provide real-time analytics, and to faster response times in general.
Lawson-Shanks defines the edge as “the lowest network latency between a service and the consumption of that service, not the size of the data center building. You could have an 80MW data center located in a large metro area, hosting and delivering a cloud service to that market, and that is an edge deployment.”
Data collected by Omdia, a global analysis and advisory firm, indicates a correlation between traffic growth and AI, with a significant portion of that growth fuelled by video and high-resolution still images.
Network providers are trying to figure out how they can efficiently and sustainably scale to meet significantly increasing data demands driven by current and future technologies such as IoT, content streaming, the metaverse and autonomous vehicles.
Removing noise from the network by deploying AI video analytics workflows at the edge, so that only relevant footage is sent upstream, could help solve this problem.
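As a rough sketch of what that can look like in practice (an illustrative example using simple OpenCV background subtraction as a stand-in for a fuller AI analytics pipeline; the ingest endpoint and motion threshold are assumptions, not a specific vendor's workflow):

```python
# Minimal sketch of "removing noise at the edge": run a cheap analytics check on-site
# and only send frames of interest upstream, instead of streaming every frame.
import cv2
import requests

INGEST_URL = "https://example.com/ingest"   # hypothetical central collection endpoint
MOTION_THRESHOLD = 5000                     # illustrative: minimum changed pixels worth sending

subtractor = cv2.createBackgroundSubtractorMOG2()
capture = cv2.VideoCapture(0)               # local camera at the edge site

while True:
    ok, frame = capture.read()
    if not ok:
        break
    mask = subtractor.apply(frame)          # cheap background subtraction on the edge node
    if cv2.countNonZero(mask) < MOTION_THRESHOLD:
        continue                            # "noise": nothing happening, so nothing leaves the site
    _, jpeg = cv2.imencode(".jpg", frame)   # only frames with activity are sent upstream
    requests.post(INGEST_URL, data=jpeg.tobytes(),
                  headers={"Content-Type": "image/jpeg"})
```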