The Build on Trainium project will provide university researchers with compute hours to use the newly built UltraCluster made up of 40,000 Trainium chips to power their research on new AI architectures and machine learning libraries.
Among the early users are researchers from Carnegie Mellon University, who are using Amazon's custom hardware to develop new compiler optimisations for AI, that is, techniques for making AI software run more efficiently.
“AWS’s Build on Trainium initiative enables our faculty and students large-scale access to modern accelerators, like AWS Trainium, with an open programming model. It allows us to greatly expand our research on tensor program compilation, machine learning parallelisation, and language model serving and tuning,” said Todd C. Mowry, a professor of computer science at CMU.
As part of the programme, Amazon said it would run multiple rounds of Amazon Research Awards, with grants that include Trainium credits and access to Trainium UltraClusters.
Grant recipients will have access to AWS's technical education and enablement programs for Trainium in partnership with the Neuron Data Science community, a virtual organisation led by Amazon's chip developer Annapurna.
“Trainium is beyond programmable — not only can you run a program, you get low-level access to tune features of the hardware itself,” said Christopher Fletcher, an associate professor of computer science research at the University of California at Berkeley, and a participant in Build on Trainium. “The knobs of flexibility built into the architecture at every step make it a dream platform from a research perspective.”
Amazon’s Trainium chips are part of its efforts to create its own supply of semiconductors, reducing its reliance on the likes of Nvidia.
The latest version, Trainium2, was unveiled in November 2023 and is designed to train foundation models and large language models.
Alongside Trainium, Amazon's other custom chip lines include Graviton, an Arm-based CPU for cloud workloads, and the AI-focused Inferentia.
Anthropic, the AI startup behind Claude and an OpenAI rival part-owned by Amazon, uses chips such as Trainium and Inferentia to power its training and inference workloads.