In artificial intelligence (AI), especially within deep learning and large-scale data processing, maintaining memory coherence is critical. AI models often rely on extensive parallel processing, where different parts of the model or separate data batches are processed simultaneously by multiple processors.
If memory coherence is not maintained, the result can be errors, incorrect computations, or inefficiencies. So how do key players in the AI chip industry, such as Nvidia, TSMC, and MediaTek, address memory coherence challenges?
During large-scale AI training, matrix multiplications are divided among multiple GPUs for parallel processing. Once the computations are complete, the data is retrieved and integrated to produce the final result. This process presents a significant memory coherence challenge.
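This scatter-gather pattern can be sketched in a few lines of Python. The sketch below is illustrative only, not any vendor's implementation: it splits a matrix multiplication into row blocks, hands each block to a worker thread, and must wait for the slowest worker before the final result can be assembled.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(A, B):
    """Naive matrix multiply on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def blocked_matmul(A, B, num_workers=2):
    """Split A into row blocks, multiply each block by B in a worker
    thread, then gather the partial results in order. The gather step
    blocks until the slowest worker returns -- the delay Arabzadeh
    describes when results come back at different speeds."""
    step = max(1, len(A) // num_workers)
    blocks = [A[i:i + step] for i in range(0, len(A), step)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = pool.map(lambda blk: matmul(blk, B), blocks)
    return [row for part in partials for row in part]
```

Because `map` preserves block order, the final matrix is identical to the single-device result; the cost is idle time while fast workers wait on slow ones.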
"You can distribute tasks to all these GPUs, even if they're located in different places, but there's always a delay when they return," said Hamid Arabzadeh, CEO of Ranovus, a Canadian Silicon Photonics company, at the Canada-Taiwan Co-Innovation Forum. "Each GPU returns data at a different speed, and you must wait for all the results to arrive before you can proceed."
Before the data can be integrated, a more efficient way to distribute and collect it is essential to bring the process closer to real time.
"When systems were centralized and monolithic, there was a lot of point-to-point integration, where batch processing and polling were the norms for data retrieval," explained Floyd Davis, VP of Solution Engineering at Solace, a Canadian middleware company. "These methods are complex, fragile, and ultimately slow. We need to decentralize these systems, making them loosely coupled, real-time, and event-driven."
To ensure memory coherence, synchronization mechanisms such as locks, barriers, or specialized hardware protocols (e.g., MESI: Modified, Exclusive, Shared, Invalid) manage access to shared data, guaranteeing that updates propagate consistently across the system.
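As an illustration of the first two mechanisms, the Python sketch below (worker logic and values are made up for the example) uses a barrier so that no worker proceeds until every partial result exists, and a lock so that updates to shared state are serialized:

```python
import threading

NUM_WORKERS = 3
results = [None] * NUM_WORKERS
barrier = threading.Barrier(NUM_WORKERS)  # all must arrive before any proceeds
lock = threading.Lock()
total = 0

def worker(i):
    global total
    results[i] = i * i   # each worker's partial computation
    barrier.wait()       # block until every partial result is in place
    with lock:           # serialize the update to shared state
        total += results[i]

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# total now holds 0 + 1 + 4, regardless of thread scheduling
```

Without the barrier, a worker could read a slot of `results` before its owner had written it; without the lock, two `total +=` updates could interleave and lose a value.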
"But in the meantime, power consumption can be a concern. The real issue lies at the system level," Arabzadeh emphasized, highlighting the need for a system-level approach to solve this challenge.
Memory coherence is closely linked to cache coherence, which ensures that multiple copies of the same data in different caches (small, fast storage areas near the processors) remain consistent. Without cache coherence, processors might work with outdated or inconsistent data.
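A heavily simplified MESI-style sketch (reduced to its essentials, omitting bus transactions and the write-back a Modified holder would perform) shows the key idea: a write by one core marks its copy Modified and invalidates every other cache's copy, so no core can read stale data.

```python
# Simplified MESI sketch: each cache line is in one of four states.
MODIFIED, EXCLUSIVE, SHARED, INVALID = "M", "E", "S", "I"

class CacheLine:
    def __init__(self):
        self.state = INVALID
        self.value = None

def write(caches, writer, value):
    """Core `writer` writes the line: its copy becomes Modified,
    and all other copies are invalidated."""
    for i, line in enumerate(caches):
        if i == writer:
            line.state, line.value = MODIFIED, value
        else:
            line.state, line.value = INVALID, None

def read(caches, reader, memory_value):
    """On a miss, fetch the line: if another cache holds it, both end
    up Shared (a real Modified holder would write back to memory here);
    otherwise the reader gets it Exclusive."""
    line = caches[reader]
    if line.state != INVALID:
        return line.value  # cache hit, no state change needed
    holders = [l for i, l in enumerate(caches)
               if i != reader and l.state != INVALID]
    for holder in holders:
        holder.state = SHARED
    line.state = SHARED if holders else EXCLUSIVE
    line.value = holders[0].value if holders else memory_value
    return line.value
```

In this model, a reader always sees the most recent write because the write step leaves exactly one valid copy, which the read step then sources from.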
Enter Stathera, another Canadian company contributing to the solution with its advanced silicon timing devices. Timing devices not only keep time but also synchronize electronics and telecommunications systems. Stathera fabricates its cutting-edge devices at semiconductor foundries; CEO George Xereas said 80% of its products are made in Taiwan and 20% in North America.
As telecommunications advances to 6G, traditional quartz crystal timing devices will run into physical limits on size and speed. Stathera, which has received investment from MediaTek and an undisclosed major semiconductor company, aims to launch its first-generation product in the first quarter of 2025. Having raised US$15 million in Series A funding, Stathera is ready to commercialize its products.
Davis noted that companies like Nvidia and Apple are already adopting event-driven data integration in their supply chains, signaling a shift toward more responsive and efficient AI processing. As industry leaders embrace these innovations, others are likely to follow, further driving the evolution of AI technology and reinforcing the importance of memory coherence for seamless, real-time processing across complex systems.