SIGGRAPH 2024 felt especially like a celebration party hosted by Nvidia, which has been sponsoring SIGGRAPH every year since 2020. Nvidia's presence was everywhere at this annual conference centered around computer graphics: 20 papers at the intersection of generative AI and simulation, Gen AI Day, developer education sessions, OpenUSD announcements… the list goes on.
Nvidia's CEO Jensen Huang opened with a fireside chat with Lauren Goode, senior writer at WIRED. During the thirty-minute break, Huang grabbed a microphone and invited the crowd to ask him anything.
While many PhDs angled for selfies with Huang speaking in the background, Huang discussed how the latest chip design increases inference speed, and how synthetic data, domain randomization, and reinforcement learning can keep pushing the frontier even after the industry runs out of human-generated data. All of this culminated in Huang taking the stage as host to welcome Meta CEO Mark Zuckerberg for a conversation on the next computing platform.
Meta, Google, Microsoft, and Amazon have all been increasing capital spending that benefits Nvidia. Dan Gallagher, a columnist for The Wall Street Journal, asserted that "It's Nvidia's market. Everyone else just lives in it—though not nearly as well." (See "Big Tech's AI Race Has One Main Winner: Nvidia," published the same week SIGGRAPH took place.)
The three waves
This is just the beginning for Nvidia. At SIGGRAPH, Huang recapped the first two waves of AI and described the third. "The first wave is accelerated computing that reduces energy consumed, allows us to deliver continued computational demand without all of the power continuing to grow with it," Huang said. The second wave is for enterprises, ideally every organization, to create their own AIs. "Everybody would be augmented and have a collaborative AI that could empower them, helping them do better work."
The next wave of AI is "physical AI," according to Huang. He used the Chinese science-fiction novel "The Three-Body Problem" to introduce the concept, calling it "the three-computer problem": one computer creates the AI; a second simulates it, generating synthetic data or letting humanoid robots refine the AI; and a third runs the AI.
"Each one of these computers, depending on whether you want to use the software stack, the algorithms on top, or just the computing infrastructure, or just the processor for the robot, or the functional safety operating system that runs on top of it, or the AI and computer vision models that run on top of that, or just the computer itself. Any layer of that stack is open for robotics developers," Huang said.
The infrastructure layers of AI are dominated by big tech companies, with Nvidia leading from the hardware side, and now more than ever on the software side as well. "We've always been a software company, and even first," said Huang. Nvidia set an industry standard back in 2006 with the introduction of CUDA, which accelerates computing applications with GPUs. Now Nvidia's roots are not just expanding but also deepening.
OpenUSD
"One of the formats that we deeply believe in is OpenUSD," Huang said. To improve interoperability across 3D tools, data, and workflows, Nvidia co-founded the Alliance for OpenUSD (AOUSD) along with Pixar, Adobe, Apple, and Autodesk in 2023 to promote USD (Universal Scene Description).
USD enables the assembly and organization of any number of assets into virtual sets, scenes, and shots; the transmission of those assets from application to application; and non-destructive editing along the way. An analogy demonstrates its significance: if traditional file types such as .FBX and .OBJ are like .PNG and .JPEG, USD is closer to Adobe Photoshop's project file, .PSD. USD's open-source layering system means each operation can be turned on and off, so production doesn't need to happen linearly, which is ideal for small teams and even more tempting for a matrix organization. USD can also load complex data without requiring teams to worry about data management.
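The layering idea behind that non-destructive workflow can be sketched in a few lines. The snippet below is a conceptual Python analogy only, not the actual OpenUSD (pxr) API; the layer names, attributes, and the `compose` helper are all hypothetical illustrations of how sparse override layers can be stacked, muted, and resolved.

```python
# Conceptual sketch of USD-style non-destructive layering (NOT the pxr API):
# each layer holds sparse "opinions" (attribute overrides) over a base scene,
# layers can be muted without deleting their edits, and composition lets
# stronger layers win when two layers touch the same attribute.

BASE = {"chair.color": "oak", "chair.height": 45, "lamp.on": False}

def compose(base, layers, muted=frozenset()):
    """Resolve a scene from a base and (name, opinions) layers listed
    strongest-first, skipping any muted layers."""
    scene = dict(base)                        # the base layer is never modified
    for name, opinions in reversed(layers):   # apply weakest-first ...
        if name not in muted:                 # ... so stronger layers win
            scene.update(opinions)
    return scene

# Two departments edit the same scene independently, without touching the base:
layers = [
    ("lighting_dept", {"lamp.on": True}),          # stronger layer
    ("set_dressing",  {"chair.color": "walnut"}),  # weaker layer
]

print(compose(BASE, layers))
# Muting a layer turns its edits off without destroying them:
print(compose(BASE, layers, muted={"set_dressing"}))
```

Because each department's edits live in its own layer, "turning an operation off" is just muting that layer; the base scene and every other layer are untouched, which is what makes the workflow non-linear and non-destructive.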
"OpenUSD is the first format that brings together multimodality from almost every single tool and allows it to interact, to be composed together, to go in and out of these virtual worlds." Huang claimed that over time, developers can "bring in just about any format" into USD. Nvidia announced that the Unified Robot Description Format (URDF) is now compatible with USD. Pixar was the original inventor of USD and proved that it can optimize 3D workflows; Apple and other companies have been active contributors to its development. The product releases Nvidia made at SIGGRAPH this year give the developer community even more reasons to join the ecosystem.
"We taught an AI how to speak USD, OpenUSD." Huang introduced the USD Code NIM microservice, USD Search NIM microservice, and USD Validate NIM microservice. "Omniverse generates the USD and uses USD search to then find the catalog of 3D objects that it has. It composes the scene using words and then generative AI uses that augmentation to condition the generation process and so therefore, the work that you do could be much, much, better controlled."
During the developer meetup organized by Nvidia, engineers from IKEA, Electronic Arts, and a European surveillance company all showed interest in potentially switching to USD. Depending on the project and the supporting tools, USD could shorten creation time from months to days. Even when enterprises do not want certain applications to open their files, they can write a converter for that final step while still getting the benefits of interoperability everywhere upstream of it.
NIM microservices' early adopters and use cases are growing: Foxconn created a digital twin of a factory under development; WPP brought the solutions into its content creation pipeline, built on Nvidia Omniverse, serving its client The Coca-Cola Company.
Partnership
Beyond interoperability, the lack of assets can be another showstopper for enterprise 3D adoption. Several text-to-3D solutions are on the market, but most were trained on open-internet data, which has legal implications. Nvidia partnered with Shutterstock and Getty Images to help each train its own text-to-3D model on its respective proprietary data. These models are accessible through Nvidia Picasso, an AI foundry for software developers and service providers to build and deploy generative AI models.
With the advancement of model output and industry awareness, Shutterstock predicted "near-universal adoption" of generative AI in creative fields ranging from graphic design and 3D modeling to animation, video editing, and concept art.
Author's bio
Kari Wu is a Senior Technical Product Manager at Unity Technologies, the leading platform for creating real-time interactive 3D content. Previously, she was an entrepreneur focused on augmented and virtual reality. Kari founded FilmIt, a startup enabling users without formal training to film professional-looking videos using augmented reality and automated editing solutions.
Born in Taiwan and raised across cultures, Kari brings a global vision to her work. Her experience consulting for businesses in South Korea and Taiwan, and living in Boston, Los Angeles, and San Francisco, gives her a unique ability to see from multiple perspectives. From immigrant to entrepreneur to tech leader, Kari's lifelong curiosity has driven her journey of evolving identities and careers. Kari earned her MBA and MS in Media Ventures from Boston University.