A mechanical hand is on display at the Robot Mall, the world’s first embodied intelligent robot 4S store, on August 13, 2025 in Beijing, China.
VCG | Visual China Group | Getty Images
BEIJING - Alibaba Cloud is investing in a new kind of artificial intelligence designed to better replicate the real world, using a different approach from chatbots such as OpenAI’s ChatGPT.
The shift acknowledges the limits of “large language models” trained mostly on text. Instead, developers are starting to focus more on “world models” built on videos and real-life physical scenarios.
To jump on the trend, Alibaba led a 2 billion yuan ($290 million) investment in ShengShu, the startup behind the AI video generation tool Vidu, the company announced Friday. TAL Education and Baidu Ventures also participated in the series B funding round.
The funding comes about two months after ShengShu raised 600 million yuan from Qiming Venture Partners and other backers. The startup declined to disclose its valuation.
ShengShu said the latest funding will support the development of a “general world model” that uses AI to bridge two currently separate domains: the digital world of games and AI-generated video, and the physical world of autonomous driving and robots.
“ShengShu believes that a general world model, built on multimodal data such as vision, audio, and touch, more naturally captures how the physical world works than large language models,” the three-year-old startup said in a statement.
“We aim to connect perception and action,” Zhu Jun, founder of ShengShu, added in a statement, saying this would allow AI systems to better model and predict real-world behavior consistently.
ShengShu’s latest Vidu Q3 Pro model, released in January, ranks among the top 10 AI models for generating videos from text and images, according to Artificial Analysis.
The company launched Vidu globally months before OpenAI made its now-shuttered Sora tool for AI video generation widely available. Chinese short-video companies Kuaishou and ByteDance have also released similar competing AI tools for generating videos.
World model competition
Alibaba has expanded its investments in associated startups.
The Chinese tech giant and Baidu Ventures last month led a $50 million investment in Tripo AI, a platform that uses AI to quickly generate digital 3D models from photos. Tripo said it is also moving away from methods used by language models toward AI tools grounded in physical space, and is developing its own world model.
In September, Alibaba also led a $60 million investment in PixVerse, which released an AI world model earlier this year that allows users to direct how a video unfolds while it’s being generated.
Alibaba, which got its start in e-commerce, has also released free, open-source AI models for video generation and, in February, released one for powering robots.
ShengShu said Friday it has strategic partnerships with companies developing embodied AI (systems such as humanoid robots that interact with the physical world) for use across industrial, commercial and home settings.
World models are essential for robotics because the technology needs more than LLMs to work, Kevin Kelly, co-founder of the U.S. tech magazine Wired, wrote last month on his Substack.
Ultimately, to replicate human intelligence, AI will need three things: reasoning, an understanding of the physical world and continuous learning, Kelly said. While AI for the learning category hasn’t been developed yet, LLM-powered chatbots have created the knowledge element, he said, making world models a key area requiring a breakthrough.












