SAEDNEWS: Tencent has released HunyuanWorld-Voyager, an artificial intelligence model designed to transform a single photo into video sequences that simulate three-dimensional environments.
The system generates RGB video alongside depth information, letting users explore virtual spaces without traditional 3D modeling. Tencent says the tool supports user-defined camera movements, such as moving forward, moving backward, and rotating, and produces short clips that can be linked into longer sequences.
Rather than building a true 3D model, the program creates 2D video frames that maintain spatial consistency, giving the impression of moving through a 3D world. Each generation produces 49 frames, or about two seconds of footage, and clips can be chained to yield several minutes of content. The depth data can also be converted into 3D point clouds for reconstruction.
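To illustrate how per-pixel depth can become a point cloud, here is a minimal sketch using the standard pinhole camera model. It is not Tencent's code: the function name `depth_to_point_cloud`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the tiny depth map are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def depth_to_point_cloud(
    depth: Sequence[Sequence[float]],
    fx: float, fy: float, cx: float, cy: float,
) -> List[Point3D]:
    """Back-project a depth map into camera-space 3D points (pinhole model).

    depth[v][u] is the distance along the camera's z-axis at pixel (u, v).
    fx, fy are focal lengths in pixels; cx, cy is the principal point.
    These intrinsics are hypothetical inputs, not values published for Voyager.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip pixels with no valid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x2 depth map; one pixel has no valid depth.
depth = [[2.0, 2.0],
         [0.0, 4.0]]
cloud = depth_to_point_cloud(depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
print(cloud)
```

Each valid pixel yields one 3D point, so a real frame would produce hundreds of thousands of points; production code would normally vectorize this with an array library.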
Voyager relies on a “world cache,” which stores 3D points from earlier frames and projects them back into 2D for each new camera position, helping keep new frames consistent with previous outputs. Tencent researchers said the approach improves spatial stability compared to existing generators, though errors can still accumulate during longer or more complex movements.
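The general idea of caching 3D points and reprojecting them into a new view can be sketched as follows. This is a simplified illustration under stated assumptions, not Voyager's implementation: the `WorldCache` class, its methods, and the camera parameters are hypothetical, and the sketch only shows the geometry (world-to-camera transform, pinhole projection, and keeping the nearest point per pixel).

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


class WorldCache:
    """Hypothetical cache of world-space points gathered from earlier frames."""

    def __init__(self) -> None:
        self.points: List[Point3D] = []

    def add(self, points: Sequence[Point3D]) -> None:
        self.points.extend(points)

    def project(
        self,
        rotation: Sequence[Sequence[float]],  # 3x3 world-to-camera rotation
        translation: Sequence[float],         # world-to-camera translation
        fx: float, fy: float, cx: float, cy: float,
        width: int, height: int,
    ) -> List[List[Optional[float]]]:
        """Render cached points into a depth map for a new camera pose.

        Pixels that no cached point lands on stay None; a generator would
        have to fill those regions with new content.
        """
        depth: List[List[Optional[float]]] = [
            [None] * width for _ in range(height)
        ]
        for p in self.points:
            # World -> camera coordinates: Xc = R * Xw + t
            xc, yc, zc = (
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
            if zc <= 0:
                continue  # point is behind the camera
            u = int(round(fx * xc / zc + cx))
            v = int(round(fy * yc / zc + cy))
            if 0 <= u < width and 0 <= v < height:
                # Keep the nearest point per pixel (a simple z-buffer).
                if depth[v][u] is None or zc < depth[v][u]:
                    depth[v][u] = zc
        return depth


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
cache = WorldCache()
cache.add([(0.0, 0.0, 2.0), (0.0, 0.0, 5.0), (1.0, 0.0, 2.0)])

# View from the original camera, then from a camera moved 1 unit forward.
first = cache.project(identity, [0.0, 0.0, 0.0], 2.0, 2.0, 1.0, 1.0, 3, 3)
moved = cache.project(identity, [0.0, 0.0, -1.0], 2.0, 2.0, 1.0, 1.0, 3, 3)
```

After the forward move, the point at the image center appears closer and the point near the edge falls outside the frame. The pixels left empty are where a model would need to synthesize new content, which is one place accumulated errors could enter.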
The model was trained on more than 100,000 video clips, including scenes rendered in Unreal Engine, to teach it to mimic camera behavior in 3D environments. Tencent acknowledged that, like other Transformer-based systems, Voyager is still pattern-driven and limited in its ability to generalize beyond its training data.
Comparable systems are being developed by other firms. Google’s Genie 3, announced in August 2025, generates interactive worlds from text prompts, while Dynamics Lab’s Mirage 2 enables users to convert photos into playable spaces online. Voyager, by contrast, is geared toward video production and 3D reconstruction.
Running the system requires at least 60GB of GPU memory for 540p resolution, with 80GB recommended for better results. Tencent has published the model weights on Hugging Face, but its license restricts usage in the European Union, the United Kingdom, and South Korea. Deployments serving over 100 million users need additional approval from the company.
On Stanford University’s WorldScore benchmark, Voyager reportedly achieved the highest overall score, 77.62, outperforming rivals WonderWorld and CogVideoX-I2V in most categories, though it placed second in camera control.
Despite the promising benchmark results, high computing demands and difficulty producing long, coherent scenes mean the technology is not yet suited for real-time gaming or large-scale use. Tencent positioned Voyager as a step forward in AI-based video generation and reconstruction, though widespread deployment remains some way off.