SenseTime hosted a Tech Day event, sharing its strategic plan for advancing artificial general intelligence (AGI) development through the combination of “foundation models + large-scale computing” systems. Under this strategy, SenseTime unveiled the “SenseNova” foundation model set, introducing a variety of foundation models and capabilities in natural language processing, content generation, automated data annotation, and custom model training. At the event, SenseTime not only showcased its large language model’s capabilities, but also demonstrated a series of generative AI models and applications, such as text-to-image creation, 2D/3D digital human generation, and complex scenario/detailed object generation. Additionally, the Company introduced its AGI research and development platform facilitated by the integration of “foundation models + large-scale computing” systems.
The current demand for computing power to train large models is extremely strong and continues to increase, yet useful infrastructure is quite scarce. Over the course of five years, SenseTime has built SenseCore, a leading AI infrastructure with 27,000 GPUs, capable of delivering a total computational power of 5,000 petaflops, making it one of the largest intelligent computing platforms in Asia. With the infrastructure’s capabilities, SenseTime has trained foundation models in various fields, such as computer vision, natural language processing, AI content generation, multimodality, and decision intelligence. The Company is continuously advancing its models’ capabilities to support various applications and demands.
Dr. Xu Li, Chairman and CEO of SenseTime, said, “In the era of AGI, the three elements of data, algorithms, and computing power are undergoing a new evolution. The number of model parameters will increase exponentially, and the volume of data will grow massively with the introduction of multimodalities, leading to a continuous surge in demand for computing power. We have built the infrastructure for the AGI era with SenseCore and named our foundation model set as ‘SenseNova’, implying ‘constant renewal, daily renewal, and further renewal’. We hope to continuously update the models’ iteration speed and their problem-solving capabilities, unlocking more possibilities for AGI.”
Prof. Wang Xiaogang, SenseTime Co-founder and Chief Scientist, said, “AGI has given rise to a new research paradigm, which is based on powerful foundation models, unlocking new capabilities through reinforcement learning and human feedback, therefore efficiently solving open-ended tasks. AGI will evolve from a ‘data flywheel’ to a ‘wisdom flywheel’, ultimately leading to human-machine symbiosis.”
“SenseTime has established a full-stack foundation model R&D system and has developed applications in multiple industries. The diversity of the scenarios, the complexity of the tasks, and the richness of the data, all demonstrate the capabilities and potentials of our foundation models. We will continue to promote infrastructure development and look forward to joining our partners in the tidal wave of the AGI era,” Prof. Wang added.
“SenseNova” offers a variety of flexible APIs and services for enterprise customers, enabling them to access and use the AI capabilities of the SenseNova foundation models according to their actual needs, with low barriers, low costs, and high efficiency.
“SenseNova” has also brought breakthroughs to SenseTime’s own business. For example, in the field of smart auto, based on the foundation model for computer vision (CV), SenseTime has achieved mass production of the BEV (Bird’s-Eye-View) general perception that can recognize 3,000 types of objects. Moreover, they have built an integrated perception-decision multimodal system to enable better autonomous driving, with stronger environmental, behavioral, and motivational comprehension capabilities.
Natural language serves as a crucial means of communication between humans and machines. “SenseNova” has introduced “SenseChat”, the latest large language model (LLM) developed by SenseTime. As an LLM with hundreds of billions of parameters, SenseChat is trained on a vast amount of data and takes the Chinese context into account to better understand and process Chinese texts. At the event, SenseChat demonstrated its capabilities in multi-turn dialogues and in comprehending extensive texts. SenseTime also showcased several innovative applications powered by the LLM, including a programming assistant to help developers write and debug code more efficiently, a health consultation assistant to provide personalized medical advice for users, and a PDF file reading assistant that can effortlessly extract and summarize information from complex documents.
Diffusion models have sparked the popularity of AIGC applications. SenseTime showcased various generative AI models and applications of “SenseNova”, such as text-to-image creation, 2D/3D digital human generation, and complex scenario/detailed object generation:
- “SenseMirage” text-to-image creation platform offers powerful image generation capabilities with realistic lighting, rich details, and diverse styles, and supports 6K ultra-high-definition image generation. Customers can also train and finetune their own generative models tailored to their own styles.
- “SenseAvatar” AI digital human generation platform can create natural-sounding and -moving digital human avatars with accurate lip-sync and multilingual proficiency using just a 5-minute real-person video clip.
- “SenseSpace” and “SenseThings” 3D content-generation platforms can efficiently and cost-effectively generate large-scale 3D scenes and detailed objects, providing new possibilities for metaverse and mixed reality applications.
Whether it is the large language model, text-to-image creation, or digital human generation, all of these require large-scale computing power. SenseCore offers industry-leading computing power output, ultra-large model training, and large-scale inferencing capabilities, and it aims to be the service leader in the AGI era.
Leveraging SenseCore infrastructure and “SenseNova” foundation models, SenseTime offers a range of Model-as-a-Service solutions to industry partners, encompassing automated data annotation, customized model training and finetuning, model inference deployment, and development efficiency enhancement:
- Automated data annotation based on pre-trained foundation models can improve efficiency nearly a hundredfold compared to manual data annotation.
- Large-scale model training and finetuning services can help customers quickly train models using their own data, including the development of vertical models based on pre-trained foundation models.
- Model inferencing services can increase large-scale model inference efficiency by more than 100%, reducing the cost significantly.
- SenseTime also provides numerous pre-trained models and AI development toolkits to industry developers, empowering clients to enhance their development efficiency.
SenseTime will continue to advance the construction of the “SenseNova” foundation model set. Striving for “constant renewal, daily renewal, and further renewal”, SenseTime aspires to make ongoing improvements to the models in terms of data volume, parameter structure, and problem-solving capabilities. Together with industry ecosystem partners, SenseTime aims to advance breakthroughs in AGI, bringing the benefits of AI to everyone.
Source: SenseTime