In July 2023, YITU released YITU QuestMind™ 1.0, the first multi-modal large model in the field of intelligent security ready for both real-world operational use and commercial deployment. Since that release, the YITU QuestMind™ LVLM base has completed two iterations and has been deployed in more than 50 projects nationwide.
Recently, at the 10th China (Shanghai) International Technology Import and Export Fair, YITU Technology officially released the latest version, YITU QuestMind™ LVLM 4.0, which redefines the application boundaries of multi-modal large models in intelligent security with a new interactive experience and strong evolution capabilities.
The newly released YITU QuestMind™ LVLM 4.0 delivers upgrades across several capabilities. It integrates natural language and visual information, greatly improving fuzzy retrieval of video content. It supports scene control based on combinations of multiple conditions, enabling refined control and risk management. Its pre-trained model supports algorithm cold start from very few samples and, through Agent-assisted training, achieves the leap to "idea is algorithm".
-
Subtler video understanding, richer semantic retrieval
The fuzziness of language interaction comes from the diversity of context. YITU QuestMind™ 4.0 introduces multi-modal visual search technology that integrates natural language and visual information and, with a focus on the user, captures subtle differences in context. For example, to search for video content of "riding an electric car with multiple gas cylinders", a user only needs to describe the requirement in everyday language, and the system presents the results closest to that intent. It can also perform fuzzy retrieval of small targets in video content. These capabilities greatly improve the efficiency of city managers in daily operations and in decision-making and scheduling, and reduce communication costs.
(Figure above: for the query "Cars with broken headlights", the system not only understands the abstract description but also quickly returns precise image results.)
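As a rough illustration of how natural-language retrieval over video is commonly built, the sketch below ranks frames by cosine similarity between a query vector and frame vectors in a shared embedding space. This is a generic technique, not a description of how YITU QuestMind™ works internally. The frame IDs, the three-dimensional vectors, and the function names are all hypothetical; a real system would obtain the vectors from trained text and image encoders.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_frames(query_vec, frame_vecs, top_k=2):
    """Return the IDs of the top_k frames most similar to the query."""
    scored = [(cosine(query_vec, vec), frame_id)
              for frame_id, vec in frame_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [frame_id for _, frame_id in scored[:top_k]]

# Hypothetical embeddings: stand-ins for encoder output, not real data.
frames = {
    "cam01_t0931": [0.9, 0.1, 0.0],
    "cam02_t1410": [0.1, 0.8, 0.2],
    "cam03_t0702": [0.7, 0.3, 0.1],
}
# Stand-in for the embedding of a query such as "cars with broken headlights".
query = [1.0, 0.2, 0.0]

top = rank_frames(query, frames)
print(top)
```

Because the ranking depends only on vector similarity, the same function serves any query phrasing once the encoders map text and images into one space.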
-
Full-element understanding, more comprehensive multi-condition control
High-precision understanding of video content makes control of complex video scenes possible. Machines can watch video in place of people and understand it as people do, across full scenes and all elements. This lets the system accurately monitor targets and rules in typical scenes, give early warning of potential risks, and support decisions scientifically and efficiently. YITU QuestMind™ 4.0 supports scene control based on combinations of multiple conditions, helping managers carry out refined risk prevention, control, and management. The technology has shown high practical value in city management, environmental monitoring, and public security.
(Figure above: searching city camera footage for historical events of "severe water accumulation in culverts".)
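One simple way to think about multi-condition combination is as a rule that fires only when every one of its conditions holds for an event. The sketch below illustrates that idea under stated assumptions: the event fields (`scene`, `water_depth_cm`), the 30 cm threshold, and the `Rule` class are invented for illustration and do not come from the product.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]
Condition = Callable[[Event], bool]

@dataclass
class Rule:
    """A named rule that matches when all of its conditions are true."""
    name: str
    conditions: List[Condition]

    def matches(self, event: Event) -> bool:
        return all(cond(event) for cond in self.conditions)

# Hypothetical rule: the fields and the threshold are assumptions.
flood_rule = Rule(
    name="severe water accumulation in culverts",
    conditions=[
        lambda e: e.get("scene") == "culvert",
        lambda e: float(e.get("water_depth_cm", 0)) >= 30,
    ],
)

# Hypothetical structured events, as a video-understanding stage might emit.
events = [
    {"id": "evt-1", "scene": "culvert", "water_depth_cm": 45},
    {"id": "evt-2", "scene": "culvert", "water_depth_cm": 5},
    {"id": "evt-3", "scene": "street", "water_depth_cm": 60},
]

alerts = [e["id"] for e in events if flood_rule.matches(e)]
print(alerts)
```

Keeping each condition as a separate predicate means conditions can be recombined into new rules without touching the matching logic.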
-
Fewer samples, more efficient and flexible on-site training
A defining feature of intelligent systems is the ability to adapt quickly to changes in environment and requirements. When faced with a new algorithm task, traditional machine learning models require new data collection and retraining, which typically takes 1 to 3 months. YITU QuestMind™ 4.0 upgrades the pre-trained model so that a new algorithm can be cold-started from very few samples within 1 minute, trained with online annotation within 1 hour, and deployed online within 1 day. As the data flywheel accumulates data during daily work, operators spend a few minutes each day aligning data by simply marking results as right or wrong, and within a few days the algorithm can reach an accuracy of over 90%. This gives business systems the agility and management the timeliness they need.
(Figure above: detecting "fierce dogs". After aligning on a few samples of fierce dogs, Chihuahuas, Labradors, and rural dogs rarely appear in the results.)
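A few-sample cold start followed by right/wrong feedback can be illustrated with a nearest-prototype classifier: each class is represented by the mean of its sample embeddings, and every operator correction adds a sample and shifts that mean. This is a standard few-shot technique used here as a stand-in, since the source does not say how QuestMind™ implements this. The two-dimensional vectors, labels, and class name are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class PrototypeClassifier:
    """Assigns a vector to the class whose prototype (mean) is nearest."""

    def __init__(self):
        self.samples = {}  # label -> list of embedding vectors

    def add_sample(self, label, vector):
        self.samples.setdefault(label, []).append(vector)

    def predict(self, vector):
        prototypes = {label: mean_vector(vecs)
                      for label, vecs in self.samples.items()}
        return min(prototypes,
                   key=lambda label: squared_distance(vector, prototypes[label]))

# Cold start from two hypothetical samples per class.
clf = PrototypeClassifier()
clf.add_sample("fierce_dog", [0.9, 0.8])
clf.add_sample("fierce_dog", [0.8, 0.9])
clf.add_sample("other_dog", [0.1, 0.2])
clf.add_sample("other_dog", [0.2, 0.1])

borderline = [0.5, 0.45]
first_guess = clf.predict(borderline)

# The operator marks the result as wrong; the sample is added with the
# correct label, which moves the "fierce_dog" prototype toward it.
clf.add_sample("fierce_dog", borderline)
corrected = clf.predict(borderline)
print(first_guess, corrected)
```

No retraining step is needed here: each correction takes effect on the next prediction, which is why this kind of approach suits fast on-site iteration.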
-
Idea is algorithm, with smarter Agent assistance
Agents play a crucial role in a multi-modal large model system. AI Agents can provide more accurate decision support based on historical interaction records and existing algorithm capabilities. YITU QuestMind™ 4.0 can help users align cognition step by step and deconstruct and recombine algorithms. For example, when a user wants to train an algorithm for "small forklift in a large warehouse", the Agent aligns the semantics of "large warehouse" and "small forklift", so that the user's idea is quickly turned into an intuitive algorithm and operational instructions. This achieves the leap to "idea is algorithm" and shows the flexibility and efficiency of an intelligent agent acting as a work assistant.
(Figure above: training "small forklift in a large warehouse". The Agent aligns the semantics of "large warehouse" and "small forklift".)
A new era of AI: because we see, we believe!
YITU began research and application exploration based on the Transformer in 2019. In 2020, YITU launched the pre-trained language understanding model ConvBERT, which matched the accuracy of Google's BERT model with only 1/10 of the training time and 1/6 of the parameters. Compared with OpenAI's GPT-3, it makes it possible to explore language model training in less time and also reduces the computational cost of the model at prediction time. In July 2023, the YITU QuestMind™ Large Vision Language Model was officially released and quickly deployed in national projects.
The working paradigm of the YITU QuestMind™ Large Vision Language Model has moved from the pixel annotation of traditional deep learning to the representation alignment of multi-modal large models. By deeply integrating vision and language models, it unifies the underlying framework of the physical and cognitive worlds, bridging the two and connecting user needs with technological innovation. The newly released YITU QuestMind™ 4.0 adds new features in anthropomorphic interaction, situational understanding, and cognitive evolution, enhancing the multi-modal large model's ability to understand and discover complex video content.
YITU is looking toward a new decade. In the vertical field of vision, as engineering applications are put into practice, the complexity of content understanding keeps increasing: target features, relationship features, spatial features, behavioral features, statistical features, knowledge features, and business reasoning are being unlocked one after another. Continued breakthroughs in the theoretical basis of multi-modal large models also point to the possibility of unlocking more application scenarios.
We firmly believe that multi-modal large models will realize greater potential in the field of intelligent security, and that they will show greater commercial and social value especially in complex scenarios with strongly personalized needs and changing environments. Intelligent operation based on data and computing power will become the new normal for public security and urban governance, and with breakthroughs in technology, all industries will truly enter the new era of artificial intelligence.