Date: 2023-12-03

Artificial intelligence (AI) has undoubtedly made significant strides in recent years, but according to Yann LeCun, chief scientist at Meta (formerly Facebook), human-level AI is still decades away. LeCun believes that current AI systems lack the sentience and common sense needed to take their capabilities beyond text summarization. While some industry leaders, such as Nvidia CEO Jensen Huang, predict that AI will surpass human performance in less than five years, LeCun suggests that these claims may be driven by the ongoing “AI war” and the desire to sell more GPUs.

Instead of focusing on the race towards human-level AI, Meta is exploring the development of “cat-level” or “dog-level” AI, which could emerge much sooner. LeCun argues that the technology industry’s emphasis on language models and text data is insufficient for creating advanced human-like AI. Text alone is a limited source of information, and training systems on vast amounts of text does not guarantee a comprehensive understanding of the world. LeCun highlights the need to leverage multimodal AI, combining audio, image, and video information, to unlock the true potential of AI systems.

Meta is actively researching how transformer models, such as those used in apps like ChatGPT, can be tailored to work with different kinds of data. The goal is to discover hidden correlations between various types of data, enabling AI systems to perform more complex tasks. For instance, Meta’s Project Aria augmented reality glasses use multimodal AI to provide visual cues for improving tennis skills.

While companies like Meta and Google parent Alphabet invest in advanced AI models, Nvidia remains a prominent player in the field. Nvidia’s graphics processing units (GPUs) have become the industry standard for training AI models. LeCun acknowledges the importance of GPUs in AI but envisions a future with new chips specifically designed for deep learning acceleration.

Regarding quantum computing, which has attracted significant investment from tech giants like Microsoft and IBM, LeCun and Meta’s senior fellow, Mike Schroepfer, express skepticism. They question the practical relevance and the feasibility of fabricating useful quantum computers in the near future. Instead, Meta remains focused on commercializing AI within the coming years.

In conclusion, the path to human-level AI may be far off, but the potential of cat-level and dog-level AI, powered by multimodal models, presents exciting opportunities for advancements in various fields.

FAQs

What is multimodal AI?

Multimodal AI refers to the combination of different kinds of data, such as text, audio, image, and video, to develop AI systems with a more comprehensive understanding of the world.

Why is text alone considered a poor source of information for AI?

Text provides limited information compared to the breadth of data available in other formats. AI systems trained solely on text may lack a deeper understanding of complex concepts and real-world connections.

What are the advantages of developing multimodal AI systems?

Multimodal AI can enable AI systems to perform complex tasks by leveraging the correlations between different types of data. It allows for a more comprehensive and nuanced understanding, leading to more sophisticated AI applications.

Why is Nvidia’s GPU technology essential in AI?

Nvidia’s graphics processing units (GPUs) have become the industry standard for training AI models due to their high computational power and parallel processing capabilities.

Why are there doubts about the practicality of quantum computing for AI?

Despite the investment in quantum computing, some experts, like Yann LeCun and Mike Schroepfer, doubt its practical relevance in the near future. They believe that many problems solvable with quantum computing can still be efficiently addressed using classical computers.