Date: 2023-12-07

Yann LeCun, a renowned AI scientist at Meta and a pioneer in the field, has shared his perspective on the current state of AI and its future trajectory. While acknowledging the value of textual data, LeCun believes that relying solely on it is not enough for AI systems to truly advance their intelligence. In a LinkedIn post, he highlights the remarkable ability of animals and humans to learn quickly with significantly less data compared to today’s AI systems.

LeCun draws attention to the limitations of large language models (LLMs), which are trained on an amount of text that would take a human 20,000 years to read. Despite this massive input, these models still struggle with fundamental logical concepts, such as understanding that if A equals B, then B equals A. In contrast, animals like crows, parrots, dogs, and even octopuses grasp such concepts quickly, despite having fewer neurons and “parameters” in their cognitive systems.

According to LeCun, the future of AI lies in developing new models that can learn as efficiently as animals and humans. Instead of relying solely on text data, he suggests that incorporating sensory data, particularly video, will be a breakthrough for AI. Sensory data carries more information and structure, providing a richer learning experience for AI systems.

LeCun emphasizes that even a two-year-old child has been exposed to a greater volume of visual data than the volume of text used to train LLMs. This visual data is valuable because it contains redundancy, presenting the same information in various ways. That redundancy helps the child understand the structure of the world.

Google recently unveiled Gemini, an AI model that processes multi-modal input, including video, audio, and text. LeCun’s argument suggests why such input matters: in his view, video offers more to learn from than text because of its inherent redundancy and its ability to reveal the underlying structure of the world.

In LeCun’s view, then, the future of AI lies in models that can learn as efficiently as animals and humans. By incorporating sensory data, particularly video, AI systems could tap into a wealth of information and improve their understanding of the world, making the shift towards multi-modal learning a promising step in the advancement of AI technology.