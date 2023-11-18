Meta Platforms Inc., the parent company of Facebook and Instagram, has made significant strides in the field of artificial intelligence-powered image and video generation. Their latest tools, based on Meta’s Expressive Media Universe (Emu), offer users more control over the image editing process through text instructions and introduce a groundbreaking method for text-to-video generation.

Emu, Meta’s foundational model for image generation, is already in production, enabling users to generate photorealistic images with Meta AI’s Imagine feature on Messenger. Until now, generating images with AI has often been a trial-and-error process in which the user’s prompt did not always produce the desired image. Meta aims to reduce this drawback with its new Emu Edit tool, which lets users enter text-based instructions for precise image manipulation. Whether the task is adding or removing a background, applying color and geometry transformations, or running computer vision tasks such as object detection and segmentation, Emu Edit delivers finer control by altering only the pixels relevant to the user’s instruction.

To ensure pixel precision, Meta’s researchers incorporated computer vision tasks into the instructions given to image generation models. This approach ensures that pixels unrelated to the instruction remain untouched by the edit. For example, if a user wants to add the text “Aloha!” to a picture of a baseball cap, Emu Edit alters only the pixels where the text appears and leaves the cap itself unchanged.
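The "only the relevant pixels change" property can be illustrated with a small compositing sketch. This is not how Emu Edit is implemented; the explicit mask, the function names, and the tiny images are all hypothetical, chosen only to show what it means for an edit to leave unrelated pixels untouched.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]


def composite_edit(original: Image, edited: Image, mask: Mask) -> Image:
    """Take edited pixels where the mask is True; keep original pixels elsewhere."""
    return [
        [e if m else o for o, e, m in zip(o_row, e_row, m_row)]
        for o_row, e_row, m_row in zip(original, edited, mask)
    ]


def changed_pixels(before: Image, after: Image) -> List[Tuple[int, int]]:
    """Return the (row, col) position of every pixel that differs."""
    return [
        (r, c)
        for r, (b_row, a_row) in enumerate(zip(before, after))
        for c, (b, a) in enumerate(zip(b_row, a_row))
        if b != a
    ]


# A 2x3 single-colour "cap" and a candidate edit that repaints every pixel.
cap = [[(200, 30, 30)] * 3 for _ in range(2)]
repainted = [[(255, 255, 255)] * 3 for _ in range(2)]
# Hypothetical mask: only the middle pixel of the top row belongs to the new text.
text_mask = [[False, True, False], [False, False, False]]

result = composite_edit(cap, repainted, text_mask)
print(changed_pixels(cap, result))
```

Only the masked position is reported as changed, which is the behaviour the baseball cap example describes.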

In addition to image generation, Meta’s AI team has also worked on improving video generation. Building on the Emu model, Meta has developed Emu Video, a tool that simplifies text-to-video generation based on diffusion models. Emu Video first creates an image conditioned on a text prompt and then produces a video conditioned on both that image and the text prompt, bringing the static image to life with movement. This “factorized” approach makes training video generation models highly efficient.
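The two-step structure can be sketched as a small pipeline. Everything here is an illustrative assumption: the class name, the function signatures, and the toy stand-ins for the two diffusion models are hypothetical, and the sketch reuses the same prompt for both stages. It shows only the control flow of a factorized generator, not Meta’s implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

Frame = List[List[float]]  # stand-in for one image or video frame


@dataclass
class FactorizedTextToVideo:
    """Two-stage pipeline: text -> image, then (image, text) -> video."""

    text_to_image: Callable[[str], Frame]
    image_text_to_video: Callable[[Frame, str, int], List[Frame]]

    def generate(self, prompt: str, num_frames: int) -> List[Frame]:
        # Stage 1: generate a single image conditioned on the text.
        first_frame = self.text_to_image(prompt)
        # Stage 2: generate frames conditioned on that image and the text.
        return self.image_text_to_video(first_frame, prompt, num_frames)


def toy_text_to_image(prompt: str) -> Frame:
    # Placeholder model: a 2x2 "image" whose brightness depends on prompt length.
    value = (len(prompt) % 10) / 10
    return [[value, value], [value, value]]


def toy_image_text_to_video(image: Frame, prompt: str, num_frames: int) -> List[Frame]:
    # Placeholder model: start from the conditioning image, brighten it per frame.
    return [
        [[min(1.0, px + 0.01 * t) for px in row] for row in image]
        for t in range(num_frames)
    ]


pipeline = FactorizedTextToVideo(toy_text_to_image, toy_image_text_to_video)
video = pipeline.generate("a corgi surfing a wave", num_frames=64)
print(len(video))
```

Keeping the two stages behind separate callables mirrors the idea that the image model and the video model are distinct components that can be swapped or trained independently.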

Meta’s new approach offers advantages over its previous methods. Because it needs only a pair of diffusion models, Emu Video is simple to implement and can produce a 512-by-512, four-second video at 16 frames per second. In human evaluations, Meta found that this work surpasses its earlier video generation tools in quality and in faithfulness to the original text prompts.
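To put the output size in perspective, the figures above can be turned into a frame count and a raw, uncompressed size. The resolution, duration, and frame rate come from the article; the three colour channels and 8 bits per channel are assumptions made only for this back-of-the-envelope calculation.

```python
WIDTH, HEIGHT = 512, 512   # pixels, from the article
DURATION_S = 4             # seconds, from the article
FPS = 16                   # frames per second, from the article
CHANNELS = 3               # assumption: RGB
BYTES_PER_VALUE = 1        # assumption: 8 bits per channel

num_frames = DURATION_S * FPS
raw_bytes = num_frames * WIDTH * HEIGHT * CHANNELS * BYTES_PER_VALUE

print(num_frames)          # 64
print(raw_bytes / 2**20)   # 48.0 (MiB, uncompressed)
```

Under these assumptions, each clip is 64 frames, or about 48 MiB of raw pixel data before any compression.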

The possibilities of generative AI image editing and video generation are vast. This technology can empower users to create their own animated stickers and GIFs effortlessly, revolutionizing personal expression. It also grants individuals the ability to edit their own photographs without relying on complex tools like Photoshop.

While these advancements are groundbreaking, Meta acknowledges that they won’t replace professional artists and animators. Instead, the technology enhances user creativity and self-expression, providing new avenues for visual content creation.

FAQ:

Q: What is Emu Edit?

A: Emu Edit is a tool developed by Meta Platforms Inc. that allows users to manipulate images precisely by entering text-based instructions.

Q: How does Emu Video work?

A: Emu Video, based on diffusion models, generates videos from text prompts. It first creates an image conditioned on a text prompt and then produces a video conditioned on that image and the text prompt.

Q: What are the advantages of Meta’s new approach?

A: Meta’s new approach simplifies video generation using a pair of diffusion models, resulting in efficient training and high-quality videos. These advancements offer users unprecedented control in image and video editing.

Q: Will these tools replace professional artists and animators?

A: No, Meta’s tools aim to enhance user creativity and self-expression, rather than replace professional artists and animators. They provide new possibilities for visual content creation.