“Meta AI has unveiled a novel multimodal model combining text and image generation, named CM3leon.”
In the world of artificial intelligence (AI), new products emerge almost daily. Every tech company builds, or works toward building, products that help users interact with its platform in a more sophisticated and enjoyable way.
With the same ambition, Meta AI has unveiled CM3leon, a novel multimodal model that combines text and image generation. The model is the first of its kind, adapting a recipe from text-only language models to deliver strong results with unparalleled computational efficiency.
What Does CM3leon Do?
The new model generates images from text prompts at high quality while using five times less compute than previous transformer-based techniques. It combines the adaptability and efficiency of autoregressive models with low training cost and high inference efficiency. As a causal masked mixed-modal (CM3) model, CM3leon extends the capabilities of earlier models by generating text and image sequences conditioned on arbitrary sequences of other text and image content.
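The causal masking idea behind CM3 can be illustrated with a small sketch. The core trick is that a span of tokens is cut out of a mixed text/image sequence, replaced with a mask sentinel, and appended at the end, so a left-to-right model learns to infill the span using both left and right context. The token names and helper below are illustrative assumptions, not Meta's actual code:

```python
def cm3_mask(tokens, start, end, mask_token="<mask>"):
    """Rearrange `tokens` so the span [start:end) is moved to the tail.

    A left-to-right model trained on the rearranged sequence learns to
    infill the masked span conditioned on the context on both sides.
    (Illustrative sketch of the causal-masking objective, not Meta's code.)
    """
    span = tokens[start:end]
    # Replace the span with a sentinel, then append the sentinel and the
    # original span at the end of the sequence.
    prefix = tokens[:start] + [mask_token] + tokens[end:]
    return prefix + [mask_token] + span

# A mixed-modal sequence: text tokens interleaved with image tokens
# (the "<img_N>" names are hypothetical discrete image-token ids).
seq = ["a", "photo", "of", "<img_12>", "<img_87>", "<img_3>", "a", "cat"]
print(cm3_mask(seq, 3, 6))
```

Because the rearranged sequence is still consumed strictly left to right, the same autoregressive decoder can be trained on it without any architectural change.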
Architecture of CM3leon
The new model employs a decoder-only transformer, similar to popular text-based models. What distinguishes CM3leon is its ability to both consume and produce text and images. This lets CM3leon handle a variety of tasks, from following prompts to generating text and images, with ease.
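One common way a single decoder can handle both modalities is to map text tokens and discrete image tokens (e.g. from a vector-quantized image tokenizer) into one shared id space, so the model predicts the next token the same way regardless of modality. The vocabulary and offset scheme below are a minimal sketch under that assumption, not Meta's actual implementation:

```python
# Tiny illustrative text vocabulary (hypothetical).
TEXT_VOCAB = {"a": 0, "photo": 1, "of": 2, "cat": 3}
N_TEXT = len(TEXT_VOCAB)

def encode_text(word):
    """Map a text token to its id in the shared vocabulary."""
    return TEXT_VOCAB[word]

def encode_image_token(codebook_id, n_text=N_TEXT):
    """Offset image codebook ids past the text vocabulary, so both
    modalities share a single embedding table and output softmax."""
    return n_text + codebook_id

def decode(token_id, n_text=N_TEXT):
    """Recover (modality, local id) from a shared-vocabulary id."""
    if token_id < n_text:
        return ("text", token_id)
    return ("image", token_id - n_text)

# A mixed sequence the decoder would see as one flat stream of ids.
seq = [encode_text("a"), encode_text("photo"), encode_image_token(42)]
print([decode(t) for t in seq])
```

With this unification, "input and produce both text and images" reduces to ordinary next-token prediction over the combined vocabulary.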
Generative models are becoming increasingly complex. They are trained on millions of sample images to learn the relationship between text and visuals, but they can also reflect any biases present in that training data. While AI-generated images have become commonplace thanks to popular tools such as Stable Diffusion, DALL-E, and Midjourney, Meta AI's approach, and the performance it promises, represents a significant step forward.