Midjourney is best known as one of the leading AI image generators – with nearly 20 million users on its Discord channel, according to third-party trackers, and likely more on top of that through its website – but its ambitions are beginning to expand.
Following news in late summer 2024 that it was building its own computing and AI hardware, the company this week released a new research paper, produced alongside machine learning experts at New York University (NYU), on training text-based large language models (LLMs) such as Meta's open-source Llama and Mistral's eponymous models to write more creatively.
The collaboration, documented in a new research paper published on the AI code community Hugging Face, introduces two new techniques – Diversified Direct Preference Optimization (DDPO) and Diversified Odds Ratio Preference Optimization (DORPO) – designed to expand the range of possible outputs while maintaining coherence and readability.
For a company best known for its diffusion-based AI image generation models, Midjourney's new approach to rethinking creativity in text-based LLMs shows it is not limiting its ambitions to visuals – and that, for it, a picture may not actually be worth a thousand words.
Could a Midjourney-native LLM or a fine-tuned version of an existing LLM be in the cards for the small, bootstrapped startup? I reached out to Midjourney founder David Holz but have yet to hear back.
Regardless of any first-party Midjourney LLM offering, the implications of its new research go beyond academic exercise and could help fuel a new wave of LLM training among enterprise AI teams, product developers, and content creators looking to improve AI-generated text.
It also shows that despite recent interest and investment among AI model providers in new multimodal and reasoning language models, there is still plenty of juice left to squeeze, cognitively and performance-wise, out of classic Transformer-based, text-focused LLMs.
The problem: AI-generated writing collapses around homogeneous outputs
In domains such as fact-based Q&A or coding assistance, LLMs are expected to generate a single best answer.
However, creative writing is inherently open-ended, meaning there are many valid responses to a single prompt.
For an example provided by the Midjourney researchers: given a prompt like "Write a story about a dog on the moon," the LLM could explore multiple diverse paths, such as:
- An astronaut's pet dog accidentally left behind after a lunar mission.
- A dog who finds itself in a futuristic canine space colony.
- A stranded dog that befriends an alien species.
Despite this range of possibilities, instruction-tuned LLMs often converge on similar storylines and themes. This happens because:
- Post-training techniques prioritize user preference over originality, reinforcing popular but repetitive responses.
- Instruction tuning often smooths out variation, making models favor "safe" responses over unique ones.
- Existing diversity-promoting techniques (such as temperature adjustment) operate only at inference time, rather than being baked into the model's learning process.
This leads to homogenized storytelling, where AI-generated creative writing feels repetitive and lacks surprise or depth.
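To make the inference-time limitation concrete, here is a minimal illustration – not from the paper, and with made-up logits – of how temperature sampling reshapes a next-token distribution. It can flatten or sharpen probabilities at generation time, but it never changes what the model has actually learned:

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Scale logits by temperature before normalizing.
    Higher temperature flattens the distribution; lower sharpens it."""
    z = np.array(logits, dtype=float) / temperature
    z -= z.max()  # subtract max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical next-token logits where one "safe" continuation dominates.
logits = [4.0, 2.0, 1.0, 0.5]

low_t = softmax(logits, temperature=0.5)   # sharpens: top token dominates even more
high_t = softmax(logits, temperature=1.5)  # flattens: rarer tokens gain probability mass

print(low_t[0], high_t[0])
```

Even at high temperature, the model is only re-weighting the same learned preferences – which is why the researchers target the training process itself instead.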
The solution: modifying post-training methods to prioritize diversity
To overcome these limitations, the researchers introduced DDPO and DORPO, two extensions of existing preference-optimization methods. The core innovation in these approaches is the use of deviation – a measure of how much a response differs from the others – to guide training.
Here’s how it works:
- During training, the model receives a writing prompt and multiple possible responses.
- Each response is compared to the others for the same prompt, and a deviation score is calculated.
- Rare but high-quality responses are weighted more heavily in training, encouraging the model to learn from diverse examples.
By incorporating deviation into Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO), the model learns to produce responses that are high quality but more varied.
This method ensures that AI-generated stories do not converge on a single predictable structure, but instead explore a broader range of characters, settings, and themes – just as a human writer might.
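As a rough sketch of the idea only – the paper's exact loss form, embedding model, and weighting scheme are not reproduced here, so every function below is an illustrative assumption – a deviation-weighted DPO objective could look like this:

```python
import numpy as np

def deviation_weight(embedding, other_embeddings):
    """Deviation: average cosine distance of one response from the other
    responses to the same prompt (0.0 = identical, ~1.0 = very different)."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return float(np.mean([1.0 - cos(embedding, o) for o in other_embeddings]))

def dpo_loss(logp_chosen, logp_rejected, beta=0.1):
    """Simplified DPO-style loss: -log sigmoid of the scaled log-prob margin
    between the chosen and rejected response (reference model omitted)."""
    margin = beta * (logp_chosen - logp_rejected)
    return -np.log(1.0 / (1.0 + np.exp(-margin)))

def ddpo_loss(logp_chosen, logp_rejected, chosen_emb, other_embs, beta=0.1):
    """Hypothetical diversity-weighted DPO: upweight training pairs whose
    chosen response deviates most from its siblings for the same prompt."""
    return deviation_weight(chosen_emb, other_embs) * dpo_loss(
        logp_chosen, logp_rejected, beta)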
What the Midjourney researchers did to achieve this
The study involved training LLMs on creative writing tasks using a dataset from the subreddit r/WritingPrompts, a Reddit community where users post prompts and respond with short stories.
The researchers used two base models for their training:
- Meta's Llama-3.1-8B (an 8-billion-parameter model from the Llama 3 series).
- Mistral-7B-v0.3 (a 7-billion-parameter model from Mistral AI).
They then took these models through the following processes:
- Supervised fine-tuning (SFT): The models were first fine-tuned using LoRA (Low-Rank Adaptation) to adjust parameters efficiently.
- Preference optimization:
- DPO and ORPO were applied as baselines – these standard methods focus on improving response quality based on user preference signals.
- DDPO and DORPO were then applied, introducing deviation-based weighting to encourage more unique responses.
- Evaluation:
- Automatic evaluation: semantic and stylistic diversity was measured using embedding-based techniques.
- Human evaluation: judges assessed whether outputs were diverse and engaging compared to GPT-4o and Claude 3.5.
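For the embedding-based diversity measurement, a simple stand-in metric – mean pairwise cosine distance over response embeddings, an assumption rather than the paper's exact protocol – can be sketched as:

```python
import numpy as np

def pairwise_diversity(embeddings):
    """Mean pairwise cosine distance across a set of response embeddings:
    0.0 means identical responses; larger values mean more diverse ones."""
    E = np.array(embeddings, dtype=float)
    E = E / np.linalg.norm(E, axis=1, keepdims=True)  # unit-normalize each row
    sims = E @ E.T                                    # cosine similarity matrix
    iu = np.triu_indices(len(E), k=1)                 # each unordered pair once
    return float(np.mean(1.0 - sims[iu]))

# Toy 2-D "embeddings": three near-duplicate stories vs. three distinct ones.
similar = [[1.0, 0.0], [0.99, 0.1], [0.98, 0.15]]
distinct = [[1.0, 0.0], [0.0, 1.0], [0.7, -0.7]]

print(pairwise_diversity(similar) < pairwise_diversity(distinct))  # expect True
```

In practice one would embed each generated story with a sentence-embedding model rather than hand-crafted vectors; the metric itself stays the same.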
Key training results:
- DDPO significantly outperformed standard DPO in terms of output diversity while maintaining quality.
- Llama-3.1-8B with DDPO achieved the best balance of quality and diversity, producing responses that were more varied than GPT-4o's while maintaining coherence.
- When dataset size was reduced, DDPO models still maintained diversity, though they required a certain number of diverse training samples to be fully effective.
Enterprise implications: what does this mean for those using AI to produce creative outputs – such as in marketing copywriting, corporate storytelling, and film/TV/video game scripts?
For AI teams managing LLM deployments, improving output diversity while maintaining quality is a critical challenge. These findings have significant implications for organizations that rely on AI-generated content in applications such as:
- Conversational AI and chatbots (ensuring varied and engaging responses).
- Content marketing and storytelling tools (preventing repetitive AI-generated copy).
- Game development and narrative design (creating diverse dialogue and branching storylines).
For professionals responsible for fine-tuning and deploying models in an enterprise setting, this research provides:
- A new approach to LLM post-training that enhances creativity without sacrificing quality.
- A practical alternative to inference-time diversity tuning (such as temperature adjustments) by building diversity into the learning process itself.
- The potential to develop more engaging AI applications, from AI-assisted writing tools to virtual assistants that can adapt their responses dynamically.
For those managing AI model orchestration and automation, this research highlights:
- The importance of tuning models at the training stage, reducing the need for post-processing adjustments at deployment.
- A way to introduce adaptive storytelling into AI-powered applications, ensuring variability while keeping content quality high.
- A method for making LLM outputs feel more human, which is crucial for applications requiring interactive storytelling, user engagement, or dynamic content.
The future of AI-generated creative projects looks bright
The success of DDPO and DORPO demonstrates that training LLMs with diversity-focused objectives can yield meaningful improvements in creative writing. Some ideas include:
- Integrating deviation-based learning into enterprise AI models to enhance response diversity in customer-facing applications.
- Exploring how these methods apply to other generative tasks, such as poetry, screenwriting, or game storytelling.
- Developing hybrid training approaches that balance diversity with instruction-following capabilities for AI assistants.
For those looking to apply these techniques, the researchers plan to make their code publicly available on this GitHub repository.
Whether you are fine-tuning LLMs for business applications or optimizing large-scale AI orchestration, this study provides actionable insights into how models can become more dynamic, engaging, and responsive to creative tasks.
By adopting these techniques, AI teams can move beyond rigid, formulaic outputs – building AI systems that are not only intelligent but also genuinely imaginative.