
ElevenLabs’ new music-generation model can switch genres mid-track

ElevenLabs unveils a music-generation model that can switch genres mid-track and regenerate selected regions of a song, targeting professional audio workflows.

By Pulse AI Editorial · 3 min read
AI-Assisted Editorial

This article is original editorial commentary written with AI assistance, based on publicly available reporting by TechCrunch AI. It is reviewed for accuracy and clarity before publication. See the original source linked below.

The landscape of generative audio has shifted with ElevenLabs’ latest music-generation model. Earlier AI music tools largely functioned as "black boxes": they took a text prompt and returned a finished, immutable audio file. The new model introduces granular control. Its headline feature is the ability to regenerate specific sections of a track without disturbing the surrounding composition. More striking still, it supports mid-track genre transitions, so a song can move from a classical arrangement into a heavy metal riff or an electronic dance beat within a single timeline.
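The reporting does not say how a multi-genre track is specified. One way to picture it, purely as an assumption, is a plan of contiguous sections that share a single tempo and meter while each carries its own style prompt. The `Section` type and the `bar_to_seconds` and `section_times` helpers below are hypothetical and not part of any ElevenLabs API; the sketch only shows why a shared tempo keeps genre changes landing on bar boundaries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section:
    """A contiguous stretch of the song, measured in bars, with its own style."""
    start_bar: int  # inclusive
    end_bar: int    # exclusive
    style: str      # hypothetical free-text genre prompt


def bar_to_seconds(bar: int, tempo_bpm: float, beats_per_bar: int = 4) -> float:
    """Timestamp of a bar boundary, assuming one constant tempo and meter."""
    return bar * beats_per_bar * 60.0 / tempo_bpm


def section_times(sections: list[Section], tempo_bpm: float,
                  beats_per_bar: int = 4) -> list[tuple[float, float, str]]:
    """Check that sections tile the timeline and return (start_s, end_s, style)."""
    out = []
    expected_start = 0
    for s in sections:
        if s.start_bar != expected_start or s.end_bar <= s.start_bar:
            raise ValueError(f"sections must be contiguous and non-empty: {s}")
        out.append((bar_to_seconds(s.start_bar, tempo_bpm, beats_per_bar),
                    bar_to_seconds(s.end_bar, tempo_bpm, beats_per_bar),
                    s.style))
        expected_start = s.end_bar
    return out


plan = [
    Section(0, 16, "classical string arrangement"),
    Section(16, 32, "heavy metal riff"),
    Section(32, 48, "electronic dance beat"),
]
for start, end, style in section_times(plan, tempo_bpm=120):
    print(f"{start:5.1f}s to {end:5.1f}s: {style}")
```

At 120 BPM in 4/4 each bar lasts two seconds, so the genre changes fall at 32 s and 64 s. Because every section shares one tempo, each transition lands exactly on a bar line.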

This development follows a period of rapid, if messy, growth for AI music. Over the past year, platforms such as Udio and Suno captured public attention by showing that AI could produce radio-quality vocals and instrumentation. However, professional creators often criticized these tools for their lack of "editability." In a traditional recording session, a producer might like the bridge but hate the chorus; until now, fixing a single bar of AI-generated music often meant regenerating the entire track. ElevenLabs, which built its reputation on high-fidelity voice cloning and text-to-speech, is now positioning itself to connect casual generative experimentation with professional-grade production.

At the heart of the new model is an approach to "inpainting," a technique long used in AI image generation to alter selected pixels while preserving the context of the surrounding picture. Carrying it over to audio is technically difficult because sound unfolds in time: rhythm, key, and tempo must stay coherent even when the genre changes. The ElevenLabs engine handles this by treating the audio as malleable rather than fixed. A user isolates a segment of the waveform, and the model "listens" to the preceding and following bars so that any new generation fits the structural backbone of the piece, even if its stylistic surface is replaced entirely.
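The article does not describe the model's internals, so the following is only a toy illustration of the masking idea, not ElevenLabs' method. A track is modeled as a list of bars; the selected region is discarded and refilled, with tempo, key, and meter copied from the neighboring context while only the style label changes. `Bar` and `inpaint` are hypothetical names. A real model would condition on both sides of the region and generate new audio rather than copy metadata from a single neighboring bar.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Bar:
    tempo_bpm: float
    key: str
    meter: tuple[int, int]
    style: str  # the "stylistic skin" that regeneration may replace


def inpaint(track: list[Bar], start: int, end: int, new_style: str) -> list[Bar]:
    """Refill bars [start, end) in a new style and leave every other bar untouched.

    Toy stand-in for a generative model: the structural backbone (tempo, key,
    meter) is copied from the bar just before the region, or from the bar just
    after it when the region starts at bar 0. Only the style changes.
    """
    if not 0 <= start < end <= len(track):
        raise ValueError("region must be a non-empty slice of the track")
    if start == 0 and end == len(track):
        raise ValueError("need at least one unmasked bar to use as context")
    anchor = track[start - 1] if start > 0 else track[end]
    regenerated = [replace(anchor, style=new_style) for _ in range(start, end)]
    return track[:start] + regenerated + track[end:]


track = [Bar(120, "A minor", (4, 4), "classical")] * 8
edited = inpaint(track, 4, 6, "heavy metal")
print([bar.style for bar in edited])
```

Bars 4 and 5 come back as "heavy metal" while keeping the surrounding tempo, key, and meter, and the bars outside the region are returned unchanged, which is the property the inpainting analogy relies on.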

This advance carries significant business and legal implications. AI music is moving away from "one-shot" generation and toward iterative co-creation. For the gaming and film industries, this promises a simpler workflow for dynamic scores that react to on-screen action. However, this level of control also intensifies the ongoing debate over copyright and data provenance. As ElevenLabs pushes further into music, it faces the same scrutiny as its peers over the datasets used to train these models, especially as the technology grows better at mimicking nuanced genre signatures.

From a competitive standpoint, ElevenLabs is signaling that it intends to own the entire "audio stack." The company started with synthetic voices, and its expansion into sound effects and now music suggests a strategy to become a comprehensive creative suite. That puts it on a collision course with established digital audio workstations (DAWs) and plugin makers. If users can perform complex arrangements and genre shifts in a simple browser interface, the barrier to entry for high-quality music production will keep falling, potentially democratizing professional-sounding output at the expense of traditional session musicianship.

Looking ahead, the industry will be watching how artists fold these tools into their existing workflows. The "North Star" for AI music remains high-fidelity multitracking: the ability to separate stems (drums, bass, vocals) and manipulate each one independently. Regional regeneration is a significant step, but the ultimate goal is a system in which the AI acts as a collaborating member of a band rather than a locked-in generator. As these models respond more precisely to human editing, the line between "AI-generated" and "human-produced" will continue to blur into a new category of hybrid media.
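Stem separation itself is a hard modeling problem, but the payoff described above is easy to illustrate. Once drums, bass, and vocals exist as separate signals, each can be adjusted on its own and summed back into a mix. The sketch assumes the stems are already available as equal-length lists of float samples; `mix_stems` is a hypothetical helper for illustration, and a production mixer would also handle panning, clipping, and effects.

```python
def mix_stems(stems: dict[str, list[float]], gains: dict[str, float]) -> list[float]:
    """Sum equal-length stems sample by sample, scaling each by its own gain.

    Stems missing from `gains` are left at unity gain (1.0).
    """
    lengths = {len(samples) for samples in stems.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one stem, and all stems must be the same length")
    mix = [0.0] * lengths.pop()
    for name, samples in stems.items():
        gain = gains.get(name, 1.0)
        for i, x in enumerate(samples):
            mix[i] += gain * x
    return mix


stems = {
    "drums":  [0.5, -0.5, 0.25, 0.0],
    "bass":   [0.25, 0.25, -0.25, 0.5],
    "vocals": [0.125, 0.0, 0.5, -0.5],
}
# Mute the vocals and halve the drums without touching the bass.
print(mix_stems(stems, {"vocals": 0.0, "drums": 0.5}))  # -> [0.5, 0.0, -0.125, 0.5]
```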

Why it matters

  • ElevenLabs' new model introduces "audio inpainting," allowing creators to edit specific sections of a song without regenerating the entire track.
  • The ability to switch genres mid-track could turn AI music tools from toys into viable assets for dynamic scoring in gaming and film.
  • This shift toward iterative control signals a move away from "one-shot" generation and increases competitive pressure on traditional professional audio software.
Read the full story at TechCrunch AI