Opinion · Pulse AI

My AI Is Gentrifying My Mother Tongue

I asked an AI for a poem in my native Marathi, and its sterile output revealed a chilling truth: AI is erasing the cultural texture of regional languages.

By Rohan Mehta · 5 min read
AI-Assisted Editorial

This opinion piece was drafted with AI assistance under the editorial direction of Rohan Mehta and reviewed before publication. Views expressed are the author's own.

I had what I thought was a simple, almost romantic, request. On a quiet Sunday afternoon, missing the sounds of my childhood home in Pune, I asked one of the world's most advanced AI models to write me a short poem in Marathi. The topic was just as modest: the first rain in Mumbai. I prompted it with a few keywords in my mother tongue: ‘paus’ (rain), ‘maati’ (earth), ‘Mumbai’, ‘samudra’ (sea). I sat back, genuinely excited. Here was a tool that promised to democratize creativity, to connect worlds. I envisioned it conjuring the frenetic, chaotic joy of the city finally finding relief from the oppressive summer heat.

What I got back was… correct. It was grammatically perfect. The verses rhymed. It used the words I had provided. It spoke of clouds gathering, drops falling, and the earth smelling sweet. A non-Marathi speaker, running it through a translation tool, would have called it a success. But for me, reading those lines felt like eating a beautifully decorated but utterly flavorless piece of fruit. It was a sterile, corporate-approved, textbook version of a Marathi poem. It was the linguistic equivalent of a stock photograph.

Where was the grit? Where was the ‘chikhal’, that uniquely sticky, sludgy mud that cakes your ankles and defines the monsoon commute? Where was the specific relief of a ‘garam chai’ (hot tea) and ‘kanda bhaji’ (onion fritters) at a roadside stall, the sizzle of the oil fighting the downpour? The poem mentioned the smell of wet earth, but it lacked the soul of ‘mridgandha’ (the scent of rain-soaked soil), a word that carries with it a whole universe of memory and emotion. The AI’s poem was in Marathi, but it wasn't *of* Marathi. It spoke the language, but it didn't understand its world.

This isn't just a technical glitch or a quaint complaint about a robot's lack of poetic soul. This experience rattled me because it offered a glimpse into a much larger, more insidious process. I call it linguistic gentrification. In cities, gentrification happens when new money flows in, polishing away the old, gritty character of a neighborhood, replacing local shops with soulless chains, and ultimately pricing out the original inhabitants. A similar process is now unfolding in the digital world, and large language models are the property developers.

At the heart of the issue is data. AI models like the one I used are not sentient beings. They are unimaginably complex pattern-recognition machines, trained on vast troves of text and code scraped from the internet. The problem is that a disproportionate share of the internet is written in English. Much of the data used to train these foundation models is therefore in English, reflecting a specific, predominantly Western, and often American cultural context. Other languages, especially those from the Global South, are statistically underrepresented.

When I ask an AI to write in Marathi, it isn't drawing from a deep, innate understanding of Marathi culture, literature, and conversation. It is, in effect, leaning on the patterns of its dominant language, English, and mapping those concepts and structures onto Marathi using the limited, and often formal, Marathi data it has seen. The result is a language stripped of its idiosyncrasies. The slang that colors our daily conversations, the proverbs our grandparents used that have no direct translation, the unique cadence and humor that make a language feel like home: all of this is treated as noise, as deviation from the norm. It gets sanded down, standardized, and sterilized.

In a country like India, the implications are profound. We live and breathe linguistic diversity. The 2011 census recorded more than 19,500 self-reported mother tongues. We switch languages mid-sentence without a second thought. My conversations with friends in Mumbai are a fluid mix of Marathi, Hindi, and English, a vibrant tapestry of 'Hinglish' or 'Manglish'. This code-switching isn't a sign of linguistic impurity; it's a dynamic, living feature of our modern identity. But for an AI trained largely on formal, monolingual datasets, it is an anomaly. It tries to ‘correct’ it, to force our messy, beautiful reality into its neat, orderly boxes. In doing so, it isn't just failing to represent us; it's subtly telling us that the way we naturally speak is wrong.

Think about the untranslatable. In Marathi, the expression 'डोक्याला ताप' (dokyala taap) literally means 'heat to the head', but it perfectly captures a specific kind of mental friction and annoyance. An AI might translate it blandly as ‘a headache’ or ‘a hassle’, losing the visceral feeling embedded in the idiom. Or consider the affectionate but exasperated term 'veda', which can mean 'crazy' but carries a world of context depending on tone and relationship. In my experience, the AI misses this nuance. It treats the word as if it had a single, most common English equivalent, and flattens it. Every time this happens, a small piece of our cultural DNA is lost in translation.

This isn't a problem unique to India. I’ve spoken to colleagues from Kenya who worry about what AI means for Sheng, the constantly evolving urban slang of Nairobi. Friends in Ireland lament how AI-generated text in Irish (Gaeilge) often feels stiff and academic, divorced from the living language spoken in the Gaeltacht. We are witnessing the dawn of a potential digital monoculture, architected in Silicon Valley and exported globally. As we integrate these tools more deeply into our lives, for writing emails, searching for information, and educating our children, we risk creating a feedback loop in which the sanitized, gentrified version of our language becomes the new standard. The rich, diverse dialects and sociolects that exist offline could gradually be eroded by a globally uniform digital dialect.

But I am not a Luddite. As an editor at an AI publication, I see the immense potential of this technology. The answer isn't to reject it, but to reclaim it. The fight for our languages must now extend to the digital frontier. This means demanding and building models that do not merely translate from English but are trained from the ground up on diverse, local, and culturally rich data. It requires community-led efforts to collect and digitize our oral histories, our literature, our slang, and even our casual WhatsApp conversations. We need projects that treat datasets in Konkani, Santhali, Odia, and Kashmiri with the same seriousness afforded to English.

It’s a long road. It requires investment, expertise, and a fundamental shift in how Big Tech approaches language. They need to see linguistic diversity not as a challenge to be overcome with a universal translator, but as a rich tapestry to be preserved and nurtured. They need to empower local communities to be the custodians of their own digital linguistic future.

That Sunday afternoon, after my disappointing encounter with the AI poet, I closed my laptop. I called my Aai, my mother. I asked her how she would describe the first rain. She didn't talk in rhyming couplets. She spoke of the smell of hot oil as she rushed to make pakoras, the mad scramble to pull the laundry in from the balcony, the sound of the neighborhood kids shouting with joy in the street. Her words weren't grammatically perfect, and they were peppered with Hindi. But they were alive. They had texture, history, and a soul. That is what’s at stake. Our tools should reflect that messy, beautiful reality, not erase it.

Why it matters

  • 01 AI models trained on English-centric data are flattening regional languages, stripping them of cultural nuance, slang, and idioms.
  • 02 This 'linguistic gentrification' threatens to create a digital monoculture, devaluing the diverse ways people actually speak in countries like India.
  • 03 Preserving linguistic identity in the AI age requires building new models trained on diverse, community-sourced data from the ground up.