Bhasha SLMs are the future of Indian AI

In the global race for AI supremacy, India occupies a uniquely diverse and challenging position. While countries like the U.S. and China build monolithic AI ecosystems dominated by English or Mandarin, India must reckon with an intricate linguistic tapestry: 22 constitutionally recognised languages, hundreds of dialects, and deeply embedded regional identities. To build an inclusive AI future for 1.4 billion people, India must shift its focus from centralised, one-size-fits-all models to a regionalised, decentralised approach powered by Small Language Models (SLMs) and commodity hardware.

India’s multilingual character is both a blessing and a challenge. English remains the de facto language of elite digital services, but only about 10% of Indians are fluent in it. The remaining 90%, who speak and consume content in Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, and other regional languages, are largely excluded from AI-driven innovations. This divide creates a growing “AI accessibility gap,” where only urban, English-speaking Indians benefit from cutting-edge tools like chatbots, recommendation engines, or educational AI tutors. If India is to democratise AI, it must build systems that speak to Indians in their own languages. This is where regionalisation of AI becomes essential: the goal is not merely to translate the outputs of English models, which is often done poorly, but to train and fine-tune models natively in Indian languages, incorporating local syntax, semantics, idioms, and context.

The future of regional AI in India will be led not by massive, resource-hungry large language models (LLMs) like GPT-4 or PaLM, but by Small Language Models (SLMs)—lean, efficient AI models that can be trained and deployed on modest datasets and hardware.

SLMs suit India for three reasons. First, Indian languages suffer from a lack of large-scale, high-quality datasets, and SLMs can perform remarkably well with targeted, curated content in local languages, such as news archives, government documents, film subtitles, social media posts, and radio transcripts. Second, most Indian startups, educational institutions, and local governments do not have access to AI supercomputers; SLMs, which can be trained and run on commodity hardware, make AI development more inclusive and scalable. Third, SLMs can be embedded in low-cost edge devices, from rural schools to agri-tech kiosks, providing offline AI capabilities where internet connectivity is poor or absent.

Imagine a farmer in rural Odisha asking a voice-based chatbot in Odia about pest control measures, or a student in deep interior Maharashtra learning from a Marathi AI tutor. These are not pipe dreams—they are achievable today with the right investment in SLMs and regional data pipelines.

The AI boom has largely been a hardware-intensive revolution. Training a model such as GPT-3 required thousands of GPUs and a compute budget estimated in the millions of dollars for a single training run, backed by investments running into billions. India cannot afford this model of development, nor should it try to replicate it. Instead, India can leverage commodity hardware (affordable GPUs, commercially available CPUs, ARM processors, and local server farms) to power regional AI. By optimising SLMs for low-resource environments, developers can enable AI applications to run on mobile phones, Raspberry Pi kits, or local data centres. This low-cost, high-impact model has already been proven in other sectors: the Unified Payments Interface (UPI) democratised digital payments without requiring cutting-edge smartphones or fast internet connections. A similar playbook can work for AI.
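To make the commodity-hardware argument concrete, here is a rough back-of-the-envelope sketch in Python, not a benchmark. It estimates only the memory needed to hold a model's weights (parameters multiplied by bits per weight, divided by 8), ignoring activations and runtime overhead. The parameter counts, the 8 GiB device, and the 50% headroom rule are illustrative assumptions, not figures from this article.

```python
# Back-of-the-envelope estimate of weight memory for a language model.
# All model sizes and the device RAM below are hypothetical examples.

def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


def fits(num_params: int, bits_per_weight: int,
         device_ram_gib: float, headroom: float = 0.5) -> bool:
    """True if the weights fit within `headroom` (a fraction) of device RAM.

    The remaining RAM is left for the operating system, activations,
    and other runtime overhead; 0.5 is an assumed rule of thumb.
    """
    budget_gib = device_ram_gib * headroom
    return weight_memory_gib(num_params, bits_per_weight) <= budget_gib


if __name__ == "__main__":
    device_ram_gib = 8.0  # assumed RAM of a small single-board computer
    for params in (500_000_000, 1_000_000_000, 7_000_000_000):
        for bits in (16, 8, 4):
            size = weight_memory_gib(params, bits)
            verdict = "fits" if fits(params, bits, device_ram_gib) else "too large"
            print(f"{params:>13,} params @ {bits:>2}-bit: {size:6.2f} GiB ({verdict})")
```

Under these assumptions, a one-billion-parameter model stored at 4 bits per weight needs under half a GiB for its weights, while a seven-billion-parameter model at 16 bits needs about 13 GiB. That gap is why smaller, quantised models are the realistic candidates for phones and single-board computers.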

For regional AI to flourish, India needs a cohesive national strategy that goes beyond flashy tech announcements. The first priority is open datasets: the Union Government must invest in building open, multilingual datasets in partnership with state governments, academia, civil society, and startups. These should include voice, text, and video data across all 22 scheduled languages. Second, just as India built Digital Public Infrastructure such as Aadhaar and UPI, it needs AI public stacks (localised models, APIs, benchmarks, and training environments) that developers nationwide can use. A promising start has been made through initiatives like AI4Bharat, Bhashini and AIKosh. Third, we need policies that incentivise ethical, explainable, and locally relevant AI applications, especially in sectors like agriculture, education, health, and governance. The IndiaAI mission is already endeavouring to take the necessary steps. Finally, we need regional AI hubs, preferably in non-metro cities, that can train local youth, host computing resources, and create models tailored for specific linguistic and cultural contexts.

Beyond accessibility and economics, regionalisation of AI has a profound cultural role to play. By embedding local languages into modern AI tools, we can preserve and promote India’s linguistic heritage, ensuring that languages such as Bhojpuri, Konkani, and Manipuri, along with the many so-called dialects, are not left behind in the digital age.

To summarise: India stands at a crossroads. It can choose to be a passive consumer of AI innovations built elsewhere, or it can build an AI ecosystem rooted in its diversity, frugality, and ingenuity. The future lies not in mimicking Silicon Valley but in embracing the Bharat-first approach: regionalised AI, powered by small language models and built on the bedrock of accessible hardware. In doing so, we can ensure that the fruits of AI percolate to all Indians, in any tongue they speak.

Authors

Hemant Adarkar

Sridhar Ganapathy
