Most AI voice detectors only support English. I'm a 2nd year ECE student at KIT, Tamil Nadu — so I built one that actually works for Indian languages. Here's exactly how I did it and what broke along the way.
The Problem
Deepfake audio is a real threat. Politicians, scammers, and bad actors are using AI-cloned voices to spread misinformation and fraud. The tools that exist to detect this? Almost all English-only.
Nobody was building for Tamil. Hindi. Malayalam. Telugu.
So I did.
What I Built
VoiceID — a free AI voice detector that analyses 88 acoustic features from any MP3 or WAV file and classifies the voice as Human or AI with a confidence score.
🔴 Live demo: https://mdeepaksai.github.io/human-or-AI/
Upload any MP3 or WAV file (or record live)
Select your language
Get a Human or AI verdict with confidence score
Free. No login. No signup.
Supports: English, Tamil, Hindi, Malayalam, Telugu.
Tech Stack
FastAPI backend deployed on Railway
Vanilla JS frontend hosted on GitHub Pages
Supabase for live visitor and analysis tracking
Librosa for acoustic feature extraction (88 features per audio file)
Scikit-learn for the classification model
How It Works
When you upload an audio file, the backend extracts 88 acoustic features including:
MFCCs (Mel-frequency cepstral coefficients)
Pitch variation and breathiness
Spectral entropy and rolloff
Zero crossing rate
These features differ significantly between real human speech and AI-synthesised voices. The model was trained on a dataset of both human and TTS-generated audio across all 5 supported languages.
The result is a confidence score — not just a binary yes/no — so you can see how certain the model is.
What Actually Broke (The Real Lessons)
- CORS almost killed the project: my first deployment combined allow_credentials=True with allow_origins=["*"] in FastAPI. That combination is invalid under the CORS spec, and browsers block it silently. I spent three hours debugging what looked like a network error before I found it. Fix: either use credentials with specific origins, or use the wildcard without credentials.
- Audio format compatibility is a nightmare: WAV files from different recording tools have different sample rates, bit depths, and channel configurations. I had to add preprocessing to normalise all incoming audio before feature extraction, or the model would fail silently on some files.
- Railway cold starts: free-tier Railway deployments sleep after inactivity, so the first request after sleep takes 10-15 seconds. I added a loading state to the UI so users don't think it's broken.
- File size vs accuracy tradeoff: larger files give more accurate results but hit Railway's memory limits. I settled on a 10MB max, with a recommendation to use at least 3 seconds of audio for reliable results.
Results So Far
133+ visitors since launch
32+ analyses run
Indexed on Google
Launched on Uneed, Fazier, Peerlist
Try It
Free. No login. No signup.
👉 https://mdeepaksai.github.io/human-or-AI/
GitHub: https://github.com/mdeepaksai/human-or-AI
If you test it, drop a comment — would love to know how it performs on your audio.
Built by Mallarpu Deepak Sai — 2nd year ECE @ KIT, Tamil Nadu. Building and shipping AI-powered web apps. Open to internships.