# Using AI to Build a Study Buddy That Feels Like Hatsune Miku 🎤✨
ai #python #langchain #tts #opensource #vocaloid #rag #localllm
Studying alone? Boring.
Studying with an AI that reads your PDFs, explains concepts, remembers context, and talks like Miku with an anime-style voice?
Way better.
I used AI tools (Copilot, Claude, Gemini, local Ollama) to ship StudyWithMiku — an autonomous AI study companion that:
- 📚 Reads and embeds your PDFs
- 🧠 Answers using RAG + memory
- 🎀 Responds with Miku’s personality
- 🎙 Speaks with a character-style voice (not a robotic TTS)
Repo: https://github.com/zkzkGamal/StudyWithFriend
Demo (voice + behavior): https://github.com/zkzkGamal/StudyWithFriend/blob/main/demo.mp4
🚀 What’s Working Really Well
🎀 1. Personality That Actually Feels Alive
Miku’s personality is fully implemented using:
- Prompt engineering (
prompt.yaml) - LangGraph state + memory
- Structured tool calling
She responds with cute, energetic vibes:
♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo cool for studying! ^_^
She remembers context across questions. It feels less like “query → response” and more like chatting with a nerdy Vocaloid friend.
🎙 2. Custom Voice (Not Generic TTS)
Voice pipeline:
- Coqui TTS (acoustic model)
- DiffSinger vocoder
- sounddevice playback
This gives anime-style character speech instead of flat robotic output.
It’s not fully expressive idol-concert mode yet — but it’s already very distinct.
📚 3. Real RAG, Not Just Chat
Drop a PDF into content/ → auto-embedded into ChromaDB in the background.
You get:
- Smart retrieval
- Context-aware answers
- Tool usage (web search, open browser, system commands)
- Error handling
It’s a proper agent — not just a wrapper over an LLM.
🧪 What’s Still Basic (Honest Section)
- TTS is clear but not ultra-expressive yet (emotion/prosody tuning next).
-
Animations work (sparkles, terminal flair), but they could evolve into:
- Sprite sequences
- Mini GUI
- Browser-based visuals
Voice emotion control needs better parameter tuning in DiffSinger.
The foundation is strong:
Agent ✔
Memory ✔
Voice ✔
RAG ✔
Now it’s polish time.
💡 Why I Built This
I love Vocaloid. Studying is hard. Motivation matters.
So I asked myself:
Why not turn studying into hanging out with Miku?
Cheerful voice + personality + visual feedback = more engagement.
And honestly? It works.
⚡ How AI Helped Me Ship Fast
AI wasn’t just autocomplete — it was a multiplier.
It helped me:
- Scaffold the LangGraph agent structure
- Fix PyTorch + protobuf dependency chaos
- Generate 90% of the Bash installer (venv, CUDA, model downloads)
- Iterate on Miku’s personality in minutes
- Debug Chroma, audio pipelines, tool execution
But here’s the key:
AI gave speed.
Understanding the TTS pipeline, agent state transitions, and RAG design gave growth.
That’s where the real learning happened.
🛠 Quick Start
git clone https://github.com/zkzkGamal/StudyWithFriend.git
cd StudyWithMiku
chmod +x install.sh
./install.sh
Edit .env (choose Ollama/local or cloud LLM), then:
source venv/bin/activate
python main.py
Drop PDFs into content/ and start chatting.
🎯 Example Interaction
You:
Explain Bayes theorem from my stats notes.
Miku:
♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo powerful for updating beliefs! ^_^
(Voice playback + animation trigger happens here)
🔮 Next Steps
- Emotion-aware TTS (tag-based prosody control?)
- Better DiffSinger tuning
- Real animated sprites
- Character toggle (Teto mode?)
- Flashcards & quiz generation
- Study session gamification
🧠 Who This Is For
If you’re into:
- Local AI agents
- RAG systems
- TTS pipelines
- Anime/Vocaloid
- Building weird but fun AI tools
Clone it. Break it. Improve it.
I’d love feedback on:
- How the personality feels
- Voice quality on your machine
- Ideas to make her more “idol-tier”
PRs and issues are very welcome.
Built with ❤️ in Cairo by Zkzk (zkzkGamal on GitHub).
Top comments (0)