zkaria gamal

Posted on Feb 16

Meet StudyWithMiku 🎤📚 – Your AI Anime Study Buddy That Actually Speaks & Animates!

#ai #productivity #langchain #opensource

# Using AI to Build a Study Buddy That Feels Like Hatsune Miku 🎤✨

ai #python #langchain #tts #opensource #vocaloid #rag #localllm

Studying alone? Boring.

Studying with an AI that reads your PDFs, explains concepts, remembers context, and talks like Miku with an anime-style voice?

Way better.

I used AI tools (Copilot, Claude, Gemini, local Ollama) to ship StudyWithMiku — an autonomous AI study companion that:

📚 Reads and embeds your PDFs
🧠 Answers using RAG + memory
🎀 Responds with Miku’s personality
🎙 Speaks with a character-style voice (not a robotic TTS)

Repo: https://github.com/zkzkGamal/StudyWithFriend
Demo (voice + behavior): https://github.com/zkzkGamal/StudyWithFriend/blob/main/demo.mp4

🚀 What’s Working Really Well

🎀 1. Personality That Actually Feels Alive

Miku’s personality is fully implemented using:

Prompt engineering (prompt.yaml)
LangGraph state + memory
Structured tool calling

She responds with cute, energetic vibes:

♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo cool for studying! ^_^

She remembers context across questions. It feels less like “query → response” and more like chatting with a nerdy Vocaloid friend.

🎙 2. Custom Voice (Not Generic TTS)

Voice pipeline:

Coqui TTS (acoustic model)
DiffSinger vocoder
sounddevice playback

This gives anime-style character speech instead of flat robotic output.

It’s not fully expressive idol-concert mode yet — but it’s already very distinct.

📚 3. Real RAG, Not Just Chat

Drop a PDF into content/ → auto-embedded into ChromaDB in the background.

You get:

Smart retrieval
Context-aware answers
Tool usage (web search, open browser, system commands)
Error handling

It’s a proper agent — not just a wrapper over an LLM.

🧪 What’s Still Basic (Honest Section)

TTS is clear but not ultra-expressive yet (emotion/prosody tuning next).
Animations work (sparkles, terminal flair), but they could evolve into:
- Sprite sequences
- Mini GUI
- Browser-based visuals
Voice emotion control needs better parameter tuning in DiffSinger.

The foundation is strong:
Agent ✔
Memory ✔
Voice ✔
RAG ✔

Now it’s polish time.

💡 Why I Built This

I love Vocaloid. Studying is hard. Motivation matters.

So I asked myself:

Why not turn studying into hanging out with Miku?

Cheerful voice + personality + visual feedback = more engagement.

And honestly? It works.

⚡ How AI Helped Me Ship Fast

AI wasn’t just autocomplete — it was a multiplier.

It helped me:

Scaffold the LangGraph agent structure
Fix PyTorch + protobuf dependency chaos
Generate 90% of the Bash installer (venv, CUDA, model downloads)
Iterate on Miku’s personality in minutes
Debug Chroma, audio pipelines, tool execution

But here’s the key:

AI gave speed.
Understanding the TTS pipeline, agent state transitions, and RAG design gave growth.

That’s where the real learning happened.

🛠 Quick Start

git clone https://github.com/zkzkGamal/StudyWithFriend.git
cd StudyWithMiku
chmod +x install.sh
./install.sh

Edit .env (choose Ollama/local or cloud LLM), then:

source venv/bin/activate
python main.py

Drop PDFs into content/ and start chatting.

🎯 Example Interaction

You:
Explain Bayes theorem from my stats notes.

Miku:
♪ Bayes time~! ★ P(A|B) = P(B|A) * P(A) / P(B) … Miku thinks this is sooo powerful for updating beliefs! ^_^

(Voice playback + animation trigger happens here)

🔮 Next Steps

Emotion-aware TTS (tag-based prosody control?)
Better DiffSinger tuning
Real animated sprites
Character toggle (Teto mode?)
Flashcards & quiz generation
Study session gamification

🧠 Who This Is For

If you’re into:

Local AI agents
RAG systems
TTS pipelines
Anime/Vocaloid
Building weird but fun AI tools

Clone it. Break it. Improve it.

I’d love feedback on:

How the personality feels
Voice quality on your machine
Ideas to make her more “idol-tier”

PRs and issues are very welcome.

Built with ❤️ in Cairo by Zkzk (zkzkGamal on GitHub).

DEV Community