What if you could control your computer using just your voice, without needing a powerful GPU or heavy local models?
I built a Voice-Controlled AI Agent that:
- Understands speech 🎤
- Detects user intent 🧠
- Executes real actions like file creation, code generation, and summarization ⚡

And the best part? ✅ It works smoothly even on low-end systems (8 GB RAM).
## 🎬 Demo

📽️ Watch the full demo here:
🔗 https://drive.google.com/file/d/17Uvp72dDi82pAqEqbJ6pl3LaLphxwaGm/view?usp=sharing
🔗 https://youtu.be/Pl3lwBoYruM

## Live Link

https://localaiagent-twxulfwrigcagqtbecnomh.streamlit.app/
## ✨ Features

### 🎤 Audio Input
- Record directly from the microphone
- Upload audio files

### 🧠 Intent Classification
- Converts speech → structured JSON
- Accurately detects user commands
### ⚡ Core Actions
- `create_file`: creates files safely
- `write_code`: generates and saves code
- `summarize_text`: summarizes content
- `general_chat`: handles normal queries
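The four core actions map naturally onto a dispatch table keyed by the action name the LLM returns. A minimal sketch with stub handlers (the real handlers live in the repo; these bodies are placeholders):

```python
# Stub handlers: each takes the parsed intent dict and returns a status string.
def create_file(intent):
    return f"created {intent['filename']}"

def write_code(intent):
    return "code generated"

def summarize_text(intent):
    return "summary ready"

def general_chat(intent):
    return "chat reply"

# One entry per supported action; unknown actions fall back to chat.
ACTIONS = {
    "create_file": create_file,
    "write_code": write_code,
    "summarize_text": summarize_text,
    "general_chat": general_chat,
}

def dispatch(intent: dict) -> str:
    handler = ACTIONS.get(intent.get("action"), general_chat)
    return handler(intent)
```

Falling back to `general_chat` keeps the agent responsive even when the model emits an action name outside the supported set.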
### 🔒 Safe Execution
- All outputs are restricted to the `/output` directory
- Prevents accidental system modification
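Restricting writes to `/output` usually means resolving every requested path and rejecting anything that escapes the directory (e.g. `../`). A sketch of that guard, assuming an `output/` folder relative to the app (the repo's actual check may differ):

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_path(filename: str) -> Path:
    """Resolve filename inside OUTPUT_DIR, rejecting path-traversal escapes."""
    candidate = (OUTPUT_DIR / filename).resolve()
    # After resolution, the file must still sit under OUTPUT_DIR.
    if OUTPUT_DIR not in candidate.parents and candidate != OUTPUT_DIR:
        raise ValueError(f"Refusing to write outside {OUTPUT_DIR}: {filename}")
    return candidate
```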
## 🏗️ System Architecture
Building AI systems locally with limited RAM is challenging. Here's how I solved it:
### 1. 🎙️ Speech-to-Text (STT)

**Local Mode:** uses `openai-whisper` (tiny model); runs on CPU.
**Fast Mode (recommended):** uses the Groq API (Whisper-large-v3); extremely fast ⚡
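The two modes can be sketched as a single function, assuming the `groq` and `openai-whisper` packages; the exact client usage in the repo may differ:

```python
def transcribe(audio_path: str, fast_mode: bool = True) -> str:
    """Speech-to-text via Groq's hosted Whisper (fast) or local whisper-tiny."""
    if fast_mode:
        from groq import Groq  # hosted Whisper-large-v3
        client = Groq()  # reads GROQ_API_KEY from the environment
        with open(audio_path, "rb") as f:
            result = client.audio.transcriptions.create(
                file=f, model="whisper-large-v3"
            )
        return result.text
    # Offline fallback: openai-whisper "tiny" runs acceptably on CPU.
    import whisper
    model = whisper.load_model("tiny")
    return model.transcribe(audio_path)["text"]
```

Keeping the imports inside each branch means the local model is never loaded into RAM when Fast Mode is selected.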
### 2. 🧠 LLM + Intent Engine

Running large models locally was not feasible:

- 8B models consume ~5 GB RAM ❌
- Causes system slowdown

🚀 Solution: use the Groq API (Llama 3, 8B / 70B), which provides:

- Fast inference ⚡
- Structured JSON output
- Reliable intent classification
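For the output to be reliable, the model's reply has to be valid JSON with a known action before anything executes. A minimal validation sketch, using the action names from the feature list above:

```python
import json

VALID_ACTIONS = {"create_file", "write_code", "summarize_text", "general_chat"}

def parse_intent(llm_output: str) -> dict:
    """Parse and validate the LLM's structured JSON reply."""
    intent = json.loads(llm_output)  # raises ValueError on malformed JSON
    if intent.get("action") not in VALID_ACTIONS:
        raise ValueError(f"Unknown action: {intent.get('action')!r}")
    return intent
```

Rejecting unknown actions here is what lets the execution layer trust the intent dict it receives.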
### 3. 🖥️ Frontend

- Built using Streamlit
- Uses `st.audio_input` for seamless recording
- Simple and clean UI
## 🔄 How It Works

1. User speaks or uploads audio 🎤
2. Whisper converts speech → text
3. LLM processes text → structured JSON
4. System executes the action locally
Example:

```json
{
  "action": "create_file",
  "filename": "hello.py"
}
```
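Step 4 for an intent like this can be sketched as a small handler that writes the file into the output directory (a sketch; `content` is an assumed optional field):

```python
import os

def handle_create_file(intent: dict, out_dir: str = "output") -> str:
    """Create the requested file inside out_dir and return its path."""
    os.makedirs(out_dir, exist_ok=True)
    # basename() strips any directory components the model might emit.
    path = os.path.join(out_dir, os.path.basename(intent["filename"]))
    with open(path, "w", encoding="utf-8") as f:
        f.write(intent.get("content", ""))
    return path
```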
## 💻 Example Use Case

🗣️ User says:

> "Create a Python file called hello.py"

⚙️ The system:

- Transcribes the audio
- Detects the `create_file` intent
- Creates the file in the `/output` folder
- Shows a success message
## ⚡ Setup Instructions

### Prerequisites

- Python 3.10+
- Groq API key → https://console.groq.com
- FFmpeg installed
### Installation

```bash
git clone https://github.com/Akash7367/Local_AI_Agent
cd Local_AI_Agent
pip install -r requirements.txt
```
### Environment Setup

Set your Groq API key:

```env
GROQ_API_KEY=your_api_key_here
```
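If the app reads the key from a `.env` file (an assumption here; `python-dotenv` is the usual choice), a minimal loader could look like:

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=value lines, '#' comments ignored."""
    for line in Path(path).read_text().splitlines():
        if line.strip() and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            # setdefault keeps any value already exported in the shell.
            os.environ.setdefault(key.strip(), value.strip())
```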
### Run the App

```bash
streamlit run app.py
```
## ⚠️ Challenges Faced
- Running LLMs on 8GB RAM
- Slow transcription using CPU Whisper
- Ensuring consistent JSON output from LLM
- Managing safe file execution
## 💡 Key Learnings
- Hybrid approach (local + API) is powerful
- Structured prompts = better automation
- UI simplicity improves usability massively
## 🔮 Future Improvements
- Add more actions (email automation, system control)
- Improve offline performance
- Add memory (conversation history)
- Multi-command execution
## 🔗 Links
- 💻 GitHub: https://github.com/Akash7367/Local_AI_Agent
- 🌐 Portfolio: https://portfolio-c2xg.vercel.app/?_vercel_share=v9vu4mbb0xIGMIHlCjfjGlcQPbiusSj5
- 🔗 LinkedIn: https://www.linkedin.com/in/akash-kumar-298113264/
## 🙏 Final Thoughts
This project shows that you don't need expensive hardware to build powerful AI systems.
With the right architecture and smart trade-offs, even a mid-range laptop can run intelligent AI agents efficiently.
If you found this useful, feel free to ⭐ the repo or share your thoughts!