DEV Community

Akash kumar
Voice-Controlled Local AI Agent (Works Even on 8GB RAM)

What if you could control your computer using just your voice, without needing a powerful GPU or heavy local models?

I built a Voice-Controlled AI Agent that:

  • Understands speech 🎤
  • Detects user intent 🧠
  • Executes real actions like file creation, code generation, and summarization ⚡

And the best part?
👉 It works smoothly even on low-end systems (8GB RAM).


🎬 Demo

📽️ Watch the full demo here:
👉 https://drive.google.com/file/d/17Uvp72dDi82pAqEqbJ6pl3LaLphxwaGm/view?usp=sharing


https://youtu.be/Pl3lwBoYruM

LIVE LINK

https://localaiagent-twxulfwrigcagqtbecnomh.streamlit.app/

✨ Features

  • 🎤 Audio Input

    • Record directly from microphone
    • Upload audio files
  • 🧠 Intent Classification

    • Converts speech → structured JSON
    • Accurately detects user commands
  • ⚡ Core Actions

    • create_file → Creates files safely
    • write_code → Generates and saves code
    • summarize_text → Summarizes content
    • general_chat → Handles normal queries
  • 🔒 Safe Execution

    • All outputs are restricted to the /output directory
    • Prevents accidental system modification
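
The "safe execution" guarantee above boils down to path sandboxing: every requested filename is resolved and checked against the output directory before anything is written. A minimal sketch of that idea (the helper name is mine, not necessarily the project's):

```python
from pathlib import Path

OUTPUT_DIR = Path("output").resolve()

def safe_path(filename: str) -> Path:
    """Resolve filename inside OUTPUT_DIR, rejecting anything that escapes it."""
    candidate = (OUTPUT_DIR / filename).resolve()
    # "../" components or absolute paths resolve outside the sandbox
    if candidate != OUTPUT_DIR and OUTPUT_DIR not in candidate.parents:
        raise ValueError(f"path escapes output directory: {filename}")
    return candidate
```

Resolving before checking is the important part: a naive string prefix check on the raw input would miss `../` tricks.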

๐Ÿ—๏ธ System Architecture

Building AI systems locally with limited RAM is challenging. Here's how I solved it:

1. 🎙️ Speech-to-Text (STT)

  • Local Mode:
    Uses openai-whisper (tiny model) → runs on CPU

  • Fast Mode (Recommended):
    Uses the Groq API (Whisper-large-v3) → extremely fast ⚡
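
The two modes can sit behind a single switch keyed on whether a Groq API key is configured. A sketch of just the selection logic (the helper name and return labels are mine; the actual calls to openai-whisper and the Groq API are elided):

```python
import os

def stt_backend(env=None):
    """Pick the transcription backend based on available credentials."""
    env = os.environ if env is None else env
    if env.get("GROQ_API_KEY"):
        # Fast mode: send audio to Groq-hosted Whisper-large-v3
        return "groq/whisper-large-v3"
    # Local mode: run the openai-whisper "tiny" model on CPU
    return "local/whisper-tiny"
```

Keeping the decision in one function means the rest of the app never needs to know which backend produced the transcript.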


2. 🧠 LLM + Intent Engine

Running large models locally was not feasible:

  • 8B models consume ~5GB RAM ❌
  • Causes system slowdown

👉 Solution:

  • Used the Groq API (Llama 3 - 8B / 70B)
  • Provides:

    • Fast inference ⚡
    • Structured JSON output
    • Reliable intent classification
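
Getting "structured JSON output" reliably also means validating the model's reply before acting on it. A minimal guard, assuming the four actions listed under Features (the function name is illustrative):

```python
import json

VALID_ACTIONS = {"create_file", "write_code", "summarize_text", "general_chat"}

def parse_intent(llm_output: str) -> dict:
    """Parse the LLM's JSON reply and reject anything outside the known actions."""
    intent = json.loads(llm_output)
    action = intent.get("action")
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    return intent
```

Raising on unknown actions (rather than guessing) is what keeps a hallucinated command from ever reaching the execution layer.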

3. 🖥️ Frontend

  • Built using Streamlit
  • Uses st.audio_input for seamless recording
  • Simple and clean UI

🔄 How It Works

  1. User speaks or uploads audio 🎤
  2. Whisper converts speech → text
  3. LLM processes text → structured JSON
  4. System executes the action locally

Example:

```json
{
  "action": "create_file",
  "filename": "hello.py"
}
```
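
Once JSON like this arrives, a small dispatch table can map each action to its handler. A sketch of that pattern (handler bodies are stand-ins, not the project's actual code):

```python
def create_file(intent):
    # The real handler would write inside /output; here we just report it
    return f"created {intent['filename']}"

def general_chat(intent):
    return "chat response"

HANDLERS = {
    "create_file": create_file,
    "general_chat": general_chat,
    # "write_code" and "summarize_text" register the same way
}

def execute(intent):
    """Look up the handler for the requested action and run it."""
    handler = HANDLERS.get(intent["action"])
    if handler is None:
        raise ValueError(f"no handler for {intent['action']!r}")
    return handler(intent)
```

A dict of handlers keeps adding a new action down to one function plus one registration line.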

💻 Example Use Case

🗣️ User says:

"Create a Python file called hello.py"

⚙️ System:

  • Transcribes the audio
  • Detects the create_file intent
  • Creates the file in the /output folder
  • Shows a success message

⚡ Setup Instructions

Prerequisites

  • Python and pip
  • A Groq API key (for Fast Mode)

Installation

```bash
git clone <your-repo-link>
cd local_ai_agent
pip install -r requirements.txt
```

Environment Setup

Set your Groq API key as an environment variable:

```bash
GROQ_API_KEY=your_api_key_here
```

Run the App

```bash
streamlit run app.py
```

โš ๏ธ Challenges Faced

  • Running LLMs on 8GB RAM
  • Slow transcription using CPU Whisper
  • Ensuring consistent JSON output from LLM
  • Managing safe file execution

💡 Key Learnings

  • Hybrid approach (local + API) is powerful
  • Structured prompts = better automation
  • UI simplicity improves usability massively

🔮 Future Improvements

  • Add more actions (email automation, system control)
  • Improve offline performance
  • Add memory (conversation history)
  • Multi-command execution

🙌 Final Thoughts

This project shows that you don't need expensive hardware to build powerful AI systems.

With the right architecture and smart trade-offs, even a mid-range laptop can run intelligent AI agents efficiently.

If you found this useful, feel free to ⭐ the repo or share your thoughts!


๐Ÿท๏ธ Tags

python #ai #machinelearning #streamlit #opensource #productivity
