Lich Priest
Deploying a Real-Time Object Detection API with YOLOv8 and FastAPI

Introduction

Object detection is one of the most exciting use‑cases of computer vision, and the YOLO (You Only Look Once) family has become the go‑to solution for real‑time inference. In this tutorial you’ll learn how to:

  1. Train a custom YOLOv8 model on your own dataset.
  2. Wrap the model in a FastAPI service that accepts image uploads and returns detections instantly.
  3. Containerize the whole stack with Docker so it runs the same everywhere.
  4. Automate testing and deployment using a GitHub Actions CI/CD pipeline.

By the end you’ll have a production‑ready API that can be deployed to any container host (AWS ECS, GCP Cloud Run, Azure Container Apps, or even your laptop).

Tip: If you’re new to YOLOv8, the official Ultralytics repo ships with a very friendly CLI that handles most of the heavy lifting. We’ll use it as the foundation and then add a thin FastAPI wrapper around the exported model.


1. Preparing the data and training YOLOv8

1.1 Organize your dataset

YOLO expects the following directory layout:

dataset/
├── images/
│   ├── train/
│   └── val/
└── labels/
    ├── train/
    └── val/
  • Images can be JPEG or PNG.
  • Labels are text files with the same base name as the image, each line containing class_id x_center y_center width height (all normalized to [0,1]).
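As a concrete sketch of that label format, here is how a box given in pixel coordinates maps to one YOLO label line (the `to_yolo_line` helper is our own illustration, not part of any library):

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space box (x1, y1, x2, y2) to a YOLO label line."""
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200 box at the top-left of a 640x480 image, class 0 ("person")
print(to_yolo_line(0, 0, 0, 100, 200, 640, 480))
# → 0 0.078125 0.208333 0.156250 0.416667
```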

If your data is in COCO format, the ultralytics package ships a converter helper (there is no built-in Pascal VOC converter, so VOC XML needs a separate conversion script):

pip install ultralytics
# COCO JSON annotations -> YOLO txt labels (labels_dir points at the COCO annotation JSONs)
python -c "from ultralytics.data.converter import convert_coco; convert_coco(labels_dir='annotations/')"

1.2 Create a data.yaml file

train: ./dataset/images/train
val:   ./dataset/images/val

nc: 3                     # number of classes
names: ['person', 'bicycle', 'dog']

1.3 Train the model

The simplest way is to use the CLI:

yolo task=detect mode=train data=./data.yaml epochs=50 imgsz=640 batch=16 model=yolov8n.pt
  • yolov8n.pt is the nano version, perfect for low‑latency inference.
  • Adjust epochs, batch, and imgsz to fit your compute budget.

The training script will create a runs/detect/train/weights/best.pt file – this is the model we’ll serve.

1.4 Quick sanity check

yolo task=detect mode=val model=./runs/detect/train/weights/best.pt data=./data.yaml

You should see a summary of mAP, precision, and recall, plus a few sample images with bounding boxes saved under runs/detect/val.
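Those metrics all hinge on IoU (intersection over union) between predicted and ground-truth boxes. A minimal sketch of the computation (our own helper, not part of the Ultralytics API) makes the idea concrete:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) tuples."""
    # Intersection rectangle
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # ≈ 0.143 (a 50x50 overlap)
```

A prediction typically counts as a true positive when its IoU with a ground-truth box clears a threshold (0.5 for the classic mAP@50).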


2. Exporting the model for inference

YOLOv8 can export to several formats (TorchScript, ONNX, TensorRT). For a FastAPI service running on CPU or GPU, the native PyTorch format works fine, but we’ll also export to ONNX for future flexibility.

yolo export model=./runs/detect/train/weights/best.pt format=onnx opset=12

You’ll get best.onnx in the same folder. Keep both best.pt and best.onnx – the former is useful for quick local testing, the latter for edge deployments.


3. Building the FastAPI wrapper

Create a new folder called api/ and add the following files.

3.1 requirements.txt

fastapi==0.110.0
uvicorn[standard]==0.27.0
python-multipart==0.0.9
torch==2.2.0
opencv-python-headless==4.9.0.80
ultralytics==8.2.0
pydantic==2.6.1

3.2 app.py

from fastapi import FastAPI, File, UploadFile, HTTPException
from fastapi.responses import JSONResponse
from pathlib import Path
import cv2
import numpy as np
from ultralytics import YOLO

app = FastAPI(title="YOLOv8 Object Detection API")

# Load the model once at startup
MODEL_PATH = Path(__file__).parent / "best.pt"
if not MODEL_PATH.exists():
    raise FileNotFoundError(f"Model not found at {MODEL_PATH}")

model = YOLO(str(MODEL_PATH))

def read_image(file: UploadFile) -> np.ndarray:
    """Convert an uploaded file to an OpenCV BGR image."""
    contents = file.file.read()
    np_arr = np.frombuffer(contents, np.uint8)
    img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
    if img is None:
        raise HTTPException(status_code=400, detail="Invalid image")
    return img

@app.post("/detect")
async def detect(file: UploadFile = File(...)):
    """Accept an image and return YOLO detections."""
    img = read_image(file)
    results = model(img)  # Inference; returns a list of Results objects

    # Convert the detections to a JSON-serializable list of dicts
    output = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        cls_id = int(box.cls[0])
        output.append({
            "xmin": x1,
            "ymin": y1,
            "xmax": x2,
            "ymax": y2,
            "confidence": float(box.conf[0]),
            "class": cls_id,
            "name": results[0].names[cls_id],
        })
    return JSONResponse(content=output)

Explanation of key parts

  • Model loading: YOLO(str(MODEL_PATH)) loads our best.pt through the ultralytics package. (The older torch.hub.load('ultralytics/yolov5', ...) route only works for YOLOv5 checkpoints, not v8.) Inference runs on CPU by default; call model.to('cuda') if you have a GPU.
  • Image handling: UploadFile gives us a file‑like object. We decode it with OpenCV so the model receives a NumPy array.
  • Result formatting: YOLOv8 returns a list of Results objects, one per input image. Each entry in results[0].boxes exposes xyxy coordinates plus conf and cls tensors, which we flatten into dicts with the keys xmin, ymin, xmax, ymax, confidence, class, name to keep the API response clean.

3.3 Run locally

pip install -r api/requirements.txt
uvicorn api.app:app --host 0.0.0.0 --port 8000

Visit http://localhost:8000/docs – FastAPI automatically generates an interactive Swagger UI. Try uploading a picture and you should receive a JSON array of detections.


4. Dockerizing the service

Create a Dockerfile at the project root:

# Use the official lightweight Python image
FROM python:3.11-slim

# Install system dependencies (opencv needs libgl1)
RUN apt-get update && apt-get install -y --no-install-recommends \
    libgl1 && \
    rm -rf /var/lib/apt/lists/*

# Set working directory
WORKDIR /app

# Copy only requirements first for layer caching
COPY api/requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the rest of the code and the trained model
COPY api/ ./api/
COPY runs/detect/train/weights/best.pt ./api/

# Expose FastAPI port
EXPOSE 8000

# Command to run the service
CMD ["uvicorn", "api.app:app", "--host", "0.0.0.0", "--port", "8000"]
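To keep the build context (and thus build times) small, it also helps to add a .dockerignore next to the Dockerfile. A sketch, to be adjusted to your repo layout:

```
# .dockerignore — keep datasets, caches, and git history out of the build context
dataset/
__pycache__/
*.onnx
.git/
# Exclude training runs, but re-include the weights the Dockerfile COPYs
runs/
!runs/detect/train/weights/best.pt
```

Note the `!` exception: without it, excluding runs/ would break the `COPY runs/detect/train/weights/best.pt` step.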

4.1 Build and test the image

docker build -t yolov8-api:latest .
docker run -p 8000:8000 yolov8-api:latest

Again, navigate to http://localhost:8000/docs to verify the container works.


5. CI/CD with GitHub Actions

Having the Docker image build automatically on every push guarantees reproducibility. Add the following workflow file at .github/workflows/docker-ci.yml:

name: CI / CD for YOLOv8 FastAPI

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  build-and-push:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up QEMU (for multi‑arch builds)
        uses: docker/setup-qemu-action@v3

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Log in to Docker Hub
        uses: docker/login-action@v3
        with:
          username: ${{ secrets.DOCKERHUB_USERNAME }}
          password: ${{ secrets.DOCKERHUB_TOKEN }}

      - name: Build and push Docker image
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:latest
          cache-from: type=registry,ref=${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:cache
          cache-to: type=registry,ref=${{ secrets.DOCKERHUB_USERNAME }}/yolov8-api:cache,mode=max

What this does

  1. Checks out the code on the GitHub runner.
  2. Enables multi‑architecture builds (useful if you later target ARM devices).
  3. Authenticates with Docker Hub using encrypted repository secrets.
  4. Builds the image and pushes it to Docker Hub under your namespace.
  5. Caches layers to speed up subsequent builds.
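A natural extension is to gate the push on a test job. The sketch below assumes you add a tests/ folder with pytest tests (neither exists in this tutorial's repo yet):

```yaml
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: pip install -r api/requirements.txt pytest
      - name: Run tests
        run: pytest tests/
```

Then add `needs: test` to the build-and-push job so images are only published when the test suite passes.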

You can now trigger a deployment on any platform that can pull from Docker Hub (e.g., a simple docker run on an EC2 instance or a Kubernetes pod).


6. Optional: Deploying to a cloud provider

6.1 AWS Elastic Container Service (ECS) – Fargate

# 1. Create a cluster
aws ecs create-cluster --cluster-name yolov8-cluster

# 2. Register task definition (task-def.json)
aws ecs register-task-definition --cli-input-json file://task-def.json

# 3. Run service
aws ecs create-service \
  --cluster yolov8-cluster \
  --service-name yolov8-service \
  --task-definition yolov8-task \
  --desired-count 1 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={subnets=[subnet-xxxx],securityGroups=[sg-xxxx],assignPublicIp=ENABLED}"

The task-def.json would reference the image you pushed (youruser/yolov8-api:latest) and expose port 8000. After a few minutes the service is reachable via the load balancer URL.
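A minimal task-def.json might look like the following sketch; the account ID, role name, and CPU/memory values are placeholders you must fill in for your own account:

```json
{
  "family": "yolov8-task",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "executionRoleArn": "arn:aws:iam::<ACCOUNT_ID>:role/ecsTaskExecutionRole",
  "containerDefinitions": [
    {
      "name": "yolov8-api",
      "image": "youruser/yolov8-api:latest",
      "portMappings": [{ "containerPort": 8000, "protocol": "tcp" }],
      "essential": true
    }
  ]
}
```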

6.2 Google Cloud Run

gcloud run deploy yolov8-api \
  --image=gcr.io/<PROJECT_ID>/yolov8-api:latest \
  --platform=managed \
  --region=us-central1 \
  --allow-unauthenticated \
  --port=8000

Both platforms automatically handle scaling, health checks, and HTTPS termination, leaving you with a low‑maintenance API.


7. Testing the live endpoint

You can use curl or a small Python script:

import requests

url = "http://<host>:8000/detect"
with open("test.jpg", "rb") as f:
    resp = requests.post(url, files={"file": f})
print(resp.json())

The response will be a list of detections, each containing:

{
  "xmin": 124,
  "ymin": 87,
  "xmax": 342,
  "ymax": 276,
  "confidence": 0.93,
  "class": 0,
  "name": "person"
}

You can now feed this output into downstream services—tracking, alerting, or even a front‑end UI that draws boxes in real time.
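For instance, a quick script can burn the boxes straight into the frame. The sketch below uses plain NumPy slicing so it runs even without OpenCV installed; with OpenCV you would call cv2.rectangle instead (the draw_boxes helper is ours, not part of any library):

```python
import numpy as np

def draw_boxes(img, detections, thickness=2):
    """Draw green detection rectangles onto an HxWx3 image array in place."""
    green = np.array([0, 255, 0], dtype=img.dtype)
    for det in detections:
        x1, y1 = int(det["xmin"]), int(det["ymin"])
        x2, y2 = int(det["xmax"]), int(det["ymax"])
        img[y1:y1 + thickness, x1:x2] = green  # top edge
        img[y2 - thickness:y2, x1:x2] = green  # bottom edge
        img[y1:y2, x1:x1 + thickness] = green  # left edge
        img[y1:y2, x2 - thickness:x2] = green  # right edge
    return img

# Usage: draw the API response onto the original frame
frame = np.zeros((480, 640, 3), dtype=np.uint8)
dets = [{"xmin": 124, "ymin": 87, "xmax": 342, "ymax": 276,
         "confidence": 0.93, "class": 0, "name": "person"}]
draw_boxes(frame, dets)
```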


Key takeaways

  • Train a custom YOLOv8 model with the Ultralytics CLI and keep the exported best.pt weights.
  • Wrap the model in a lightweight FastAPI service exposing a single /detect endpoint.
  • Containerize the stack with Docker so it runs the same everywhere.
  • Automate builds and pushes to Docker Hub with GitHub Actions.
