YOLO object detection project by Michael Wen
I built an app, entirely on my own, that detects objects in a photo
using YOLO. Here's the web UI:
Here's the system architecture:
About This Project
This application is a full-stack computer vision system built using
modern web and machine learning technologies. The frontend is
developed with Vite + React for fast development and minimal bundle
size, while the backend is powered by Python, FastAPI, YOLO, and
Uvicorn, all running inside Docker for maximum portability and
compatibility.
Object detection is performed using the YOLO (You Only Look Once)
family of models, allowing the system to identify and localize
multiple objects, such as people, vehicles, animals, and other
everyday objects in street scenes, in a single pass. Users can switch
between different YOLO model sizes to balance speed and accuracy, and
can optionally toggle confidence scores in the detection output.
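The model-size switch and the confidence toggle can be pictured with a minimal sketch like the one below. It is not the project's actual code: the `MODEL_WEIGHTS` mapping, the `Detection` type, and both helper functions are hypothetical names, and the weights file names assume the Ultralytics YOLOv8 naming scheme.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical mapping from a UI choice to a weights file. The file names
# assume the Ultralytics YOLOv8 naming scheme; the project may use others.
MODEL_WEIGHTS = {
    "nano": "yolov8n.pt",      # fastest, least accurate
    "small": "yolov8s.pt",
    "medium": "yolov8m.pt",    # slower, more accurate
}


@dataclass
class Detection:
    """One detected object: class label, score in [0, 1], and pixel box."""
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def resolve_weights(size: str) -> str:
    """Translate the model size chosen in the UI into a weights file name."""
    try:
        return MODEL_WEIGHTS[size]
    except KeyError:
        raise ValueError(f"unknown model size: {size!r}") from None


def format_label(det: Detection, show_confidence: bool) -> str:
    """Build the text drawn next to a box, with or without the score."""
    if show_confidence:
        return f"{det.label} {det.confidence:.2f}"
    return det.label
```

For example, `format_label(Detection("person", 0.873, (0, 0, 10, 10)), True)` yields `person 0.87`, while passing `False` yields just `person`.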
To provide a rich and varied set of test images without storing large
datasets locally, the application integrates with the Unsplash API,
dynamically fetching public images based on selected categories such
as people, streets, markets, and
traffic. This approach keeps the application lightweight while still
enabling realistic, real-world object detection scenarios.
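As a rough illustration of the category-based fetch, the sketch below builds a request against Unsplash's photo search endpoint and pulls image URLs out of the JSON response. The function names, the `per_page` default, and the choice of the `regular` image size are assumptions, not details taken from this project.

```python
from typing import List
from urllib.parse import urlencode
from urllib.request import Request

UNSPLASH_SEARCH_URL = "https://api.unsplash.com/search/photos"


def build_search_request(category: str, access_key: str, per_page: int = 10) -> Request:
    """Prepare (but do not send) a photo search request for one category.

    `access_key` is the Unsplash application access key; how the project
    stores it is not stated, so it is simply passed in here.
    """
    query = urlencode({"query": category, "per_page": per_page})
    return Request(
        f"{UNSPLASH_SEARCH_URL}?{query}",
        headers={
            "Authorization": f"Client-ID {access_key}",
            "Accept-Version": "v1",
        },
    )


def extract_image_urls(payload: dict) -> List[str]:
    """Collect one URL per photo from a decoded search response.

    Using the "regular" size is an assumption; the response also carries
    other sizes such as "small" and "thumb".
    """
    return [item["urls"]["regular"] for item in payload.get("results", [])]
```

The request can then be sent with `urllib.request.urlopen` and the body decoded with `json.load` before it is handed to `extract_image_urls`.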
Disk space optimization was a major design consideration throughout
this project. Instead of running multiple containers, both this YOLO
object detection backend and a separate digit-and-letter
classification service (CNN-based) were consolidated into a single
Docker container. This optimization reduced total disk usage on the
VPS from over 30GB to approximately 4GB.
Additional disk savings were achieved by developing locally and only
deploying the final production build to the server. The frontend is
built ahead of time and only the compiled dist/ directory is copied
to the VPS, avoiding unnecessary development dependencies
in production.
The Docker image itself follows a multi-stage Builder / Runtime
pattern, ensuring that only the minimal runtime dependencies are
included in the final image. During deployment, memory constraints on
the VPS revealed an out-of-memory (OOM) issue when loading multiple
YOLO models simultaneously. This was resolved by implementing lazy
loading of YOLO models, ensuring that each model is only loaded into
memory when it is actually needed.
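The lazy-loading idea can be sketched as a small registry that loads a model the first time it is requested and reuses it afterwards. This is a minimal illustration under stated assumptions, not the project's implementation: `LazyModelRegistry` is a hypothetical name, and the loader is injected so the sketch runs without any ML library installed.

```python
import threading
from typing import Any, Callable, Dict, List


class LazyModelRegistry:
    """Load each model on first use instead of loading all of them at startup."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        # `loader` turns a weights file name into a model object. With the
        # Ultralytics package this could be the `YOLO` class itself (an
        # assumption about the project's setup).
        self._loader = loader
        self._models: Dict[str, Any] = {}
        self._lock = threading.Lock()  # guards against concurrent double loads

    def get(self, weights: str) -> Any:
        """Return the model for `weights`, loading it only if not cached yet."""
        with self._lock:
            if weights not in self._models:
                self._models[weights] = self._loader(weights)
            return self._models[weights]

    def loaded(self) -> List[str]:
        """Names of the models currently held in memory."""
        with self._lock:
            return sorted(self._models)
```

At startup nothing is loaded; a request that selects `yolov8n.pt` triggers exactly one load of that model, and later requests for the same size reuse it. Models stay cached once loaded in this sketch, so on a very small VPS an eviction policy might also be worth considering.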
Together, these design choices result in a system that is fast,
memory-efficient, disk-efficient, and production-ready, while still
delivering powerful real-time object detection capabilities through a
clean and user-friendly web interface.