
Overview
Developed a dynamic caption placement engine for video. The engine combines YOLOv8 object detection with a saliency map to locate the important content in each frame, then places captions in empty regions so they do not block the action.
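One way to turn detections and a saliency map into a notion of "empty space" is a coarse cost grid. Cells covered by detected objects or salient content get high cost, and the caption goes where cost is lowest. The sketch below assumes boxes arrive as pixel (x1, y1, x2, y2) tuples (the layout YOLOv8 reports as xyxy) and that the saliency map has already been downsampled to a small grid of values in [0, 1]. The function name `cost_heatmap` and the weights `box_weight` and `saliency_weight` are hypothetical, not taken from the project.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def cost_heatmap(
    width: int,
    height: int,
    boxes: List[Box],
    saliency: List[List[float]],
    box_weight: float = 1.0,       # hypothetical weight for detected objects
    saliency_weight: float = 0.5,  # hypothetical weight for saliency
) -> List[List[float]]:
    """Build a coarse cost grid: high cost where objects or salient content are.

    `saliency` is a rows x cols grid; the heatmap uses the same resolution,
    so each cell covers (width / cols) x (height / rows) pixels.
    """
    rows, cols = len(saliency), len(saliency[0])
    cell_w, cell_h = width / cols, height / rows
    # Start from the weighted saliency value of each cell.
    heat = [[saliency_weight * saliency[r][c] for c in range(cols)] for r in range(rows)]
    for x1, y1, x2, y2 in boxes:
        for r in range(rows):
            for c in range(cols):
                # Cell bounds in pixels.
                cx1, cy1 = c * cell_w, r * cell_h
                cx2, cy2 = cx1 + cell_w, cy1 + cell_h
                # Add the fraction of this cell covered by the detection box.
                ix = max(0.0, min(x2, cx2) - max(x1, cx1))
                iy = max(0.0, min(y2, cy2) - max(y1, cy1))
                heat[r][c] += box_weight * (ix * iy) / (cell_w * cell_h)
    return heat
```

A candidate caption position's cost can then be read off as the mean of the cells its box covers.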
Key Results
Dynamically calculates the optimal subtitle position across a 6-zone spatial grid to avoid covering faces and critical action.
Builds dynamic "cost heatmaps" across 1080p video frames and applies a strict veto: a zone is rejected unless its Intersection over Union (IoU) with detected objects stays below 15%.
A temporal aggregation algorithm reduced processing time by 80%.
A conversion engine parses standard SRT files and converts them to ASS (Advanced SubStation Alpha) format for seamless integration with video players.
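A minimal sketch of 6-zone selection with the IoU veto described above. It assumes the six zones are a 3-column by 2-row layout (left, center, right in a top band and a bottom band) with a fixed caption box size, and a preference order that favors bottom-center. The zone layout, caption size fractions, preference order, and all function names are hypothetical; only the 15% threshold comes from the write-up.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels

IOU_VETO = 0.15  # threshold from the write-up; zones at or above it are rejected

# Hypothetical preference order: conventional bottom-center first.
PREFERENCE = ["bottom-center", "bottom-left", "bottom-right",
              "top-center", "top-left", "top-right"]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def caption_zones(width: float, height: float,
                  cap_w_frac: float = 0.3, cap_h_frac: float = 0.12) -> Dict[str, Box]:
    """Six candidate caption boxes: 3 columns x 2 rows (top band, bottom band)."""
    cap_w, cap_h = width * cap_w_frac, height * cap_h_frac
    zones: Dict[str, Box] = {}
    for row, y1 in (("top", 0.0), ("bottom", height - cap_h)):
        for col, cx in (("left", width / 6), ("center", width / 2), ("right", 5 * width / 6)):
            x1 = cx - cap_w / 2  # center the caption box on its column
            zones[f"{row}-{col}"] = (x1, y1, x1 + cap_w, y1 + cap_h)
    return zones


def pick_zone(width: float, height: float, boxes: List[Box]) -> Optional[str]:
    """Return the lowest-cost non-vetoed zone, or None if every zone is vetoed."""
    zones = caption_zones(width, height)
    best, best_cost = None, float("inf")
    for name in PREFERENCE:
        # Cost = worst overlap with any detected object.
        cost = max((iou(zones[name], b) for b in boxes), default=0.0)
        if cost >= IOU_VETO:
            continue  # veto: caption would overlap a detection too much
        if cost < best_cost:  # strict '<' keeps the preferred zone on ties
            best, best_cost = name, cost
    return best
```

Because IoU divides by the union, a small face fully inside a much larger caption box can still score under 15%. If that matters, scoring intersection over the object's own area is a stricter alternative.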
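The write-up does not describe the temporal aggregation algorithm, so the following is only one plausible approach, labeled as an assumption. It evaluates zone costs on every `stride`-th frame of a subtitle cue instead of every frame, keeps each zone's worst cost across the samples, and chooses a single zone for the whole cue. Evaluating fewer frames is where a saving would come from; this sketch does not reproduce the reported 80% figure. `frame_costs` is a hypothetical callback returning per-zone costs (for example, IoU against detections) for one frame, and the other names are hypothetical too.

```python
from typing import Callable, Dict, List, Optional

ZoneCosts = Dict[str, float]  # zone name -> cost for one frame


def sampled_frames(start: int, end: int, stride: int) -> List[int]:
    """Frame indices in [start, end) at a fixed stride, always including the last frame."""
    frames = list(range(start, end, stride))
    if frames and frames[-1] != end - 1:
        frames.append(end - 1)
    return frames


def zone_for_cue(start: int, end: int, stride: int,
                 frame_costs: Callable[[int], ZoneCosts],
                 veto: float = 0.15) -> Optional[str]:
    """Pick one zone for a whole subtitle cue from sampled frames only."""
    worst: ZoneCosts = {}
    for f in sampled_frames(start, end, stride):
        for zone, cost in frame_costs(f).items():
            # Keep each zone's worst (highest) cost across the sampled frames.
            worst[zone] = max(worst.get(zone, 0.0), cost)
    allowed = {z: c for z, c in worst.items() if c < veto}
    if not allowed:
        return None  # every zone was vetoed on at least one sampled frame
    return min(allowed, key=allowed.get)
```

Taking the maximum keeps placement conservative and stable: a zone vetoed on any sampled frame stays vetoed for the whole cue, so the caption does not jump around mid-line. The trade-off is that objects appearing only between samples are missed.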
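A minimal SRT-to-ASS sketch. SRT timestamps use `HH:MM:SS,mmm`, while ASS `Dialogue` lines use `H:MM:SS.cc` (centiseconds), and ASS can anchor a line with the `\an` override tag, whose values follow numeric-keypad layout (1 to 3 along the bottom, 7 to 9 along the top). Mapping the six zones onto `\an` values, the `Default` style name, and the function names are assumptions for illustration, not the project's actual converter.

```python
import re
from typing import Dict, List

# Hypothetical zone -> ASS \an alignment (numpad layout: 1-3 bottom, 7-9 top).
AN_TAG: Dict[str, int] = {"bottom-left": 1, "bottom-center": 2, "bottom-right": 3,
                          "top-left": 7, "top-center": 8, "top-right": 9}

TIME_RE = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_time_to_ass(ts: str) -> str:
    """'00:01:02,345' -> '0:01:02.34' (ASS uses centiseconds; this truncates)."""
    h, m, s, ms = map(int, TIME_RE.fullmatch(ts.strip()).groups())
    return f"{h}:{m:02d}:{s:02d}.{ms // 10:02d}"


def srt_to_ass_events(srt_text: str, zones: List[str]) -> List[str]:
    """Convert SRT cues to ASS Dialogue lines, one placement zone per cue."""
    # SRT cues are separated by blank lines; normalize Windows line endings first.
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    events = []
    for block, zone in zip(blocks, zones):
        lines = block.split("\n")  # [index, "start --> end", text...]
        start, end = (t.strip() for t in lines[1].split("-->"))
        text = r"\N".join(lines[2:])  # \N is the ASS hard line break
        # Fields: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
        events.append(
            f"Dialogue: 0,{srt_time_to_ass(start)},{srt_time_to_ass(end)},Default,,0,0,0,,"
            f"{{\\an{AN_TAG[zone]}}}{text}"
        )
    return events
```

The output is only the body of the `[Events]` section. A playable .ass file also needs `[Script Info]` and `[V4+ Styles]` sections (including a style named `Default`) and a `Format:` line under `[Events]`.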
Tech Stack
Python, YOLOv8