SAM2-LongVideo: a robust tracking system based on SAM2 and YOLO

Authors: Saijun Wang
Conference: ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages: 2703-2712
Keywords: Video Object Segmentation, SAM2, YOLOv11, Long Video Segmentation, Memory Optimization.

Abstract

Video Object Segmentation (VOS) aims to track and segment objects across video sequences at the pixel level. SAM2, a state-of-the-art segmentation model, excels at zero-shot image and video segmentation but struggles with long videos, where it can lose track of the target object and consume large amounts of memory. This paper addresses both problems by combining SAM2 with YOLOv11. The video is divided into smaller sub-sequences; YOLOv11 supplies real-time bounding-box prompts, and SAM2 segments the target within each sub-sequence. This reduces memory requirements and prevents tracking loss without laborious human annotation. Experimental results show that the method maintains high segmentation accuracy with low memory consumption, making it an efficient and scalable option for long video segmentation in resource-constrained environments. The code and GUI are available at https://github.com/Saijun-Wang/SAM2-LongVideo.
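The sub-sequence idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: `detect_box` and `segment_chunk` are hypothetical callables standing in for a YOLOv11 detector and a SAM2 video predictor, and the default `chunk_size` of 100 frames is an arbitrary assumption. The sketch only shows the control flow, in which each sub-sequence is re-prompted with a fresh bounding box and processed independently so that per-chunk state can be released before the next chunk starts.

```python
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed format


def split_into_subsequences(num_frames: int, chunk_size: int) -> Iterator[range]:
    """Yield consecutive frame-index ranges of at most chunk_size frames."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_frames, chunk_size):
        yield range(start, min(start + chunk_size, num_frames))


def segment_long_video(
    frames: Sequence,
    detect_box: Callable[[object], Optional[Box]],      # hypothetical detector stand-in
    segment_chunk: Callable[[Sequence, Box], List],     # hypothetical segmenter stand-in
    chunk_size: int = 100,                               # assumed value, not from the paper
) -> List:
    """Segment a long video chunk by chunk, re-prompting at each chunk start."""
    masks: List = []
    for idx in split_into_subsequences(len(frames), chunk_size):
        chunk = frames[idx.start:idx.stop]
        # A fresh box prompt on the first frame of every chunk lets the
        # tracker recover if the object was lost in the previous chunk.
        box = detect_box(chunk[0])
        if box is None:
            # No detection: record empty results for this chunk.
            masks.extend([None] * len(chunk))
            continue
        # Only this chunk is handed to the segmenter, so its memory
        # footprint is bounded by chunk_size rather than the video length.
        masks.extend(segment_chunk(chunk, box))
    return masks
```

In practice the two callables would wrap the real models; with dummy functions the driver can be exercised without any model weights, which is how the chunk boundaries and re-prompting behaviour can be checked in isolation.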