SAM2-LongVideo: a robust tracking system based on SAM2 and YOLO
Authors:
Saijun Wang
Conference:
ICIC 2025 Posters, Ningbo, China, July 26-29, 2025
Pages:
2703-2712
Keywords:
Video Object Segmentation, SAM2, YOLOv11, Long Video Segmentation, Memory Optimization.
Abstract
Video Object Segmentation (VOS) aims to track and segment objects across video sequences at the pixel level. SAM2, a state-of-the-art segmentation model, excels in zero-shot image and video segmentation but faces challenges in long video processing, including object tracking loss and high memory usage. This paper proposes a solution that combines SAM2 with YOLOv11. YOLOv11 provides real-time bounding box prompts for efficient segmentation, while SAM2 segments the target across sub-sequences. Dividing the video into smaller sub-sequences reduces memory requirements and prevents object tracking loss, and the automatic prompts remove the need for laborious human annotation. Experimental results show that the method maintains high segmentation accuracy with minimal memory consumption, making it suitable for long video segmentation in resource-constrained environments. The method offers an efficient and scalable solution for VOS tasks in various applications. The code and GUI are available at https://github.com/Saijun-Wang/SAM2-LongVideo.
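The sub-sequence idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `detect_box` and `segment_chunk` are hypothetical callables standing in for a YOLOv11 detector and a SAM2 video predictor, and `chunk_size` is an assumed parameter. The sketch only shows the control flow: split the video into sub-sequences, obtain a box prompt from the detector at the start of each one, and let the segmenter propagate within that sub-sequence only, so that per-chunk state can be released before the next chunk begins.

```python
from typing import Any, Callable, Iterator, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed convention


def split_into_subsequences(num_frames: int, chunk_size: int) -> Iterator[Tuple[int, int]]:
    """Yield half-open (start, end) frame ranges that cover the whole video."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, num_frames, chunk_size):
        yield start, min(start + chunk_size, num_frames)


def segment_long_video(
    frames: Sequence[Any],
    detect_box: Callable[[Any], Optional[Box]],            # hypothetical detector wrapper
    segment_chunk: Callable[[Sequence[Any], Box], List[Any]],  # hypothetical segmenter wrapper
    chunk_size: int = 100,                                  # assumed value, not from the paper
) -> List[Optional[Any]]:
    """Return one mask per frame, or None where no prompt was available."""
    masks: List[Optional[Any]] = [None] * len(frames)
    for start, end in split_into_subsequences(len(frames), chunk_size):
        # Look for the first frame in this sub-sequence with a detection.
        prompt_idx, box = None, None
        for i in range(start, end):
            box = detect_box(frames[i])
            if box is not None:
                prompt_idx = i
                break
        if prompt_idx is None:
            continue  # target not detected anywhere in this sub-sequence

        # Propagate only inside the sub-sequence, starting at the prompted frame.
        chunk_masks = segment_chunk(frames[prompt_idx:end], box)
        for offset, mask in enumerate(chunk_masks[: end - prompt_idx]):
            masks[prompt_idx + offset] = mask
    return masks
```

Because each sub-sequence is re-prompted by the detector, a target lost in one chunk can be picked up again in the next, and the segmenter never has to hold state for more than `chunk_size` frames at once.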
BibTeX Citation:
@inproceedings{ICIC2025,
author = {Saijun Wang},
title = {SAM2-LongVideo: a robust tracking system based on SAM2 and YOLO},
booktitle = {Proceedings of the 21st International Conference on Intelligent Computing (ICIC 2025)},
month = {July},
date = {26-29},
year = {2025},
address = {Ningbo, China},
pages = {2703-2712},
doi = {10.65286/icic.v21i3.62852}
}