Method

A Weakly Supervised Framework with Spatio-Temporal Collaboration [STC-Seg]


Submitted on 20 Oct. 2022 04:55 by
Ricky Y. (Shandong First Medical University)

Running time: 0.25 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
"Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration":
First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce effective pseudo-labels, which are used to train deep networks and to predict high-quality instance masks. Second, to enhance mask generation, we devise a puzzle loss that enables end-to-end training with box-level annotations. Third, our tracking module jointly uses bounding-box diagonal points and spatio-temporal discrepancy to model object motion, which substantially improves robustness to variations in object appearance. Finally, our framework is flexible and allows image-level instance segmentation methods to handle the video-level task.
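The idea of modeling motion through bounding-box diagonal points can be illustrated with a toy association step between consecutive frames. This is only a hypothetical sketch, not the STC-Seg tracking module: the box format `(x1, y1, x2, y2)`, the summed displacement of the two diagonal corners used as the matching cost, the greedy matching, and the names `diagonal_distance`, `greedy_associate`, and `max_dist` are all assumptions, and the spatio-temporal discrepancy term of the actual method is omitted.

```python
import math

# Assumed box format: (x1, y1, x2, y2), i.e. the top-left and bottom-right diagonal points.
Box = tuple[float, float, float, float]


def diagonal_distance(a: Box, b: Box) -> float:
    """Sum of the Euclidean displacements of the two diagonal corner points."""
    return math.dist(a[:2], b[:2]) + math.dist(a[2:], b[2:])


def greedy_associate(prev: dict[int, Box], curr: list[Box], max_dist: float) -> dict[int, int]:
    """Map each current detection index to a previous track ID (hypothetical greedy matcher).

    Candidate pairs are visited from smallest to largest diagonal-point displacement;
    a pair is accepted only if neither side is already matched and the cost is <= max_dist.
    """
    pairs = sorted(
        (diagonal_distance(pb, cb), tid, j)
        for tid, pb in prev.items()
        for j, cb in enumerate(curr)
    )
    used_tracks: set[int] = set()
    used_dets: set[int] = set()
    assignment: dict[int, int] = {}
    for cost, tid, j in pairs:
        if cost > max_dist:
            break  # all remaining pairs are even farther apart
        if tid in used_tracks or j in used_dets:
            continue
        assignment[j] = tid
        used_tracks.add(tid)
        used_dets.add(j)
    return assignment


prev_tracks = {1: (10, 10, 50, 50), 2: (100, 100, 140, 160)}
detections = [(102, 98, 143, 158), (12, 11, 52, 49)]
print(greedy_associate(prev_tracks, detections, max_dist=20.0))
```

Unmatched detections would typically start new tracks; a real tracker would also need appearance cues or a globally optimal assignment, which this sketch deliberately leaves out.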
Parameters:
(Parameters for Best HOTA on "Car")
LaTeX BibTeX:
@article{STC-Seg,
title={Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration},
author={Yan, Liqi and Wang, Qifan and Ma, Siqi and Wang, Jingang and Yu, Changbin},
journal={IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)},
year={2022},
}

Detailed Results

For all 29 test sequences, our benchmark computes the commonly used tracking metrics, adapted to the segmentation setting: CLEAR MOT, MT/PT/ML, identity switches, and fragmentations [1,2]. The tables below report all of these metrics.
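The trajectory-level metrics can be illustrated per ground-truth object. The following is a minimal sketch under assumed conventions, not the benchmark's evaluation code. Each object is represented by a hypothetical list of the tracker IDs matched to it in each frame (`None` when missed). An identity switch is counted when the matched ID differs from the previously matched one, and a fragmentation when tracking resumes after a gap. The MT/PT/ML split uses the customary coverage thresholds of more than 80% and less than 20% [2]. The names `trajectory_stats` and `mt_pt_ml` are illustrative, and details of the official evaluation (e.g., ignore regions) may differ.

```python
from collections import Counter
from typing import Optional


def trajectory_stats(matches: list[Optional[int]]) -> tuple[float, int, int]:
    """Per-frame tracker IDs matched to one ground-truth object (None = missed).

    Returns (coverage, id_switches, fragmentations) under the assumed conventions.
    """
    coverage = sum(m is not None for m in matches) / len(matches)
    ids = frag = 0
    last_id: Optional[int] = None  # last tracker ID this object was matched to
    was_tracked = False            # whether the previous frame was matched
    seen_tracked = False           # whether the object has been matched at all so far
    for m in matches:
        if m is not None:
            if last_id is not None and m != last_id:
                ids += 1           # matched to a different tracker ID than before
            if seen_tracked and not was_tracked:
                frag += 1          # tracking resumes after an interruption
            last_id, seen_tracked = m, True
        was_tracked = m is not None
    return coverage, ids, frag


def mt_pt_ml(coverages: list[float]) -> dict[str, float]:
    """Fractions of trajectories that are mostly tracked, partially tracked, or mostly lost."""
    labels = Counter("MT" if c > 0.8 else "ML" if c < 0.2 else "PT" for c in coverages)
    return {k: labels[k] / len(coverages) for k in ("MT", "PT", "ML")}


print(trajectory_stats([7, 7, None, 7, 9, 9, None, None, 9, 9]))
print(mt_pt_ml([0.95, 0.85, 0.5, 0.1]))
```

Summing the per-object switch and fragmentation counts over all ground-truth objects gives totals comparable in spirit to the IDS and FRAG columns.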


Benchmark    sMOTSA   MOTSA    MOTSP    MODSA    MODSP
CAR          66.20 %  81.10 %  82.80 %  82.90 %  86.30 %
PEDESTRIAN   42.60 %  57.70 %  75.60 %  59.60 %  92.60 %

Benchmark    recall   precision  F1       TP     FP    FN    FAR      #objects  #trajectories
CAR          86.50 %  96.00 %    91.00 %  31790  1314  4970  11.80 %  39736     1400
PEDESTRIAN   61.80 %  96.50 %    75.40 %  12799  458   7898  4.10 %   13988     424

Benchmark    MT       PT       ML       IDS  FRAG
CAR          71.90 %  26.10 %  2.00 %   676  1093
PEDESTRIAN   27.80 %  53.70 %  18.50 %  408  780
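As a consistency check, the summary scores in the first two tables can be recomputed from the raw counts. The sketch below assumes the usual MOTS definitions, with |M| = TP + FN ground-truth masks: MODSA = (TP - FP) / |M|, MOTSA = (TP - FP - IDS) / |M|, and sMOTSA = (soft TP - FP - IDS) / |M|, where soft TP is the summed mask IoU of the true positives. Because only the rounded MOTSP is reported, soft TP is approximated as MOTSP × TP. The helper `mots_metrics` is hypothetical, and MODSP, FAR, and #objects are not reproduced.

```python
def mots_metrics(tp: int, fp: int, fn: int, ids: int, motsp: float) -> dict[str, float]:
    """Recompute MOTS summary scores from aggregate counts (assumed definitions).

    motsp is the mean mask IoU over true positives, so motsp * tp approximates
    the summed IoU ("soft TP"); rounding in the reported MOTSP limits precision.
    """
    n_gt = tp + fn            # number of ground-truth masks |M|
    soft_tp = motsp * tp      # approximate summed IoU of matched masks
    recall = tp / n_gt
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "F1": 2 * precision * recall / (precision + recall),
        "MODSA": (tp - fp) / n_gt,
        "MOTSA": (tp - fp - ids) / n_gt,
        "sMOTSA": (soft_tp - fp - ids) / n_gt,
    }


# Counts taken from the tables above.
results = {
    "CAR": mots_metrics(tp=31790, fp=1314, fn=4970, ids=676, motsp=0.828),
    "PEDESTRIAN": mots_metrics(tp=12799, fp=458, fn=7898, ids=408, motsp=0.756),
}
for cls, scores in results.items():
    print(cls, {name: round(100 * value, 1) for name, value in scores.items()})
```

With the reported CAR and PEDESTRIAN counts, the rounded outputs match the tabulated recall, precision, F1, MODSA, MOTSA, and sMOTSA values, so the tables are internally consistent under these definitions.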



[1] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[2] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

