Method

A Weakly Supervised Framework with Spatio-Temporal Collaboration [STC-Seg]


Submitted on 20 Oct. 2022 04:55 by
Ricky Y. (Shandong First Medical University)

Running time: 0.25 s
Environment: 1 core @ 2.5 GHz (C/C++)

Method Description:
"Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration":
First, we leverage the complementary representations from unsupervised depth estimation and optical flow to produce effective pseudo-labels for training deep networks and predicting high-quality instance masks. Second, to enhance mask generation, we devise a puzzle loss that enables end-to-end training using only box-level annotations. Third, our tracking module jointly uses bounding-box diagonal points and spatio-temporal discrepancy to model movements, which substantially improves robustness to different object appearances. Finally, our framework is flexible: it lets image-level instance segmentation methods be applied to the video-level task.
Parameters:
(Parameters for Best HOTA on "Car")
Latex Bibtex:
@article{STC-Seg,
title={Solve the Puzzle of Instance Segmentation in Videos: A Weakly Supervised Framework with Spatio-Temporal Collaboration},
author={Yan, Liqi and Wang, Qifan and Ma, Siqi and Wang, Jingang and Yu, Changbin},
journal={IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)},
year={2022},
}

Detailed Results

Over all 29 test sequences, our benchmark computes the HOTA tracking metrics (HOTA, DetA, AssA, DetRe, DetPr, AssRe, AssPr, LocA) [1] as well as the CLEAR MOT, MT/PT/ML, identity-switch, and fragmentation metrics [2,3]. The tables below show all of these metrics.


Benchmark    HOTA     DetA     AssA     DetRe    DetPr    AssRe    AssPr    LocA
CAR          62.81 %  68.67 %  58.32 %  73.67 %  81.81 %  62.22 %  83.53 %  84.93 %
PEDESTRIAN   43.89 %  45.93 %  43.65 %  48.14 %  75.16 %  50.50 %  67.24 %  79.06 %

Benchmark    TP      FP     FN
CAR          31792   4968   1312
PEDESTRIAN   12800   7897   457

Benchmark    MOTSA    MOTSP    MODSA    IDSW   sMOTSA
CAR          81.08 %  82.83 %  82.92 %  676    66.22 %
PEDESTRIAN   57.66 %  75.59 %  59.64 %  408    42.57 %

Benchmark    MT rate  PT rate  ML rate  FRAG
CAR          71.92 %  26.13 %  1.95 %   889
PEDESTRIAN   27.78 %  53.70 %  18.52 %  807

Benchmark    # Dets   # Tracks
CAR          33104    1090
PEDESTRIAN   13257    314
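
As a rough aid for reading the tables, the sketch below derives precision and recall from the TP/FP/FN counts and shows how HOTA relates to DetA and AssA. It is an illustration under stated assumptions, not the benchmark's evaluation code: the helper names `precision_recall` and `geometric_mean` are hypothetical, and how the benchmark obtains its counts is not stated on this page. The counts apply to a single matching threshold, so the resulting values need not equal DetPr and DetRe, which HOTA [1] averages over a range of localization thresholds. For the same reason, the geometric mean of the reported DetA and AssA should be close to the reported HOTA but not generally equal to it, because HOTA is the geometric mean of DetA and AssA at each threshold, averaged afterwards.

```python
import math

# Counts copied from the TP/FP/FN table; DetA/AssA copied from the HOTA table.
COUNTS = {
    "CAR": {"tp": 31792, "fp": 4968, "fn": 1312},
    "PEDESTRIAN": {"tp": 12800, "fp": 7897, "fn": 457},
}
DET_ASS = {
    "CAR": (0.6867, 0.5832),
    "PEDESTRIAN": (0.4593, 0.4365),
}


def precision_recall(tp, fp, fn):
    """Return (precision, recall) as fractions in [0, 1].

    Hypothetical helper: precision = TP / (TP + FP), recall = TP / (TP + FN).
    Empty denominators yield 0.0 instead of raising.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def geometric_mean(a, b):
    """Geometric mean of two non-negative scores."""
    return math.sqrt(a * b)


for name, counts in COUNTS.items():
    precision, recall = precision_recall(**counts)
    det_a, ass_a = DET_ASS[name]
    # Approximation only: the benchmark averages per-threshold HOTA values.
    approx_hota = geometric_mean(det_a, ass_a)
    print(f"{name}: precision={precision:.4f} recall={recall:.4f} "
          f"sqrt(DetA*AssA)={approx_hota:.4f}")
```

Comparing the printed `sqrt(DetA*AssA)` value with the HOTA column gives a quick sanity check that a row was transcribed consistently.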


[1] J. Luiten, A. Ošep, P. Dendorfer, P. Torr, A. Geiger, L. Leal-Taixé, B. Leibe: HOTA: A Higher Order Metric for Evaluating Multi-object Tracking. IJCV 2020.
[2] K. Bernardin, R. Stiefelhagen: Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics. JIVP 2008.
[3] Y. Li, C. Huang, R. Nevatia: Learning to associate: HybridBoosted multi-target tracker for crowded scene. CVPR 2009.

