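Compare each GT segment with the predicted segments and keep the pairs whose IoU exceeds 0.5. (A GT segment matches either 1 or 0 predicted segments. Because segments do not overlap, 2 matches above 0.5 IoU are mathematically impossible.)
PQ is computed one class at a time. Consider only person in the example image: matching is performed over the GT segments and predicted segments whose class is person.
The segments are then divided into TP, FN, and FP, which can be interpreted as follows. (1) TP: segments that are matched correctly. (2) FN: a GT segment whose class is person but whose matching predicted segment has the wrong class or instance id, like the brown region. (3) FP: a predicted person segment whose matched GT segment is not even person.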
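The matching and counting steps above can be sketched for a single class as below. This is a minimal sketch, not the official evaluation code: it assumes each segment is given as a boolean NumPy mask, the function name `pq_for_class` is hypothetical, and it uses the standard definition PQ = (sum of IoU over TP) / (|TP| + 0.5|FP| + 0.5|FN|), where unmatched predicted segments count as FP and unmatched GT segments count as FN.

```python
import numpy as np

def pq_for_class(gt_masks, pred_masks, iou_thresh=0.5):
    """PQ for ONE class. Masks are boolean (H, W) arrays (hypothetical input format)."""
    matched_pred = set()
    iou_sum = 0.0
    tp = 0
    for g in gt_masks:
        for j, p in enumerate(pred_masks):
            if j in matched_pred:
                continue
            inter = np.logical_and(g, p).sum()
            union = np.logical_or(g, p).sum()
            iou = inter / union if union > 0 else 0.0
            if iou > iou_thresh:
                # With non-overlapping segments, at most one prediction can pass 0.5.
                tp += 1
                iou_sum += float(iou)
                matched_pred.add(j)
                break
    fn = len(gt_masks) - tp    # GT segments left without a match
    fp = len(pred_masks) - tp  # predicted segments left without a match
    denom = tp + 0.5 * fp + 0.5 * fn
    pq = iou_sum / denom if denom > 0 else 0.0
    return pq, tp, fp, fn

# Toy 4x4 example: one good match, one missed GT segment, one spurious prediction.
def _mask(rows, cols):
    m = np.zeros((4, 4), dtype=bool)
    m[rows, cols] = True
    return m

gt = [_mask(slice(0, 2), slice(0, 4)), _mask(slice(2, 4), slice(0, 2))]
pred = [_mask(slice(0, 2), slice(0, 3)), _mask(slice(3, 4), slice(2, 4))]
pq, tp, fp, fn = pq_for_class(gt, pred)
# IoU of the matched pair is 6/8 = 0.75, so PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375
```

The overall PQ is then the average of the per-class values over all classes.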
Post-processing: To obtain outputs for Panoptic segmentation
Problem: CVPR18 methods have two separate branches designed for semantic and instance segmentation.
Solution: our model exploits a single network as backbone.
New points
semantic segmentation head: based on deformable convolution
panoptic head: Refer to Sec 3.1
Unknown prediction: Refer to Sec 3.1. UPSNet can classify a pixel as the unknown class. In evaluation, any pixel belonging to unknown is ignored.
ISR is “consistency training” from semi-supervised learning (cf. semi-supervised segmentation: CPS, pseudo-labeling)
ITR is the complementary prediction between semantic and instance segmentation.
In terms of pseudo-label quality, the semantic branch is better than the instance branch on stuff, and the instance branch is better than the semantic branch on things → pseudo-label rectifying
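One hypothetical way to turn this observation into a rectification rule is sketched below. The paper's actual rule is not given in these notes, so everything here is an assumption: the function name `rectify_pseudo_label`, the per-pixel input format, and the choice to trust the instance branch wherever it covers a pixel and to ignore pixels that the semantic branch calls a thing but no instance covers.

```python
import numpy as np

def rectify_pseudo_label(sem_label, inst_class_map, thing_ids, ignore_id=255):
    """Hypothetical rectification rule (an assumption, not the paper's method).

    sem_label:      (H, W) int class map predicted by the semantic branch.
    inst_class_map: (H, W) int class of the instance covering each pixel,
                    ignore_id where no instance covers the pixel.
    thing_ids:      iterable of class ids that are things.
    """
    sem_is_thing = np.isin(sem_label, list(thing_ids))
    has_instance = inst_class_map != ignore_id

    out = sem_label.copy()
    # Things: trust the instance branch wherever it predicts an instance.
    out[has_instance] = inst_class_map[has_instance]
    # The semantic branch says "thing" but no instance agrees: mark as ignored.
    out[sem_is_thing & ~has_instance] = ignore_id
    # Remaining pixels are stuff and keep the semantic prediction.
    return out

# Toy 2x2 example: class 0 is stuff, class 1 (person) is a thing.
sem = np.array([[0, 1], [1, 0]])
inst = np.array([[255, 1], [255, 255]])
rectified = rectify_pseudo_label(sem, inst, thing_ids={1})
```

In the toy example, the pixel where both branches agree on person keeps its label, the person pixel without instance support is ignored, and the stuff pixels are left as predicted by the semantic branch.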
Experiments show the comparisons for semantic segmentation models, instance segmentation models, and panoptic segmentation models
Implementation details
Model: Panoptic segmentation model (two separate inferences & post-processing for merging)
Multi-task self-training (MTST): The training process is too complex. For example, semantic Training → instance Training → semantic Training → instance Training → panoptic evaluation
WARNING: Figures 1 and 2 do not show the whole training process, so they can be confusing.