【SSL】 Survey on Self-supervised learning
This post is a research note for a survey on SSL
0. Research key points
- After MoCo
- Relationship: CL-based & non-CL-based
- Self-supervised adversarial robustness
1. Reference
Presentation materials on SSL: Roadmap by RCV lab
2. To Read List
- RCV presentation materials, my past posts related to SSL
- Survey: Self-supervised Learning: Generative or Contrastive, arXiv
- Towards Understanding and Simplifying MoCo, CVPR22 (not the method itself, only the parts relevant to the research key points)
- SEER2021: Self-supervised Pretraining of Visual Features in the Wild
- SEER2022: Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
- DINO: Emerging Properties in Self-Supervised Vision Transformers
- Improving Contrastive Learning by Visualizing Feature Transformation, ICCV oral 2021 (from the 2021 awesome-SSL list)
- How Well Do Self-Supervised Models Transfer?, CVPR21
- Rethinking pre-training and self-training, 2020, Google Brain
2.1 RCV presentation
- MoCo (Momentum Contrast)
- Contrastive learning can be interpreted as a dictionary look-up task.
- End-to-end and memory-bank dictionaries underperform Momentum Contrast, which keeps key representations consistent via the momentum encoder (see the sketch below).
- MoCo outperforms supervised pre-training (setting: freeze the encoder and train a task-specific head with the ground truth).
- MoCo v2 (adopting SimCLR ideas): MLP projection head, blur and color-distortion augmentation.
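A minimal sketch of the two ingredients above, the momentum update of the key encoder and the queue-based dictionary, trained with InfoNCE. The toy linear encoders, the dictionary size (4096), and the values m = 0.999 and tau = 0.07 are placeholder assumptions, not MoCo's exact setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy encoders standing in for MoCo's ResNet backbone.
dim = 128
encoder_q = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
encoder_k = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
encoder_k.load_state_dict(encoder_q.state_dict())    # start from identical weights
for p in encoder_k.parameters():
    p.requires_grad = False                           # key encoder: momentum updates only

queue = F.normalize(torch.randn(dim, 4096), dim=0)    # dictionary of queued (negative) keys
m, tau = 0.999, 0.07                                  # momentum coefficient, temperature

def moco_step(x_q, x_k):
    """One contrastive step: query vs. its positive key plus all queued negatives."""
    global queue
    q = F.normalize(encoder_q(x_q), dim=1)            # queries: N x dim
    with torch.no_grad():
        # momentum update: theta_k <- m * theta_k + (1 - m) * theta_q
        for pk, pq in zip(encoder_k.parameters(), encoder_q.parameters()):
            pk.mul_(m).add_(pq.detach(), alpha=1 - m)
        k = F.normalize(encoder_k(x_k), dim=1)        # positive keys: N x dim
    l_pos = (q * k).sum(dim=1, keepdim=True)          # N x 1
    l_neg = q @ queue                                 # N x K
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long) # the positive is always index 0
    loss = F.cross_entropy(logits, labels)            # InfoNCE as (K+1)-way classification
    queue = torch.cat([queue, k.T], dim=1)[:, k.size(0):]  # enqueue new keys, dequeue oldest
    return loss

# Stand-ins for two augmented views of the same batch of images.
x_q, x_k = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(moco_step(x_q, x_k).item())
```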
- CMC (Contrastive Multiview Coding)
- View-invariant representations: one image is split into color views, e.g. the L and ab channels of Lab space form a positive pair (sketch below).
- With more than two views (+ depth, segmentation), representation quality improves further.
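A small sketch of how a single image yields a positive pair of views in CMC by splitting Lab color space into the L and ab channels, each with its own encoder. The linear encoders and the output dimension are toy placeholders; the real method uses deep encoders and an NCE-style contrastive loss across views.

```python
import numpy as np
import torch
import torch.nn as nn
from skimage import color  # for RGB -> Lab conversion

rgb = np.random.rand(32, 32, 3)          # one image with values in [0, 1]
lab = color.rgb2lab(rgb)                 # H x W x 3 (L, a, b)

# The two "views" of the same image form a positive pair.
view_L  = torch.tensor(lab[..., :1], dtype=torch.float32).permute(2, 0, 1)  # 1 x H x W
view_ab = torch.tensor(lab[..., 1:], dtype=torch.float32).permute(2, 0, 1)  # 2 x H x W

# Each view gets its own encoder (toy linear encoders here).
enc_L  = nn.Sequential(nn.Flatten(), nn.Linear(1 * 32 * 32, 128))
enc_ab = nn.Sequential(nn.Flatten(), nn.Linear(2 * 32 * 32, 128))

z_L  = enc_L(view_L.unsqueeze(0))        # 1 x 128
z_ab = enc_ab(view_ab.unsqueeze(0))      # 1 x 128
# z_L and z_ab are trained to agree (positive pair) while disagreeing with the
# embeddings of other images (negatives), e.g. via an InfoNCE-style loss.
print(z_L.shape, z_ab.shape)
```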
- PIRL (Pretext-Invariant Representation Learning)
- The goal of SSL is to construct semantically meaningful image representations, i.e. representations that are invariant under transformations (augmentations).
- The backbone encodes the jigsaw patches individually; the patch features are concatenated and linearly projected to 128 dimensions (sketch below).
- PIRL shows performance competitive with MoCo on classification tasks.
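A small sketch of the patch pipeline described above: each jigsaw patch is encoded independently, the features are concatenated, and a linear layer projects them to 128 dimensions. The 128-d projection comes from the note above; the patch encoder, patch size, and feature width are toy placeholders.

```python
import torch
import torch.nn as nn

# Toy patch encoder; PIRL uses a ResNet backbone and 9 jigsaw patches.
num_patches, patch_size, feat_dim, out_dim = 9, 32, 64, 128
patch_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * patch_size * patch_size, feat_dim))
projection = nn.Linear(num_patches * feat_dim, out_dim)

patches = torch.randn(num_patches, 3, patch_size, patch_size)  # shuffled jigsaw patches
feats = patch_encoder(patches)        # 9 x feat_dim: each patch is encoded separately
concat = feats.reshape(1, -1)         # concatenate the patch features
z_patches = projection(concat)        # 1 x 128 representation of the jigsawed image
print(z_patches.shape)
```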
- SimCLR (a simple framework for contrastive learning of visual representations)
- Multiple data augmentations & non-linear projection head & no memory bank.
- Contrastive loss (InfoNCE / NT-Xent) over 2N images: N images plus their N augmented views; each anchor has 1 positive and 2(N-1) negatives (see the sketch below).
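A minimal sketch of the NT-Xent (InfoNCE) loss described above, written over 2N embeddings: the positive of each sample is its other augmented view, and the remaining 2(N-1) embeddings act as negatives. The batch size, embedding dimension, and temperature value are placeholders.

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent / InfoNCE over 2N embeddings; z1[i] and z2[i] are two views of image i."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x d
    sim = z @ z.T / tau                                   # 2N x 2N cosine similarities
    sim.fill_diagonal_(float('-inf'))                     # exclude self-similarity
    # the positive of sample i is its other view: i <-> i + n
    targets = torch.cat([torch.arange(n) + n, torch.arange(n)])
    return F.cross_entropy(sim, targets)                  # 1 positive, 2(N-1) negatives each

z1 = torch.randn(16, 128)   # projections of N images
z2 = torch.randn(16, 128)   # projections of their augmented views
print(nt_xent(z1, z2).item())
```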
- BYOL (Bootstrap your own latent)
- Motivation: an iteratively updated target network.
- Two MLPs (projection and prediction heads) & stop-gradient & a non-contrastive loss that only considers the similarity of positive views (see the sketch below).
- Augmentation & momentum-updated target network.
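A minimal sketch of the pieces above: an online network with a predictor, a momentum (EMA) target network, stop-gradient, and a similarity-only (normalized MSE) loss. For brevity the encoder and projector are merged into one toy linear layer, and the EMA coefficient 0.99 is a placeholder.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, hid = 128, 256
# Toy networks; BYOL uses a ResNet encoder plus separate projector/predictor MLPs.
online    = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))  # encoder + projector (merged)
predictor = nn.Sequential(nn.Linear(dim, hid), nn.ReLU(), nn.Linear(hid, dim))
target    = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
target.load_state_dict(online.state_dict())
for p in target.parameters():
    p.requires_grad = False            # the target network never receives gradients

@torch.no_grad()
def ema_update(tau=0.99):
    # momentum (EMA) update of the target network from the online network
    for pt, po in zip(target.parameters(), online.parameters()):
        pt.mul_(tau).add_(po.detach(), alpha=1 - tau)

def byol_loss(v_online, v_target):
    p = F.normalize(predictor(online(v_online)), dim=1)   # online prediction
    with torch.no_grad():
        z = F.normalize(target(v_target), dim=1)          # stop-gradient target projection
    # normalized MSE == 2 - 2 * cosine similarity; no negative samples are used
    return (2 - 2 * (p * z).sum(dim=1)).mean()

v1, v2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
loss = byol_loss(v1, v2) + byol_loss(v2, v1)   # symmetrized loss
ema_update()                                   # target slowly follows the online network
print(loss.item())
```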
- SwAV (Swapping Assignments between multiple Views)
- Limitations of previous work: (1) images from the same class are treated as different instances; (2) not all combinations of augmentations are used.
- Clustering-based approach (with prototypes) and swapped prediction: the code (cluster assignment) of one view is predicted from the other view (sketch below).
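A rough sketch of swapped prediction: the (detached) cluster-assignment code of one view supervises the prototype prediction of the other view. Note that SwAV computes codes with the Sinkhorn-Knopp algorithm under an equipartition constraint; the plain softmax used here is a simplification, and the prototype count, dimensions, and temperature are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, dim = 10, 128                             # number of prototypes, feature dimension
prototypes = nn.Linear(dim, K, bias=False)   # trainable prototype vectors (centroids)

def swapped_prediction(z1, z2, tau=0.1):
    """Minimal swapped-prediction loss: predict view 2's code from view 1 and vice versa."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    scores1, scores2 = prototypes(z1), prototypes(z2)     # N x K similarities to prototypes
    with torch.no_grad():
        q1 = F.softmax(scores1 / tau, dim=1)              # "codes" (stand-in for Sinkhorn-Knopp)
        q2 = F.softmax(scores2 / tau, dim=1)
    p1 = F.log_softmax(scores1 / tau, dim=1)
    p2 = F.log_softmax(scores2 / tau, dim=1)
    # swapped: the code of one view supervises the prediction from the other view
    return -0.5 * ((q2 * p1).sum(dim=1).mean() + (q1 * p2).sum(dim=1).mean())

z1, z2 = torch.randn(16, dim), torch.randn(16, dim)   # projections of two augmented views
print(swapped_prediction(z1, z2).item())
```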
- SimSiam (Simple Siamese)
- To prevent representations from collapsing, prior methods use (1) negative pairs (SimCLR), (2) a momentum encoder, or (3) online clustering.
- A Siamese network with none of the above works well, using a shared encoder, a predictor, and stop-gradient (see the sketch below).
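A minimal sketch of the setup above: one shared encoder, a predictor on top, stop-gradient on the target branch, and a symmetrized negative cosine loss, with no negatives, momentum encoder, or clustering. The toy linear encoder and dimensions are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, hid = 128, 64
# One shared encoder and one predictor; no negatives, no momentum encoder, no clustering.
encoder   = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))   # toy encoder
predictor = nn.Sequential(nn.Linear(dim, hid), nn.ReLU(), nn.Linear(hid, dim))

def neg_cosine(p, z):
    # stop-gradient on z is the key ingredient that prevents collapse
    return -F.cosine_similarity(p, z.detach(), dim=1).mean()

def simsiam_loss(x1, x2):
    z1, z2 = encoder(x1), encoder(x2)        # the same encoder processes both views
    p1, p2 = predictor(z1), predictor(z2)
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)   # symmetrized

x1, x2 = torch.randn(8, 3, 32, 32), torch.randn(8, 3, 32, 32)
print(simsiam_loss(x1, x2).item())
```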
- MoCo v3
- Adopts a prediction head and a symmetrized loss, as in BYOL.
- Implements a ViT backbone.
2.2 Survey on SSL: Generative or Contrastive (8~13p)
- SSL aims at recovering (part of) the data, which is still within the paradigm of supervised settings. In contrast, unsupervised learning covers a broader scope, e.g. clustering and community discovery.
- Contrastive SSL
- Basic
- Contrastive learning aims to "learn to compare" through Noise Contrastive Estimation (NCE); NCE can be extended to InfoNCE by involving more dissimilar (negative) pairs (see the formula below).
- Notation: (1) SSL with a generative model = trained on ImageNet; (2) SSL with a discriminative model = trained with InfoNCE.
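For reference, the InfoNCE objective in a common notation (query q, one positive key k+, K negative keys k_i^-, temperature tau); the exact symbols vary slightly between the papers above.

```latex
\mathcal{L}_{\text{InfoNCE}}
  = -\,\mathbb{E}\!\left[
      \log \frac{\exp(q \cdot k^{+} / \tau)}
                {\exp(q \cdot k^{+} / \tau) + \sum_{i=1}^{K} \exp(q \cdot k_{i}^{-} / \tau)}
    \right]
```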
- Context-Instance Contrast (Before MoCo)
- Predict Relative Position: jigsaw, rotation
- Maximize Mutual Information: CPC, AMDIM, CMC
- Instance-Instance Contrast
- Basic
- Discarding mutual information [129], CL studies the relationships between different samples' instance-level representations (the image itself as a single instance) rather than context-level representations (everything within one image, e.g. the dog and the grass).
- Cluster Discrimination
- DeepCluster [17] pulls similar images close together in the embedding space via k-means clustering. It is time-consuming due to its two-stage training and performs relatively poorly.
- SwAV
- Cluster assignments are used as codes (computed against prototypes / centroids).
- Works with small models and small batch sizes.
- Upgraded version: SEER, trained on Instagram images.
- Instance Discrimination
- CMC
- However, since it samples only one negative for each positive, it seems to be constrained by Deep InfoMax.
- MoCo
- Substantially increases the number of negative samples.
- The momentum encoder prevents fluctuation in loss convergence.
- Auxiliary techniques: (1) batch shuffling, (2) temperature.
- PIRL
- MoCo's positive pairs are too simple, without any transformation or augmentation, so PIRL adopts jigsaw-transformed images as positive pairs.
- SimCLR
- Constructs positive samples with data augmentation.
- Tries to handle the problem of requiring large numbers of negative samples.
- InfoMin
- The views should share only the label information. (Augmented images that share information beyond this are not an optimal dataset for contrastive learning; see the formulation below.)
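As a rough formalization (my paraphrase, with x the input, y the label, v1 and v2 the two views, and I the mutual information): the optimal views minimize the information shared between them while each still retains the label information.

```latex
% InfoMin principle (sketch): among views that keep the task-relevant (label)
% information, the best views for contrastive learning share as little
% additional information as possible.
(v_1^{*}, v_2^{*}) \;=\; \arg\min_{v_1, v_2} \; I(v_1; v_2)
\quad \text{s.t.} \quad I(v_1; y) = I(v_2; y) = I(x; y)
```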
- BYOL
- Discards negative sampling, motivated by experiments.
- Cross-entropy → MSE
- Batch size is no longer a critical factor, compared to MoCo and SimCLR.
- SimSiam
- Demonstrates the importance of stop-gradient.
- ReLIC (ICLR 21)
- Adds an extra KL-divergence regularizer & shows an analysis of generalization ability and robustness.
- Self-supervised Contrastive Pre-training for Semi-supervised Self-training
- Rethinking Pre-training and Self-training: the model with joint pre-training and self-training performs best.
- Basic
2.3 Towards Understanding and Simplifying MoCo, CVPR22
Papers that analyze SimSiam: [2, 5, 60]
A Survey on Self-supervised Learning in Computer Vision
3. Survey papers contents
@ Self-supervised Visual Feature Learning with Deep Neural Networks: A Survey, 19.02.16, 670 cited
0. Abstract
1. Introduction
* motivation
* term definition
2. Formulation of different learning
* supervised
* semi-supervised
* weakly-supervised
3. Common deep network architectures
* image features (AlexNet, VGG, ResNet, ...)
* video features
4. Commonly used pretext and downstream tasks
5. Dataset
6. Image Feature learning
* Generation-based Learning
* Context-Based Learning
7. Video feature learning
8. Performance comparison
9. Future directions
10. Conclusion
@ Contrastive Representation Learning: A Framework and Review, 20.10.10 ~ 20.10.27, 212 cited
0. Abstract
1. Introduction
2. what is contrastive learning
* representation learning
* contrastive representation learning
3. A taxonomy for contrastive learning
* CRL framework
* a taxonomy of similarity
- multisensory signals
- data transformation
- context-instance relationship
- sequential coherence and consistency
- natural clustering
* a taxonomy of encoders (e.g., dictionary, memory bank)
- end-to-end encoders
- online-offline encoders
- pre-trained encoders (BERT)
* a taxonomy of transform heads (e.g., projection head)
* a taxonomy of contrastive loss functions
4. Development of contrastive learning
5. Application (language, vision, graph, audio)
6. Discussion and outlook
7. Conclusion
@ A Survey on Contrastive Self-supervised Learning, 20.10.31 ~ 21.02.07, 135 cited
0. Abstract
1. Introduction
2. Pretext Tasks
3. Architectures
* end-to-end
* memory bank
* momentum encoder
* clustering
4. Encoders (backbone details)
5. Training
6. Downstream Tasks
7. Benchmarks (datasets)
8. Contrastive learning in NLP
9. Discussions and Future directions
* lack of theoretical foundation
* selection of data augmentation and pretext tasks
* proper negative sampling during training
* dataset biases
10. Conclusion
@ Self-supervised Learning: Generative or Contrastive, 20.06.15 ~ 21.03.20, 231 cited