【NLP2CV】 From NLP to Computer Vision
1. Good References

  1. (Updating) Awesome BERT & Transfer Learning in NLP
    • BERT related papers
    • BERT related articles
      • Attention
      • Transformer
      • GPT (Generative Pre-trained Transformer)
      • CLIP (Contrastive Language-Image Pre-training)
      • DALL-E 2 (Text-to-Image Revolution)
  2. Transformer Attention Video
  3. (Old) Awesome BERT
  4. My Transformer notes (1): Attention
  5. My Transformer notes (2): Transformer
  6. Awesome-CLIP: see the Application section (Detection)
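Several of the references above center on the attention mechanism. As a minimal sketch (pure Python, names and shapes illustrative, no batching or masking), scaled dot-product attention computes softmax(QKᵀ/√d)·V:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
    Q, K, V are lists of d-dimensional vectors (lists of floats)."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # attention output: weighted average of the value vectors
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out
```

With identical keys the weights are uniform, so the output is just the mean of the values; multi-head attention runs several of these in parallel on projected inputs.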

2. Youtube Contents

  1. Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
  2. BERT and GPT (explained in Korean)
  3. BERT

3. Must-Read Papers

| Paper | Date (YY.MM) | Institute, Citation | Note |
|---|---|---|---|
| Masked Autoencoders Are Scalable Vision Learners (MAE) | 21.11 | Kaiming He | |
| BEiT: BERT Pre-Training of Image Transformers | 21.06 | | MAE cited. |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 18.10 | | See the YouTube section. |
| Improving Language Understanding by Generative Pre-Training | 18.11 | OpenAI | GPT-1. MAE cited. |
| Language Models are Unsupervised Multitask Learners | 19.09 | OpenAI | GPT-2. MAE cited. |
| Language Models are Few-Shot Learners | 20.05 | OpenAI | GPT-3. MAE cited. |
| Contrastive Language-Image Pre-Training | 21.01 | OpenAI | CLIP |
| Zero-Shot Text-to-Image Generation | 21.02 | | DALL-E |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | 22.04 | | DALL-E 2. Leverages CLIP representations for text-conditional image generation. |
| Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance | 22.04 | Google Research | Survey on NLP |
| Vision Models Are More Robust and Fair When Pretrained on Uncurated Images Without Supervision | 22.02 | FAIR | SEER, 10B parameters |
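CLIP (listed above) is trained with a symmetric contrastive objective: for a batch of N matched image-text pairs, cross-entropy is applied over the N×N similarity matrix in both directions, with the diagonal as the correct class. A minimal sketch, assuming pre-computed L2-normalized embeddings (the function name and the 0.07 temperature are illustrative defaults, not CLIP's exact code):

```python
import math

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over an N x N image-text similarity matrix.
    Matching pairs sit on the diagonal; embeddings are assumed L2-normalized."""
    n = len(img_emb)
    # cosine similarities scaled by temperature
    logits = [[sum(a * b for a, b in zip(img_emb[i], txt_emb[j])) / temperature
               for j in range(n)] for i in range(n)]

    def xent_diag(rows):
        # mean cross-entropy with the diagonal entry as the correct class
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # transpose for the text-to-image direction
    t_logits = [[logits[j][i] for j in range(n)] for i in range(n)]
    return 0.5 * (xent_diag(logits) + xent_diag(t_logits))
```

When each image embedding matches only its own text embedding the loss is near zero; shuffling the pairing drives it up, which is exactly the signal that aligns the two encoders.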

4. MAE-Related Papers

| Paper | Date (YY.MM) | Institute, Citation | Note |
|---|---|---|---|
| Context Autoencoder for Self-Supervised Representation Learning | | | |
| TFill: Image Completion via a Transformer-Based Architecture | | | |
| iBOT: Image BERT Pre-Training with Online Tokenizer | | | |
| Benchmarking Detection Transfer Learning with Vision Transformers | | | |
| Corrupted Image Modeling for Self-Supervised Visual Pre-Training | | | |
| PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | | | |
| BEVT: BERT Pretraining of Video Transformers | | | |
| ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | | | |
| ReMixer: Object-aware Mixing Layer for Vision Transformers and Mixers | | | |
| A Survey on Dropout Methods and Experimental Verification in Recommendation | | | |
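The MAE line of work above shares one core idea: mask a large fraction of image patches and feed only the visible ones to the encoder, reconstructing the rest with a lightweight decoder. A minimal sketch of the random-masking step (the 75% mask ratio is MAE's default; the function name and patch representation are illustrative):

```python
import random

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking: keep a random subset of patches and
    return (visible_patches, kept_indices, masked_indices).
    Only the visible patches would be passed to the encoder."""
    rng = random.Random(seed)
    n = len(patches)
    n_keep = max(1, int(round(n * (1 - mask_ratio))))
    idx = list(range(n))
    rng.shuffle(idx)                 # random permutation of patch indices
    keep = sorted(idx[:n_keep])      # indices the encoder sees
    masked = sorted(idx[n_keep:])    # indices the decoder must reconstruct
    visible = [patches[i] for i in keep]
    return visible, keep, masked
```

Dropping 75% of the tokens is what makes MAE pre-training cheap: the heavy encoder only ever processes a quarter of the sequence, while the decoder fills in mask tokens at the masked positions.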

© All rights reserved by Junha Song.