【NLP2CV】 From NLP to Computer Vision
1. Good References
- (Updating) Awesome BERT & Transfer Learning in NLP
- BERT-related papers
- BERT-related articles
- Attention (a minimal attention sketch follows this list)
- Transformer
- GPT (Generative Pre-trained Transformer)
- CLIP (Contrastive Language-Image Pre-training)
- DALL-E 2 (Text-to-Image Revolution)
- Transformer Attention Video
- (Old) Awesome BERT
- My notes on understanding Transformers (1): Attention
- My notes on understanding Transformers (2): Transformer
- Awesome-CLIP: see the Applications section (Detection)
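
Most of the references above come down to one operation: scaled dot-product attention. Below is a minimal, self-contained sketch of it, assuming PyTorch; the function name and toy shapes are illustrative, not taken from any of the linked repos.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, seq_len, d_k) tensors. Returns (batch, seq_len, d_k)."""
    d_k = q.size(-1)
    # Compare every query with every key; scale by sqrt(d_k) to keep softmax stable.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # (batch, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)             # attention distribution per query
    return weights @ v                              # weighted sum of the values

# Toy self-attention: 2 sequences of 4 tokens with 8-dim embeddings.
x = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)  # torch.Size([2, 4, 8])
```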
2. YouTube Contents
- Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
- BERT and GPT (explained in Korean)
- BERT (a masked-LM sketch follows this list)
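
The BERT videos above all center on one pretraining objective: masked language modeling, i.e. corrupt some tokens and train the model to recover them. A toy sketch of that objective, assuming PyTorch; the sizes, `mask_id`, and the tiny encoder are illustrative stand-ins, not BERT's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes for illustration only.
vocab_size, hidden, mask_id = 1000, 64, 0

embed = nn.Embedding(vocab_size, hidden)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(hidden, nhead=4, batch_first=True), num_layers=2)
lm_head = nn.Linear(hidden, vocab_size)        # predicts a token id per position

tokens = torch.randint(1, vocab_size, (2, 16))  # (batch, seq) of token ids
mask = torch.rand(tokens.shape) < 0.15          # BERT masks ~15% of positions
corrupted = tokens.masked_fill(mask, mask_id)   # replace them with [MASK]
# (Real BERT adds position embeddings and an 80/10/10 corruption mix; omitted here.)

logits = lm_head(encoder(embed(corrupted)))     # (batch, seq, vocab)
# Cross-entropy only on the masked positions: predict the original tokens back.
loss = F.cross_entropy(logits[mask], tokens[mask])
```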
3. Must-Read Paper List
| Paper | Date | Institute, Citation | Note |
|---|---|---|---|
| Masked Autoencoders Are Scalable Vision Learners (MAE) | 21.11 | FAIR (Kaiming He) | See the code sketch after this table. |
| BEiT: BERT Pre-Training of Image Transformers | 21.06 | | Cited in MAE. |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | 18.10 | | See the YouTube section above. |
| Improving Language Understanding by Generative Pre-Training | 18.06 | OpenAI | GPT-1. Cited in MAE. |
| Language Models are Unsupervised Multitask Learners | 19.02 | OpenAI | GPT-2. Cited in MAE. |
| Language Models are Few-Shot Learners | 20.05 | OpenAI | GPT-3. Cited in MAE. |
| Learning Transferable Visual Models From Natural Language Supervision | 21.01 | OpenAI | CLIP (Contrastive Language-Image Pre-training); see the sketch at the end of this section. |
| Zero-Shot Text-to-Image Generation | 21.02 | OpenAI | DALL-E |
| Hierarchical Text-Conditional Image Generation with CLIP Latents | 22.04 | OpenAI | DALL-E 2. Leverages CLIP representations for text-conditional image generation. |
| Pathways Language Model (PaLM): Scaling to 540 Billion Parameters for Breakthrough Performance | 22.04 | Google Research | Survey on NLP |
| Vision Models Are More Robust and Fair When Pretrained on Uncurated Images Without Supervision | 22.02 | FAIR | SEER scaled to 10B parameters. |
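
The MAE row above refers to masked autoencoding: hide most image patches, encode only the visible ones, and reconstruct the hidden pixels. A toy sketch of that recipe, assuming PyTorch; `TinyMAE` and its linear encoder/decoder are illustrative stand-ins for the paper's ViT modules.

```python
import torch
import torch.nn as nn

class TinyMAE(nn.Module):
    """Toy masked autoencoder: encode visible patches, reconstruct masked ones."""
    def __init__(self, patch_dim=48, embed_dim=32, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Linear(patch_dim, embed_dim)    # stand-in for a ViT encoder
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))
        self.decoder = nn.Linear(embed_dim, patch_dim)    # lightweight decoder

    def forward(self, patches):                           # (batch, num_patches, patch_dim)
        b, n, d = patches.shape
        num_keep = int(n * (1 - self.mask_ratio))
        shuffle = torch.rand(b, n).argsort(dim=1)         # random patch order per sample
        keep, masked = shuffle[:, :num_keep], shuffle[:, num_keep:]
        visible = torch.gather(patches, 1, keep.unsqueeze(-1).expand(-1, -1, d))
        latent = self.encoder(visible)                    # encode visible patches only
        # Pad the latent sequence with learned mask tokens, then unshuffle.
        pad = self.mask_token.expand(b, n - num_keep, -1)
        full = torch.cat([latent, pad], dim=1)
        unshuffle = shuffle.argsort(dim=1)
        full = torch.gather(full, 1, unshuffle.unsqueeze(-1).expand(-1, -1, full.size(-1)))
        recon = self.decoder(full)                        # predict raw pixels per patch
        # Loss only on the patches that were hidden, as in the MAE paper.
        tgt = torch.gather(patches, 1, masked.unsqueeze(-1).expand(-1, -1, d))
        pred = torch.gather(recon, 1, masked.unsqueeze(-1).expand(-1, -1, d))
        return ((pred - tgt) ** 2).mean()                 # pixel MSE

loss = TinyMAE()(torch.randn(2, 16, 48))  # 2 images, 16 patches each
```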
| Paper | Date | Institute, Citation | Note |
|---|---|---|---|
| Context Autoencoder for Self-Supervised Representation Learning | | | |
| TFill: Image Completion via a Transformer-Based Architecture | | | |
| iBOT: Image BERT Pre-Training with Online Tokenizer | | | |
| Benchmarking Detection Transfer Learning with Vision Transformers | | | |
| Corrupted Image Modeling for Self-Supervised Visual Pre-Training | | | |
| PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | | | |
| BEVT: BERT Pretraining of Video Transformers | | | |
| ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond | | | |
| ReMixer: Object-aware Mixing Layer for Vision Transformers and Mixers | | | |
| A Survey on Dropout Methods and Experimental Verification in Recommendation | | | |
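
Finally, the sketch referenced from the CLIP row: CLIP pretrains with a symmetric contrastive (InfoNCE-style) loss over a batch of matched image/text pairs, pulling matched pairs together and pushing all other pairings apart. A minimal sketch, assuming PyTorch; the embeddings are taken as precomputed, and `clip_loss` is an illustrative name, not the paper's code (real CLIP produces them with a ViT/ResNet image encoder and a Transformer text encoder).

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    """image_emb, text_emb: (batch, dim); row i of each is a matched pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (batch, batch) cosine similarities
    labels = torch.arange(len(logits))               # i-th image matches i-th text
    # Symmetric cross-entropy over rows (image->text) and columns (text->image).
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2

loss = clip_loss(torch.randn(8, 64), torch.randn(8, 64))
```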