Introduction - Natural Language Processing
- Generative AI => LLM
- Text => Language Modeling, Sentiment Analysis, Text Summarization & Translation
- Jurassic-1 Jumbo
- Grok
- AutoGPT, Devin, AlphaCode 2
- RNN
- GRU
- LSTM (Long Short-Term Memory) => Sequence generation
- Transformer => Used to predict what comes next
- GPT-4 (Generative Pre-trained Transformer)
- BERT (Bidirectional Encoder Representations from Transformers)
- Claude 3
- Llama 3
- LaMDA
- StableLM
- Variational Autoencoders (VAEs) => Encoder, Latent Space, Decoder
- Autoregressive Models
- Image => Image Captioning, Visual Question Answering, Image Generation
- GAN (Generative Adversarial Network) => Generator, Discriminator
- Text-to-Image
- Latent Diffusion
- Stable Diffusion 3
- DALL-E
- Image-to-Image
- SPADE (Spatially-Adaptive Normalization)
- MUNIT (Multimodal Unsupervised Image-to-Image Translation)
- Voice => Automatic Speech Recognition (ASR), Voice Generation
- GAN
- Text-to-Speech
- WaveNet, WaveNet Vocoder
- Tacotron 2
- Transformer
- VAE
- Video => Multimodal Generation, Video Summarization
- VLOGGER
- GAN
- VAE
- VQ-VAE (Vector Quantized VAE), PixelVAE
- Text-to-Video
- Transformer
- Sora
- UniVG (Unified-modal Video Generation)
- MM-Diffusion (Multi-Modal Diffusion model)
- Video-to-Video
- Attn2IN (Attention-to-Image Network)
- Game
- Genie
- GAN
- Autoencoders
- Procedural content generation (PCG)
- Project Malmo
- Dungeon Odyssey
- OpenAI Five
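
The "predict what comes next" idea behind the language models and autoregressive models listed above can be illustrated with a toy example. This is only a minimal sketch: it uses a bigram count table instead of a neural network, and the function names (`train_bigram`, `predict_next`, `generate`) and the tiny corpus are made up for illustration. Real models such as GPT-4 learn these probabilities with a Transformer, but the generation loop (predict one token, append it, repeat) has the same shape.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, token):
    """Return the most frequent successor of `token`, or None if it has none."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, max_len=5):
    """Greedy autoregressive generation: each prediction becomes the next input."""
    out = [start]
    while len(out) < max_len:
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break  # no known continuation for the last token
        out.append(nxt)
    return out


# Hypothetical toy corpus, used only to demonstrate the loop.
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))         # cat
print(generate(model, "on", max_len=3))   # ['on', 'the', 'cat']
```

Here "the" is followed by "cat" twice and "mat" once, so the greedy choice is "cat". Sampling from the counts instead of always taking the most frequent token would give more varied output.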