Manav Goyal

Introduction - Natural Language Processing

Generative AI => LLM
- Text => Language Modeling, Sentiment Analysis, Text Summarization & Translation
  - Jurassic-1 Jumbo
  - Grok
  - AutoGPT, Devin, AlphaCode 2
  - RNN
    - GRU
    - LSTM (Long Short-Term Memory) => Sequence generation
  - Transformer => Used to predict what comes next
    - GPT-4 (Generative Pre-trained Transformer)
    - BERT (Bidirectional Encoder Representations from Transformers)
    - Claude 3
    - Llama 3
    - LaMDA
    - StableLM
  - Variational Autoencoders (VAEs) => Encoder, Latent Space, Decoder
    - Grover
  - Autoregressive Models
    - ARIMA
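
The Text entries above describe models that "predict what comes next" and generate autoregressively. As a minimal sketch of that idea, assuming a toy bigram count model rather than any of the neural models listed, the code below predicts each token from the one before it. The names `train_bigram`, `predict_next`, `generate`, and the sample corpus are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


def generate(counts, start, length, seed=0):
    """Sample a sequence one token at a time, each conditioned on the last."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = counts.get(out[-1])
        if not successors:
            break  # no known continuation, stop early
        words = list(successors)
        weights = [successors[w] for w in words]
        out.append(rng.choices(words, weights=weights, k=1)[0])
    return out


# Toy corpus (an assumption, not data from these notes).
corpus = "the cat sat on the mat and the cat ate".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))  # prints: cat
print(generate(model, "the", 5))
```

Transformer-based LLMs replace the count table with a learned network that conditions on the whole preceding context, but the generation loop has the same shape: predict a distribution over the next token, sample, append, repeat.
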
- Image => Image Captioning, Visual Question Answering, Image Generation
  - GAN (Generative Adversarial Networks) => Generator, Discriminator
    - StyleGAN2
    - BigGAN
  - Text-to-Image
    - Latent Diffusion
      - Stable Diffusion 3
      - DALL-E
  - Image-to-Image
    - SPADE (Spatially-Adaptive Denormalization)
    - MUNIT (Multimodal Unsupervised Image-to-Image Translation)
- Voice => Speech Recognition (ASR), Voice Generation
  - GAN
    - MelGAN
  - Text-to-Speech
    - WaveNet, WaveNet Vocoder
    - Tacotron 2
    - Transformer
      - Merlin
  - VAE
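
The VAE entries in these notes name three parts: Encoder, Latent Space, Decoder. The sketch below traces data through those three parts for a 1-D latent, assuming hand-picked linear weights instead of trained networks, so it shows the data flow and the KL regularization term only, not a working generative model. All function names and weight values are hypothetical.

```python
import math
import random


def encode(x, w_mu, w_logvar):
    """Encoder: map input vector x to the mean and log-variance of a 1-D latent."""
    mu = sum(wi * xi for wi, xi in zip(w_mu, x))
    logvar = sum(wi * xi for wi, xi in zip(w_logvar, x))
    return mu, logvar


def sample_latent(mu, logvar, rng):
    """Latent space: reparameterized sample z = mu + sigma * eps, eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return mu + math.exp(0.5 * logvar) * eps


def decode(z, w_dec):
    """Decoder: map the latent value back to a vector in input space."""
    return [wi * z for wi in w_dec]


def kl_to_standard_normal(mu, logvar):
    """KL divergence from N(mu, sigma^2) to N(0, 1) for a 1-D latent."""
    return -0.5 * (1.0 + logvar - mu * mu - math.exp(logvar))


rng = random.Random(0)
x = [0.5, -1.0, 2.0]                      # toy input (assumption)
w_mu, w_logvar = [0.2, 0.1, 0.3], [0.0, 0.1, -0.1]
w_dec = [1.0, -0.5, 2.0]

mu, logvar = encode(x, w_mu, w_logvar)
z = sample_latent(mu, logvar, rng)
x_hat = decode(z, w_dec)
print(mu, logvar, z, x_hat, kl_to_standard_normal(mu, logvar))
```

In a trained VAE the encoder and decoder are neural networks, and the training loss adds a reconstruction error between `x` and `x_hat` to the KL term.
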
- Video => Multimodal Generation, Video Summarization
  - VLOGGER
  - GAN
    - VGAN, ProGAN
  - VAE
    - VQ-VAE (Vector Quantized VAE), PixelVAE
  - Text-to-Video
    - Transformer
      - Sora
      - UniVG (Unified-modal Video Generation)
      - MM-Diffusion (Multi-Modal Diffusion model)
  - Video-to-Video
    - Attn2IN (Attention-to-Image Network)
- Game
  - Genie
  - GAN
    - GameGAN
      - StyleGAN2-ADA
  - Autoencoders
  - Procedural content generation (PCG)
  - Project Malmo
  - Dungeon Odyssey
  - OpenAI Five
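
The GAN entries throughout these notes pair a Generator with a Discriminator. As a rough sketch of how the two are scored against each other, assuming a 1-D toy setup with a linear generator and a logistic discriminator (not any of the listed architectures), the code below computes the standard discriminator loss and the non-saturating generator loss for one batch. It evaluates the objectives only; no training loop is included, and all names and parameter values are hypothetical.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, a, b):
    """Generator: turn a noise value z into a fake sample."""
    return a * z + b


def discriminator(x, w, c):
    """Discriminator: estimated probability that x is a real sample."""
    return sigmoid(w * x + c)


def discriminator_loss(real, fake, w, c):
    """-(mean log D(x) + mean log(1 - D(G(z)))); lower means better separation."""
    real_term = sum(math.log(discriminator(x, w, c)) for x in real) / len(real)
    fake_term = sum(math.log(1.0 - discriminator(x, w, c)) for x in fake) / len(fake)
    return -(real_term + fake_term)


def generator_loss(fake, w, c):
    """Non-saturating generator loss: -mean log D(G(z))."""
    return -sum(math.log(discriminator(x, w, c)) for x in fake) / len(fake)


rng = random.Random(0)
real = [rng.gauss(4.0, 0.5) for _ in range(64)]                       # toy "real" data
fake = [generator(rng.gauss(0.0, 1.0), 1.0, 0.0) for _ in range(64)]  # untrained generator

# An uninformed discriminator (always 0.5) versus one that separates the two sets.
print(discriminator_loss(real, fake, w=0.0, c=0.0))
print(discriminator_loss(real, fake, w=1.0, c=-2.0))
print(generator_loss(fake, w=1.0, c=-2.0))
```

Training alternates between lowering the discriminator loss with respect to `w`, `c` and lowering the generator loss with respect to `a`, `b`, which is the adversarial game the Generator/Discriminator pairing refers to.
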