
LAION-5B · Open Multimodal Corpus

Large-scale Artificial Intelligence Open Network (LAION): 5.85B CLIP-filtered image-text pairs for foundation-model pretraining, Stable Diffusion training, and multimodal research.

Image-text pairs: 5.85 B
Languages: 100+
Total size: 240 TB
License: CC-BY 4.0 (metadata; images remain under their original copyright)
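A rough back-of-envelope reading of these headline numbers. It assumes decimal units and that the 240 TB total spans all 5.85 B pairs; the card does not say exactly what the total includes.

$$
\frac{240\ \text{TB}}{5.85 \times 10^{9}\ \text{pairs}} = \frac{240 \times 10^{12}\ \text{bytes}}{5.85 \times 10^{9}\ \text{pairs}} \approx 4.1 \times 10^{4}\ \text{bytes} \approx 41\ \text{kB per pair}
$$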
[Figure: Resolution distribution, image size buckets (millions)]
[Figure: Language coverage, top languages (millions of pairs): EN, ZH, ES, DE, FR, JA, Other]
Foundation Pretraining
Backbone corpus for large-scale CLIP- and ALIGN-style contrastive pretraining, notably the open OpenCLIP models. (OpenAI's original CLIP was trained on its own private dataset.)
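To make "contrastive pretraining" concrete, here is a minimal sketch of the symmetric InfoNCE objective used in CLIP-style training: within a batch, each image should score highest against its own caption, and each caption against its own image. It is written in plain Python for readability, not speed. Embeddings are assumed to come from separate image and text encoders (not shown), and the temperature of 0.07 is an assumed value (it matches CLIP's initial setting; in practice it is learned). The function names are hypothetical, not OpenCLIP's API.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Unit-length vectors make the dot product equal cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy(logits: List[float], target: int) -> float:
    # -log softmax(logits)[target], with max-subtraction for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_contrastive_loss(img: List[Vector], txt: List[Vector],
                          temperature: float = 0.07) -> float:
    """Symmetric InfoNCE; img[i] and txt[i] are assumed to be a matched pair."""
    assert len(img) == len(txt) > 0
    img = [l2_normalize(v) for v in img]
    txt = [l2_normalize(v) for v in txt]
    n = len(img)
    # logits[i][j] = cos(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(img[i], txt[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: row i should pick column i.
    loss_i2t = sum(cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image: column j should pick row j.
    loss_t2i = sum(cross_entropy([logits[i][j] for i in range(n)], j)
                   for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_contrastive_loss(images, [[1.0, 0.0], [0.0, 1.0]]))  # close to 0
    print(clip_contrastive_loss(images, [[0.0, 1.0], [1.0, 0.0]]))  # about 14.29
```

Correctly paired batches drive the loss toward zero. Swapping the captions makes every off-diagonal pair the "best" match, so the loss becomes large. Real training uses batches of tens of thousands of pairs, which is why a corpus the size of LAION-5B matters.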
Generative AI
Training data for Stable Diffusion and other latent diffusion text-to-image models.
Multilingual VLMs
LAION-5B includes LAION-2B-multi, whose captions span 100+ languages, enabling multilingual multimodal models.
Dataset card
LAION-5B at a glance
  • 5.85 billion CLIP-filtered image-text pairs
  • Splits: LAION-2B-en, LAION-2B-multi, LAION-1B-nolang
  • Built from Common Crawl; each pair carries NSFW and watermark probability tags
  • Filter: image-caption cosine similarity ≥ 0.28 with OpenAI CLIP ViT-B/32 (English); ≥ 0.26 with multilingual CLIP for the other splits
  • Released by LAION e.V. under CC-BY 4.0
  • Used by: Stable Diffusion, OpenCLIP, Kosmos-2 (via GRIT); BLIP-2 drew on its predecessor, LAION-400M
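  • Distributed as Parquet metadata (URLs, captions, scores); images are downloaded separately, typically into WebDataset shards with img2dataset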
  • Average caption length: 12.4 tokens
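The similarity filter can be sketched as a per-split threshold check. LAION reports keeping English pairs at cosine similarity ≥ 0.28 (OpenAI CLIP ViT-B/32) and the remaining pairs at ≥ 0.26 (multilingual CLIP). This minimal sketch assumes the image and caption embeddings are already computed and skips the encoders entirely. The `Candidate` record, its fields, and the helpers `split_of` and `clip_filter` are hypothetical illustrations, not LAION's pipeline code or Parquet schema.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Per-split thresholds as reported for LAION-5B. The encoders that would
# produce the embeddings (CLIP ViT-B/32, multilingual CLIP) are out of scope.
THRESHOLDS = {"en": 0.28, "multi": 0.26, "nolang": 0.26}


@dataclass
class Candidate:
    """Hypothetical record for one crawled image-caption pair."""
    url: str
    caption: str
    image_emb: List[float]   # precomputed image embedding (assumed)
    text_emb: List[float]    # precomputed caption embedding (assumed)
    language: Optional[str]  # detected language code, None if undetected


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def split_of(c: Candidate) -> str:
    # Route each pair to one of the three split names on the card.
    if c.language is None:
        return "nolang"
    return "en" if c.language == "en" else "multi"


def clip_filter(cands: List[Candidate]) -> List[Tuple[str, str, float]]:
    """Keep (split, url, similarity) for pairs that clear their split's threshold."""
    kept = []
    for c in cands:
        split = split_of(c)
        sim = cosine(c.image_emb, c.text_emb)
        if sim >= THRESHOLDS[split]:
            kept.append((split, c.url, sim))
    return kept


if __name__ == "__main__":
    cands = [
        Candidate("u1", "a red car", [1.0, 0.0], [1.0, 0.0], "en"),  # sim 1.0
        Candidate("u2", "a cat", [1.0, 0.0], [1.0, 3.6], "en"),      # sim ~0.268, below 0.28
        Candidate("u3", "un gato", [1.0, 0.0], [1.0, 3.6], "es"),    # sim ~0.268, above 0.26
        Candidate("u4", "IMG_0042", [1.0, 0.0], [0.0, 1.0], None),   # sim 0.0
    ]
    for split, url, sim in clip_filter(cands):
        print(split, url, round(sim, 3))
```

The usage example shows why the split matters: the same similarity score (about 0.268) is rejected for an English caption but kept for a Spanish one.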