CLIP · ViT-L/14 · Foundation Model

LAION-5B · Open Multimodal Corpus

LAION-5B is a corpus of 5.85 billion CLIP-filtered image-text pairs from LAION (Large-scale Artificial Intelligence Open Network), used for foundation model pretraining, Stable Diffusion training, and multimodal research.



Image-Text Pairs: 5.85 B
Languages: 100+
Total Size: 240 TB
Open License: CC-BY 4.0

[Chart: Resolution distribution, image size buckets (millions)]

[Chart: Language coverage, top languages (millions of pairs). Legend: en (English), zh (Chinese), es (Spanish), de (German), fr (French), ja (Japanese), other]

Foundation Pretraining

Backbone corpus for OpenCLIP and other CLIP- and ALIGN-style contrastive pretraining at scale.

Generative AI

Powering Stable Diffusion, latent diffusion models, and text-to-image generators.

Multilingual VLMs

LAION-5B includes the LAION-2B-multi subset, which enables multimodal models covering 100+ languages.

Dataset card

LAION-5B at a glance

• 5.85 billion CLIP-filtered image-text pairs
• Splits: LAION-2B-EN, LAION-2B-MULTI, LAION-1B-NOLANG
• Built from Common Crawl, with NSFW and illegal content tagged
• Filter: image-text cosine similarity ≥ 0.28, computed with OpenAI CLIP ViT-B/32

• Released by LAION e.V. under CC-BY 4.0
• Used by: Stable Diffusion, OpenCLIP, BLIP-2, Kosmos-2
• Hosted as Parquet + WebDataset shards
• Average caption length: 12.4 tokens
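
The similarity filter listed in the card can be illustrated with a small sketch. It assumes image and text embeddings have already been computed (in LAION-5B's case by CLIP ViT-B/32). The three-dimensional toy vectors, the captions, and the helper names `cosine_similarity` and `filter_pairs` are hypothetical and chosen only for illustration; this is not LAION's actual pipeline code.

```python
import math

THRESHOLD = 0.28  # similarity cutoff stated in the dataset card


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def filter_pairs(pairs, threshold=THRESHOLD):
    """Keep captions whose image/text embeddings meet the threshold.

    `pairs` is an iterable of (image_embedding, text_embedding, caption).
    """
    return [
        caption
        for image_emb, text_emb, caption in pairs
        if cosine_similarity(image_emb, text_emb) >= threshold
    ]


# Toy embeddings (hypothetical); real CLIP ViT-B/32 embeddings have 512 dims.
toy_pairs = [
    ([1.0, 0.0, 0.0], [0.9, 0.1, 0.0], "a photo of a dog"),         # well aligned
    ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], "click here to subscribe"),  # orthogonal
    ([1.0, 0.0, 0.0], [0.2, 0.0, 0.98], "img_0042.jpg"),            # weakly aligned
]

kept = filter_pairs(toy_pairs)
print(kept)
```

In this toy run only the first caption survives the cutoff. Raising or lowering `threshold` trades dataset size against caption quality.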