Table 1

A concise summary of foundation models and their features

| Model type | Focus | Examples | Data used for training |
| --- | --- | --- | --- |
| Large language models (LLMs) | Text | PaLM, LLaMA, GPT-3.5 | Text corpora (books, articles, etc.) |
| Large vision models (LVMs) | Images | RETFound [23], SAM [70], ViT [71], Vision [72] | Image datasets |
| Vision-language models (VLMs) | Text and images | CLIP [41], Vision, DALL-E [73], Stable Diffusion [74] | Image-text pairs |
| Large multimodal models (LMMs) | Multiple formats | GPT-4, Gemini | Various (text, images, audio, etc.) |
Abbreviations: CLIP, Contrastive Language-Image Pre-Training; GPT, Generative Pre-trained Transformer; LLaMA, Large Language Model Meta AI; PaLM, Pathways Language Model; SAM, Segment Anything Model; ViT, Vision Transformer.