0 / 0
Supported foundation models available with watsonx.ai

Supported foundation models available with watsonx.ai

A collection of open source and IBM foundation models are deployed in IBM watsonx.ai. You can prompt the deployed foundation models in the Prompt Lab or programmatically.

The following models are available in watsonx.ai:

To understand how the model provider, instruction tuning, token limits, and other factors can affect which model you choose, see Choosing a model.

IBM foundation models

The following table lists the supported foundation models that IBM provides for inferencing. All IBM models are instruction-tuned.

Some IBM foundation models are also available from Hugging Face. License terms for IBM models that you access from Hugging Face are available from the Hugging Face website. For more information about contractual protections related to IBM indemnification for IBM foundation models that you access in watsonx.ai, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Table 1. IBM foundation models in watsonx.ai
Model name IBM indemnification Billing class Maximum tokens
Context (input + output)
Supported tasks More information
granite-13b-chat-v2 Yes Class 1 8192 • classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
granite-13b-instruct-v2 Yes Class 1 8192 • classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
Note: This foundation model can be prompt tuned.
granite-7b-lab Yes Class 1 8192 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Research paper (LAB)
granite-8b-japanese Yes Class 1 8192 • classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
granite-20b-multilingual Yes Class 1 8192 • classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
granite-3b-code-instruct Yes Class 1 128,000 • code
• classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
granite-8b-code-instruct Yes Class 1 128,000 • code
• classification
• extraction
• generation
• question answering
• summarization
Model card
Website
Research paper
granite-20b-code-instruct Yes Class 1 8192 • code
• classification
• extraction
• generation
• question answering
• summarization
Model card
Research paper
granite-34b-code-instruct Yes Class 1 8192 • code
• classification
• extraction
• generation
• question answering
• summarization
Model card
Research paper

 

For more information about the supported foundation models that IBM provides for embedding text, see Supported embedding models.

Third-party foundation models

The following table lists the supported foundation models that third parties provide. All third-party models are instruction-tuned.

Table 2. Supported third-party foundation models in watsonx.ai
Model name Provider Billing class Maximum tokens
Context (input + output)
Supported tasks More information
allam-1-13b-instruct National Center for Artificial Intelligence and Saudi Authority for Data and Artificial Intelligence Class 2 4096 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
• translation
Model card (Frankfurt data center)
codellama-34b-instruct Code Llama Class 2 16,384 • code Model card
Meta AI Blog
elyza-japanese-llama-2-7b-instruct ELYZA, Inc Class 2 4096 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
• translation
Model card
Blog on note.com
flan-t5-xl-3b Google Class 1 4096 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Research paper
Note: This foundation model can be prompt tuned.
flan-t5-xxl-11b Google Class 2 4096 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Research paper
flan-ul2-20b Google Class 3 4096 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
UL2 research paper
Flan research paper
jais-13b-chat Inception, Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), and Cerebras Systems Class 2 2048 • classification
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
• translation
Model card
Research paper
llama-3-2-1b-instruct Meta 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
Research paper
llama-3-2-3b-instruct Meta 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
Research paper
llama-3-2-11B-vision-instruct Meta 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
Research paper
llama-3-2-90B-vision-instruct Meta 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
Research paper
llama-guard-3-11B-vision-instruct Meta 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
Research paper
llama3-llava-next-8b-hf Meta Class 1 8192 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
LLaVA-NeXT blog
llama-3-1-8b-instruct Meta Class 1 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
llama-3-1-70b-instruct Meta Class 2 131,072 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
llama-3-405b-instruct Meta • Input tokens: Class 3
• Output tokens: Class 7
16,384 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
llama-3-8b-instruct Meta Class 1 8192 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
llama-3-70b-instruct Meta Class 2 8192 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Meta AI blog
llama-2-13b-chat Meta Class 1 4096 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Research paper
llama2-13b-dpo-v7 Meta Class 2 4096 • classification
• code
• extraction
• generation
• question answering
• retrieval-augmented generation
• summarization
Model card
Research paper (DPO)
mistral-large Mistral AI Mistral Large 32,768 • classification
• code
• extraction
• generation
• retrieval-augmented generation
• summarization
• translation
Model card
Blog post for Mistral Large 2
mixtral-8x7b-instruct-v01 Mistral AI Class 1 32,768 • classification
• code
• extraction
• generation
• retrieval-augmented generation
• summarization
• translation
Model card
Research paper
mt0-xxl-13b BigScience Class 2 4096 • classification
• generation
• question answering
• summarization
Model card
Research paper

 

Custom foundation models

In addition to working with foundation models that are curated by IBM, you can upload and deploy your own foundation models. After the custom models are deployed and registered with watsonx.ai, you can create prompts that inference the custom models from the Prompt Lab.

To learn more about how to upload, register, and deploy a custom foundation model, see Deploying a custom foundation model.

Foundation model details

The available foundation models support a range of use cases for both natural languages and programming languages. To see the types of tasks that these models can do, review and try the sample prompts.

allam-1-13b-instruct

The allam-1-13b-instruct foundation model is a bilingual large language model for Arabic and English provided by the National Center for Artificial Intelligence and supported by the Saudi Authority for Data and Artificial Intelligence that is fine-tuned to support conversational tasks. The ALLaM series is a collection of powerful language models designed to advance Arabic language technology. These models are initialized with Llama-2 weights and undergo training on both Arabic and English languages.

Note: This foundation model is available only in the Frankfurt data center. When you inference this model from the Prompt Lab, disable AI guardrails.
Usage
Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Size
13 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
Arabic (Modern Standard Arabic) and English
Instruction tuning information
allam-1-13b-instruct is based on the Allam-13b-base model, which is a foundation model that is pre-trained on a total of 3 trillion tokens in English and Arabic, including the tokens seen from its initialization. The Arabic data set contains 500 billion tokens after cleaning and deduplication. The additional data is collected from open-source collections and web crawls. The allam-1-13b-instruct foundation model is fine-tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
Model architecture
Decoder-only
License
Llama 2 community license and ALLaM license
Learn more
Read the following resources:

codellama-34b-instruct

A programmatic code generation model that is based on Llama 2 from Meta. Code Llama is fine-tuned for generating and discussing code.

Note: When you inference this model from the Prompt Lab, disable AI guardrails.
Usage
Use Code Llama to create prompts that generate code based on natural language inputs, explain code, or that complete and debug code.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Size
34 billion parameters
Token limits
Context window length (input + output): 16,384
Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 8192.
Supported natural languages
English
Supported programming languages
The codellama-34b-instruct-hf foundation model supports many programming languages, including Python, C++, Java, PHP, Typescript (Javascript), C#, Bash, and more.
Instruction tuning information
The instruction fine-tuned version was fed natural language instruction input and the expected output to guide the model to generate helpful and safe answers in natural language.
Model architecture
Decoder
License
License
Learn more
Read the following resources:

elyza-japanese-llama-2-7b-instruct

The elyza-japanese-llama-2-7b-instruct model is provided by ELYZA, Inc on Hugging Face. The elyza-japanese-llama-2-7b-instruct foundation model is a version of the Llama 2 model from Meta that is trained to understand and generate Japanese text. The model is fine-tuned for solving various tasks that follow user instructions and for participating in a dialog.

Note: This foundation model is available only in the Tokyo data center. When you inference this model from the Prompt Lab, disable AI guardrails.
Usage
General use with zero- or few-shot prompts. Works well for classification and extraction in Japanese and for translation between English and Japanese. Performs best when prompted in Japanese.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Sample prompt: Classification
Sample prompt: Translation
Size
7 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
Japanese, English
Instruction tuning information
For Japanese language training, Japanese text from many sources were used, including Wikipedia and the Open Super-large Crawled ALMAnaCH coRpus (a multilingual corpus that is generated by classifying and filtering language in the Common Crawl corpus). The model was fine-tuned on a data set that was created by ELYZA. The ELYZA Tasks 100 data set contains 100 diverse and complex tasks that were created manually and evaluated by humans. The ELYZA Tasks 100 data set is publicly available from HuggingFace.
Model architecture
Decoder
License
License
Learn more
Read the following resources:

flan-t5-xl-3b

The flan-t5-xl-3b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.

Note: This foundation model can be tuned by using the Tuning Studio.
Usage
General use with zero- or few-shot prompts.
Cost
Class 1. For pricing details, see Watson Machine Learning plans.
Try it out
Sample prompts
Size
3 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
Multilingual
Instruction tuning information
The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
Model architecture
Encoder-decoder
License
Apache 2.0 license
Learn more
Read the following resources:

flan-t5-xxl-11b

The flan-t5-xxl-11b model is provided by Google on Hugging Face. This model is based on the pretrained text-to-text transfer transformer (T5) model and uses instruction fine-tuning methods to achieve better zero- and few-shot performance. The model is also fine-tuned with chain-of-thought data to improve its ability to perform reasoning tasks.

Usage
General use with zero- or few-shot prompts.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Size
11 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
English, German, French
Instruction tuning information
The model was fine-tuned on tasks that involve multiple-step reasoning from chain-of-thought data in addition to traditional natural language processing tasks. Details about the training data sets used are published.
Model architecture
Encoder-decoder
License
Apache 2.0 license
Learn more
Read the following resources:

flan-ul2-20b

The flan-ul2-20b model is provided by Google on Hugging Face. This model was trained by using the Unifying Language Learning Paradigms (UL2). The model is optimized for language generation, language understanding, text classification, question answering, common sense reasoning, long text reasoning, structured-knowledge grounding, and information retrieval, in-context learning, zero-shot prompting, and one-shot prompting.

Usage
General use with zero- or few-shot prompts.
Cost
Class 3. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Size
20 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
English
Instruction tuning information
The flan-ul2-20b model is pretrained on the colossal, cleaned version of Common Crawl's web crawl corpus. The model is fine-tuned with multiple pretraining objectives to optimize it for various natural language processing tasks. Details about the training data sets used are published.
Model architecture
Encoder-decoder
License
Apache 2.0 license
Learn more
Read the following resources:

granite-13b-chat-v2

The granite-13b-chat-v2 model is provided by IBM. This model is optimized for dialog use cases and works well with virtual agent and chat applications.

Usage: Generates dialog output like a chatbot. Uses a model-specific prompt format. Includes a keyword in its output that can be used as a stop sequence to produce succinct answers. Follow the prompting guidelines for tips on usage. For more information, see Prompting granite-13b-chat-v2.

Note: This foundation model supports skills contributed by the open source community from InstructLab.

Cost: Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompt

Size

13 billion parameters

Token limits

Context window length (input + output): 8192

Supported natural languages

English

Instruction tuning information

The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

granite-13b-instruct-v2

The granite-13b-instruct-v2 model is provided by IBM. This model was trained with high-quality finance data, and is a top-performing model on finance tasks. Financial tasks evaluated include: providing sentiment scores for stock and earnings call transcripts, classifying news headlines, extracting credit risk assessments, summarizing financial long-form text, and answering financial or insurance-related questions.

Note: This foundation model can be tuned by using the Tuning Studio.
Usage

Supports extraction, summarization, and classification tasks. Generates useful output for finance-related tasks. Uses a model-specific prompt format. Accepts special characters, which can be used for generating structured output.

Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Experiment with samples:

Size

13 billion parameters

Token limits

Context window length (input + output): 8192

Supported natural languages

English

Instruction tuning information

The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

granite-7b-lab

The granite-7b-lab foundation model is provided by IBM. The granite-7b-lab foundation model uses a novel alignment tuning method from IBM Research. Large-scale Alignment for chatBots, or LAB is a method for adding new skills to existing foundation models by generating synthetic data for the skills, and then using that data to tune the foundation model.

Usage
Supports general purpose tasks, including extraction, summarization, classification, and more. Follow the prompting guidelines for tips on usage. For more information, see Prompting granite-7b-lab.
Note: This foundation model supports skills contributed by the open source community from InstructLab.
Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Sample: Generate a title for a passage

Size

7 billion parameters

Token limits

Context window length (input + output): 8192

Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.

Supported natural languages

English

Instruction tuning information

The granite-7b-lab foundation model is trained iteratively by using the large-scale alignment for chatbots (LAB) methodology.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. When you use the granite-7b-lab foundation model that is provided in watsonx.ai the contractual protections related to IBM indemnification apply. See the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

granite-8b-japanese

The granite-8b-japanese model is provided by IBM. The granite-8b-japanese foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate Japanese text.

Note: This foundation model is available only in the Tokyo data center. When you inference this model from the Prompt Lab, disable AI guardrails.
Usage

Useful for general purpose tasks in the Japanese language, such as classification, extraction, question-answering, and for language translation between Japanese and English.

Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Experiment with samples:

Size

8 billion parameters

Token limits

Context window length (input + output): 8192

Supported natural languages

English, Japanese

Instruction tuning information

The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. The granite-8b-japanese model was pretrained on 1 trillion tokens of English and 0.5 trillion tokens of Japanese text.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

granite-20b-multilingual

A foundation model from the IBM Granite family. The granite-20b-multilingual foundation model is based on the IBM Granite Instruct foundation model and is trained to understand and generate text in English, German, Spanish, French, and Portuguese.

Usage
English, German, Spanish, French, and Portuguese closed-domain question answering, summarization, generation, extraction, and classification.
Note: This foundation model supports skills contributed by the open source community from InstructLab.
Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompt: Translate text from French to English

Size

13 billion parameters

Token limits

Context window length (input + output): 8192

Supported natural languages

English, German, Spanish, French, and Portuguese

Instruction tuning information

The Granite family of models is trained on enterprise-relevant data sets from five domains: internet, academic, code, legal, and finance. Data used to train the models first undergoes IBM data governance reviews and is filtered of text that is flagged for hate, abuse, or profanity by the IBM-developed HAP filter. IBM shares information about the training methods and data sets used.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

Granite code models

Foundation models from the IBM Granite family. The Granite code foundation models are instruction-following models fine-tuned using a combination of Git commits paired with human instructions and open-source synthetically generated code instruction data sets.

The granite-8b-code-instruct v2.0.0 foundation model can process larger prompts with an increased context window length.

Note: These foundation models are available only in the Dallas data center. When you inference these models from the Prompt Lab, disable AI guardrails.
Usage

Granite code foundation models are designed to respond to coding-related instructions and can be used to build coding assitants. For more information and sample prompts, see Prompts for code.

Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Experiment with samples:

Available sizes

The model is available in the following sizes:

  • 3 billion parameters
  • 8 billion parameters
  • 20 billion parameters
  • 34 billion parameters
Token limits

Context window length (input + output)

  • granite-3b-code-instruct : 128,000

    The maximum new tokens, which means the tokens generated by the foundation model, is limited to 8192.

  • granite-8b-code-instruct : 128,000

    The maximum new tokens, which means the tokens generated by the foundation model, is limited to 8192.

  • granite-20b-code-instruct : 8192

    The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.

  • granite-34b-code-instruct : 8192

Supported natural languages

English

Supported programming languages

The Granite code foundation models support 116 programming languages including Python, Javascript, Java, C++, Go, and Rust. For the full list, see IBM foundation models.

Instruction tuning information

These models were fine-tuned from Granite code base models on a combination of permissively licensed instruction data to enhance instruction-following capabilities including logical reasoning and problem-solving skills.

Model architecture

Decoder

License

Terms of use

IBM-developed foundation models are considered part of the IBM Cloud Service. For more information about contractual protections related to IBM indemnification, see the IBM Client Relationship Agreement and IBM watsonx.ai service description.

Learn more

Read the following resources:

jais-13b-chat

The jais-13b-chat foundation model is a bilingual large language model for Arabic and English that is fine-tuned to support conversational tasks.

Note: This foundation model is available only in the Frankfurt data center. When you inference this model from the Prompt Lab, disable AI guardrails.
Usage
Supports Q&A, summarization, classification, generation, extraction, and translation in Arabic.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Sample prompt: Arabic chat
Size
13 billion parameters
Token limits
Context window length (input + output): 2048
Supported natural languages
Arabic (Modern Standard Arabic) and English
Instruction tuning information
Jais-13b-chat is based on the Jais-13b model, which is a foundation model that is trained on 116 billion Arabic tokens and 279 billion English tokens. Jais-13b-chat is fine tuned with a curated set of 4 million Arabic and 6 million English prompt-and-response pairs.
Model architecture
Decoder
License
Apache 2.0
Learn more
Read the following resources:

Llama 3.2 Instruct

The Meta Llama 3.2 collection of foundation models are provided by Meta. The llama-3-2-1b-instruct and llama-3-2-3b-instruct models are the smallest Llama 3.2 models that fit onto a mobile device. The models are lightweight, text-only models that can be used to build highly personalized, on-device agents.

For example, you can ask the models to summarize the last ten messages you received, or to summarize your schedule for the next month.

Usage

Generate dialog output like a chatbot. Use a model-specific prompt format. Their small size and modest compute resource and memory requirements enable the Llama 3.2 Instruct models to be run locally on most hardware, including on mobile and other edge devices.

Cost

For pricing details, see Watson Machine Learning plans.

Try it out
Available sizes
  • 1 billion parameters
  • 3 billion parameters
Token limits
  • 1b: Context window length (input + output): 131,072
  • 3b: Context window length (input + output): 131,072

The maximum new tokens, which means the tokens generated by the foundation models, is limited to 8192.

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Instruction tuning information

Pretrained on up to 9 trillion tokens of data from publicly available sources. Logits from the Llama 3.1 8B and 70B models were incorporated into the pretraining stage of the model development, where outputs (logits) from these larger models were used as token-level targets. In post-training, aligned the pre-trained model by using Supervised Fine-Tuning (SFT), Rejection Sampling (RS), and Direct Preference Optimization (DPO).

Model architecture

Decoder-only

License
Learn more

Read the following resources:

Llama 3.2 Vision Instruct

The Meta Llama 3.2 collection of foundation models are provided by Meta. The llama-3-2-11b-vision-instruct and llama-3-2-90b-vision-instruct models are built for image-in, text-out use cases such as document-level understanding, interpretation of charts and graphs, and captioning of images.

Usage

Generates dialog output like a chatbot and can perform computer vision tasks including classification, object detection and identification, image-to-text transcription (including handwriting), contextual Q&A, data extraction and processing, image comparison and personal visual assistance. Uses a model-specific prompt format.

Cost

For pricing details, see Watson Machine Learning plans.

Try it out
Available sizes
  • 11 billion parameters
  • 90 billion parameters
Token limits
  • 11b: Context window length (input + output): 131,072
  • 90b: Context window length (input + output): 131,072

The maximum new tokens, which means the tokens generated by the foundation models, is limited to 8192. The tokens that are counted for an image that you submit to the model are not included in the context window length.

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.

Instruction tuning information

Llama 3.2 Vision models use image-reasoning adaptor weights that are trained separately from the core large language model weights. This separation preserves the general knowledge of the model and makes the model more efficient both at pretraining time and run time. The Llama 3.2-Vision models were pretrained on 6 billion image-and-text pairs, which required far fewer compute resources than were needed to pretrain the Llama 3.1 70B foundation model alone. Llama 3.2 models also run efficiently because they can tap additional compute resources for image reasoning only when the input requires it.

Model architecture

Decoder-only

License
Learn more

Read the following resources:

llama-guard-3-11b-vision

The Meta Llama 3.2 collection of foundation models are provided by Meta. The llama-guard-3-11b-vision is a multimodal evolution of the text-only Llama-Guard-3 model. The model can be used to classify image and text content in user inputs (prompt classification) as safe or unsafe.

Usage
Use the model to check the safety of the image and text in an image-to-text prompt.

Cost For pricing details, see Watson Machine Learning plans.

Try it out
Available sizes
  • 11 billion parameters
Token limits

Context window length (input + output): 131,072

The maximum new tokens, which means the tokens generated by the foundation models, is limited to 8192. The tokens that are counted for an image that you submit to the model are not included in the context window length.

Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai with text-only inputs. English only when an image is included with the input.

Instruction tuning information

Pretrained model that is fine-tuned for content safety classification. For more information about the types of content that are classified as unsafe, see the model card.

Model architecture

Decoder-only

License
Learn more

Read the following resources:

llama3-llava-next-8b-hf

The llama3-llava-next-8b-hf model is an open source model based on the Meta-Llama-3-8B-Instruct foundation model from Meta. Llama3-llava-next-8b-hf is an auto-regressive language model that uses an optimized transformer architecture. The foundation model is a pre-trained large language model combined with a pre-trained vision encoder with improved reasoning, optical character recognition (OCR), and world knowledge that is optimized for multimodal chatbot use cases.

You can use the llama3-llava-next-8b-hf model for tasks like image captioning, visual question answering, and multimodal chatbot use cases.

Usage
Generates dialog output like a chatbot and can evaluate and describe image files. Uses a model-specific prompt format.
Cost
Class 1. For pricing details, see Watson Machine Learning plans.
Try it out
See Chatting with documents and images
Size
  • 8 billion parameters
Token limits
Context window length (input + output): 8192
  • The maximum new tokens, which means the tokens generated by the foundation model, is limited to 1024. The tokens that are counted for an image that you submit to the model are not included in the context window length.
Supported natural languages
English
Instruction tuning information
The llama3-llava-next-8b-hf was pretrained on a combination of filtered image-text pairs, GPT-generated multimodal instruction-following data, and academic-task-oriented visual questions and answers (VQA) data.
Model architecture
Decoder-only
License
META LLAMA 3 Community License
Learn more
Read the following resources:

Llama 3.1 Instruct

The Meta Llama 3.1 collection of foundation models are provided by Meta. The Llama 3.1 foundation models are pretrained and instruction tuned text-only generative models that are optimized for multilingual dialogue use cases. The models use supervised fine-tuning and reinforcement learning with human feedback to align with human preferences for helpfulness and safety.

The llama-3-405b-instruct model is Meta's largest open-sourced foundation model to date. This foundation model can also be used as a synthetic data generator, post-training data ranking judge, or model teacher/supervisor that can improve specialized capabilities in more inference-friendly, derivative models.

Usage

Generates dialog output like a chatbot. Uses a model-specific prompt format.

Cost
  • 8b: Class 1
  • 70b: Class 2
  • 405b: Class 3 (input), Class 7 (output)

For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompt: Converse with Llama 3

Available sizes
  • 8 billion parameters
  • 70 billion parameters
  • 405 billion parameters
Token limits
  • 8b and 70b: Context window length (input + output): 131,072
    • The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.
  • 405b: Context window length (input + output): 16,384
    • Although the model supports a context window length of 131,072, the window is limited to 16,384 to reduce the time it takes for the model to generate a response.
Supported natural languages

English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai

Instruction tuning information

Llama 3.1 was pretrained on 15 trillion tokens of data from publicly available sources. The fine tuning data includes publicly available instruction datasets, as well as over 25 million synthetically generated examples.

Model architecture

Decoder-only

License
Learn more

Read the following resources:

Llama 3 Instruct

The Meta Llama 3 family of foundation models are accessible, open large language models that are built with Meta Llama 3 and provided by Meta on Hugging Face. The Llama 3 foundation models are instruction fine-tuned language models that can support various use cases.

Usage: Generates dialog output like a chatbot.

Cost
  • 8b: Class 1
  • 70b: Class 2

For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompt: Converse with Llama 3

Available sizes
  • 8 billion parameters
  • 70 billion parameters
Token limits

Context window length (input + output): 8192

Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 4096.

Supported natural languages

English

Instruction tuning information

Llama 3 features improvements in post-training procedures that reduce false refusal rates, improve alignment, and increase diversity in the foundation model output. The result is better reasoning, code generation, and instruction-following capabilities. Llama 3 has more training tokens (15T) that result in better language comprehension.

Model architecture

Decoder-only

License

META LLAMA 3 Community License

Learn more

Read the following resources:

Llama 2 Chat (Deprecated)

The Llama 2 Chat model is provided by Meta on Hugging Face. The fine-tuned model is useful for chat generation. The model is pretrained with publicly available online data and fine-tuned using reinforcement learning from human feedback.

Usage

Generates dialog output like a chatbot. Uses a model-specific prompt format.

Cost
  • 13b: Class 1

For pricing details, see Watson Machine Learning plans.

Try it out

Experiment with samples:

Available sizes
  • 13 billion parameters
Token limits

Context window length (input + output): 4096

Supported natural languages

English

Instruction tuning information

Llama 2 was pretrained on 2 trillion tokens of data from publicly available sources. The fine-tuning data includes publicly available instruction data sets and more than one million new examples that were annotated by humans.

Model architecture

Decoder-only

License

License

Learn more

Read the following resources:

llama2-13b-dpo-v7

The llama2-13b-dpo-v7 foundation model is provided by Minds & Company. The llama2-13b-dpo-v7 foundation model is a version of llama2-13b foundation model from Meta that is instruction-tuned and fine-tuned by using the direct preference optimzation method to handle Korean.

Note: This foundation model is available only in the Tokyo data center. When you inference this model from the Prompt Lab, disable AI guardrails.
Usage
Suitable for many tasks, including classification, extraction, summarization, code creation and conversion, question-answering, generation, and retreival-augmented generation in Korean.
Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with samples:
Size
13.2 billion parameters
Token limits
Context window length (input + output): 4096
Supported natural languages
English, Korean
Instruction tuning information
Direct preference optimzation (DPO) is an alternative to reinforcement learning from human feedback. With reinforcement learning from human feedback, responses must be sampled from a language model and an intermediate step of training a reward model is required. The direct preference optimzation uses a binary method of reinforcement learning where the model chooses the best of two answers based on preference data.
Model architecture
Decoder-only
License
License
Learn more
Read the following resources:

mistral-large

Mistral Large 2 is a large language model developed by Mistral Al. The mistral-large foundation model is fluent in and understands the grammar and cultural context of English, French, Spanish, German, and Italian. The foundation model can also understand dozens of other languages. The model has a large context window, which means you can add large documents as contextual information in prompts that you submit for retrieval-augmented generation (RAG) use cases. The mistral-large foundation model is effective at programmatic tasks, such as generating, reviewing, and commenting on code, function calling, and can generate results in JSON format.

Usage

Suitable for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Due to the model's large context window, use the max tokens parameter to specify a token limit when prompting the model.

Cost

Mistral Large. For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompts

Token limits

Context window length (input + output): 128,000

Note:

  • Although the model supports a context window length of 128,000, the window is limited to 32,768 to reduce the time it takes for the model to generate a response.
  • The maximum new tokens, which means the tokens generated by the foundation model, is limited to 16,384.
Supported natural languages

English, French, German, Italian, Spanish, Italian, Chinese, Japanese, Korean, Portuguese, Dutch, Polish, and dozens of other languages.

Supported programming languages

The mistral-large model has been trained on over 80 programming languages including Python, Java, C, C++, JavaScript, Bash, Swift, and Fortran.

Instruction tuning information

The mistral-large foundation model is pre-trained on diverse data sets like text, codebases, and mathematical data from various domains.

Model architecture

Decoder-only

License

For terms of use, including information about contractual protections related to capped indemnification, see Terms of use.

Learn more

Read the following resources:

mixtral-8x7b-instruct-v01

The mixtral-8x7b-instruct-v01 foundation model is provided by Mistral AI. The mixtral-8x7b-instruct-v01 foundation model is a pretrained generative sparse mixture-of-experts network that groups the model parameters, and then for each token chooses a subset of groups (referred to as experts) to process the token. As a result, each token has access to 47 billion parameters, but only uses 13 billion active parameters for inferencing, which reduces costs and latency.

Usage

Suitable for many tasks, including classification, summarization, generation, code creation and conversion, and language translation. Due to the model's unusually large context window, use the max tokens parameter to specify a token limit when prompting the model.

Cost

Class 1. For pricing details, see Watson Machine Learning plans.

Try it out

Sample prompts

Size

46.7 billion parameters

Token limits

Context window length (input + output): 32,768

Note: The maximum new tokens, which means the tokens generated by the foundation model, is limited to 16,384.

Supported natural languages

English, French, German, Italian, Spanish

Instruction tuning information

The Mixtral foundation model is pretrained on internet data. The Mixtral 8x7B Instruct foundation model is fine-tuned to follow instructions.

Model architecture

Decoder-only

License

Apache 2.0 license

Learn more

Read the following resources:

mt0-xxl-13b

The mt0-xxl-13b model is provided by BigScience on Hugging Face. The model is optimized to support language generation and translation tasks with English, languages other than English, and multilingual prompts.

Usage: General use with zero- or few-shot prompts. For translation tasks, include a period to indicate the end of the text you want translated or the model might continue the sentence rather than translate it.

Cost
Class 2. For pricing details, see Watson Machine Learning plans.
Try it out
Experiment with the following samples:
Size
13 billion parameters
Supported natural languages
Multilingual
Token limits
Context window length (input + output): 4096
Supported natural languages
The model is pretrained on multilingual data in 108 languages and fine-tuned with multilingual data in 46 languages to perform multilingual tasks.
Instruction tuning information
BigScience publishes details about its code and data sets.
Model architecture
Encoder-decoder
License
Apache 2.0 license
Learn more
Read the following resources:

Any deprecated foundation models are highlighted with a deprecated warning icon Warning icon. For more information about deprecation, including foundation model withdrawal dates, see Foundation model lifecycle.

Learn more

Parent topic: Developing generative AI solutions

Generative AI search and answer
These answers are generated by a large language model in watsonx.ai based on content from the product documentation. Learn more