watsonx.ai Runtime service plans

Last updated: Dec 05, 2024

You use watsonx.ai Runtime resources, which are measured in capacity unit hours (CUH), when you train AutoAI models, run machine learning models, or score deployed models. You use watsonx.ai Runtime resources, measured by tokens consumed or at an hourly rate, when you run inferencing services with foundation models. This topic describes the various plans you can choose, what services are included, and how computing resources are calculated.

Note: The watsonx.ai Runtime service was formerly known as the Watson Machine Learning service.

watsonx.ai Runtime in Cloud Pak for Data as a Service and watsonx

Important:

The watsonx.ai Runtime plan includes details for watsonx.ai. Watsonx.ai is a studio of integrated tools for working with generative AI, which is powered by foundation models, and with machine learning models. If you are using Cloud Pak for Data as a Service, the details for working with foundation models and for metering prompt inferencing in Resource Units do not apply to your plan.

If you are enabled for both watsonx and Cloud Pak for Data as a Service, you can switch between the two platforms.

Choosing a watsonx.ai Runtime plan

watsonx.ai Runtime plans govern how you are billed for models you train and deploy with watsonx.ai Runtime and for prompts you use with foundation models. Choose a plan based on your needs:

Lite is a free plan with limited capacity. Choose this plan if you are evaluating watsonx.ai Runtime and want to try out the capabilities. The Lite plan does not support running a foundation model tuning experiment on watsonx.
Essentials is a pay-as-you-go plan that gives you the flexibility to build, deploy, and manage models to match your needs.
Standard is a high-capacity enterprise plan that is designed to support all of an organization's machine learning needs. Capacity unit hours are provided at a flat rate, while resource unit consumption is pay-as-you-go.

For plan details and pricing, see .

How resource consumption is tracked

For metering and billing purposes, machine learning models and deployments or foundation models are measured with these charge metrics:

Capacity Unit Hour (CUH) measures compute resource consumption per unit hour for usage and billing purposes. CUH measures all watsonx.ai Runtime activity except for foundation model inferencing.
Resource Unit (RU) measures foundation model inference consumption. Inferencing is the process of calling the foundation model to generate output in response to a prompt. Each RU equals 1,000 tokens. A token is a basic unit of text (typically 4 characters or 0.75 words) used in the input or output for a foundation model prompt.
Hour rate is used to calculate charges for custom foundation models that you import into watsonx.ai and deploy. The rate is based on configuration size and is charged for the duration of the model deployment.
Page rate is used to calculate charges for document text extraction. The page rate is set by plan.
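The token-to-RU relationship above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, the character-based estimate relies on the rough "typically 4 characters" rule of thumb and is not how a model's tokenizer actually counts, and the sketch does not assume any rounding policy because none is stated here.

```python
import math

TOKENS_PER_RU = 1_000   # from the text: each RU equals 1,000 tokens
CHARS_PER_TOKEN = 4     # rough rule of thumb from the text, not an exact tokenizer


def tokens_to_resource_units(tokens: int) -> float:
    """Convert a token count (input plus output) to Resource Units."""
    if tokens < 0:
        raise ValueError("tokens must be non-negative")
    return tokens / TOKENS_PER_RU


def estimate_tokens_from_chars(char_count: int) -> int:
    """Roughly estimate tokens from a character count (assumption: ~4 chars per token)."""
    if char_count < 0:
        raise ValueError("char_count must be non-negative")
    return math.ceil(char_count / CHARS_PER_TOKEN)


# Example: a 6,000-character prompt plus a 2,000-character response.
estimated_tokens = estimate_tokens_from_chars(6_000) + estimate_tokens_from_chars(2_000)
estimated_ru = tokens_to_resource_units(estimated_tokens)
```

Actual token counts depend on the model's tokenizer, so treat an estimate like this as a planning aid rather than a billing figure.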

What is measured for resource consumption?

Resources, whether measured in capacity unit hours (CUH) or resource units (RU), are consumed for running assets, not for working in tools. That is, there is no consumption charge for defining an experiment in AutoAI, but there is a charge for running the experiment to train the experiment pipelines. Similarly, there is no charge for creating a deployment space or defining a deployment job, but there is a charge for running a deployment job or inferencing against a deployed asset. Assets that run continuously, such as Jupyter notebooks, RStudio assets, Bash scripts, and custom model deployments, consume resources for as long as they are active.
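Because consumption accrues only while an asset runs, a rough estimate of CUH use is the CUH rate of the runtime multiplied by the hours it is active (the same relationship that Table 1 gives for Essentials billing). The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the rate of 0.5 CUH per hour is a made-up placeholder, not a published rate.

```python
def cuh_consumed(cuh_rate_per_hour: float, hours_active: float) -> float:
    """CUH consumed = CUH rate multiplied by hours of consumption."""
    if cuh_rate_per_hour < 0 or hours_active < 0:
        raise ValueError("rate and hours must be non-negative")
    return cuh_rate_per_hour * hours_active


def hours_until_allowance_used(monthly_allowance_cuh: float,
                               cuh_rate_per_hour: float) -> float:
    """Hours an asset can stay active before a monthly CUH allowance is used up."""
    if cuh_rate_per_hour <= 0:
        raise ValueError("rate must be positive")
    return monthly_allowance_cuh / cuh_rate_per_hour


# Hypothetical rate: 0.5 CUH per hour (placeholder, not a real price).
example_rate = 0.5
notebook_cuh = cuh_consumed(example_rate, hours_active=8)      # one working day
lite_hours = hours_until_allowance_used(20, example_rate)      # Lite: 20 CUH per month
```

Under this placeholder rate, a notebook left running continuously would use up the Lite allowance in 40 hours, which is why stopping idle runtimes matters.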

Note: You do not consume tokens when you use the generative AI search and answer app for this documentation site.

watsonx.ai Runtime plan details

The Lite plan provides enough free resources for you to evaluate the capabilities of watsonx.ai. You can then choose a paid plan that matches the needs of your organization, based on plan features and capacity.

Table 1. Plan details
Plan features	Lite	Essentials	Standard
watsonx.ai Runtime usage in CUH	20 CUH per month	CUH billing based on CUH rate multiplied by hours of consumption	2500 CUH per month
Foundation model inferencing in tokens or Resource Units (RU)	50,000 tokens per month	Billed for usage (1000 tokens = 1 RU)	Billed for usage (1000 tokens = 1 RU)
Max parallel Decision Optimization batch jobs per deployment	2	5	100
Deployment jobs retained per space	100	1000	3000
Deployment time to idle	1 day	3 days	3 days
HIPAA support	NA	NA	Dallas region only. Must be enabled in your IBM Cloud account.
Rate limit per plan ID	2 inference requests per second	8 inference requests per second	8 inference requests per second
Support for custom foundation models	Not available	Not available	Billed hourly by configuration
Document text extraction	Not available	Billed per page	Billed per page
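
The rate limits in Table 1 are enforced by the service, but a client can pace its own requests to avoid hitting them. The following is a minimal sketch of client-side throttling, not part of any IBM SDK: the class name, the `RATE_LIMITS` dictionary, and the pacing strategy (a fixed minimum interval between requests) are all assumptions made for illustration.

```python
import time

# Inference requests per second, per plan ID, taken from Table 1.
RATE_LIMITS = {"Lite": 2, "Essentials": 8, "Standard": 8}


class MinIntervalThrottle:
    """Space out calls so that no more than N calls start per second.

    Hypothetical helper: the clock and sleep functions are injectable so the
    pacing logic can be exercised without real waiting.
    """

    def __init__(self, requests_per_second, clock=time.monotonic, sleep=time.sleep):
        if requests_per_second <= 0:
            raise ValueError("requests_per_second must be positive")
        self._interval = 1.0 / requests_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self) -> float:
        """Block until the next request may be sent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self._interval
        return delay


# Usage: call throttle.wait() immediately before each inference request.
throttle = MinIntervalThrottle(RATE_LIMITS["Lite"])
```

Because the limit applies per plan ID, several clients that share one plan would need to coordinate their pacing; this single-process sketch does not attempt that.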

Note: If you upgrade from Essentials to Standard, you cannot revert to an Essentials plan. You must create a new plan.

watsonx.ai Runtime pricing details

For more information on billing rates and how resource consumption is calculated, see:

Billing details for machine learning assets
catalog page.

Learn more

Billing details for generative AI assets
Billing details for machine learning assets
For more information on tracking computing resource allocation and consumption, see Runtime usage.
