Billing details for machine learning assets

Learn how usage for machine learning assets is measured in capacity unit hours (CUH).

watsonx.ai Runtime compute usage and pricing

watsonx.ai Runtime compute usage is calculated by the number of capacity unit hours (CUH) consumed by an active machine learning instance. The rate of capacity units per hour consumed is determined by the computing requirements of your machine learning assets and models. For example, a model with a large, complex data set will consume more training resources than a model with a smaller, simpler data set. Note that scaling a deployment to support more concurrent users and requests also increases CUH consumption.

Tip: Because there are so many variables that affect resource consumption for a deployment, the recommended practice is to run tests on your models and deployments to analyze CUH consumption.

For all plans:

Capacity-unit-hour (CUH) rate consumption for training is based on training tool, hardware specification, and runtime environment.
Capacity-unit-hour (CUH) rate consumption for deployment is based on deployment type, hardware specification, and software specification.
watsonx.ai Runtime limits the number of deployment jobs that are retained for each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when you create a job. See Managing jobs.
Time to idle is the length of time that a deployment is considered active between scoring requests. If a deployment does not receive scoring requests for that duration, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.
Compute time is calculated to the millisecond. However, there is a one-minute minimum for each distinct operation. That is, a training run that takes 12 seconds is billed as one minute toward the capacity unit hour quota, while a training run that takes 83.555 seconds is billed exactly as calculated.
The way that online deployments consume capacity units is based on framework. For some frameworks, CUH is charged for the number of hours the deployment asset is active in a deployment space. For example, SPSS models in online deployment mode that run 24 hours a day for seven days a week consume CUH and are charged for that period. There is no idle time for an active online deployment. For other frameworks, CUH is charged according to scoring duration. See the CUH consumption table for details on how CUH is calculated.
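The one-minute minimum described above can be sketched as a small helper. This is an illustration only, not the service's billing code: the function name `billed_seconds` and the constant `MIN_BILLED_SECONDS` are hypothetical, and the sketch assumes the minimum applies once per distinct operation, as the text states.

```python
# Hypothetical helper illustrating the one-minute minimum per operation.
MIN_BILLED_SECONDS = 60.0  # one-minute minimum, taken from the text above


def billed_seconds(duration_seconds: float) -> float:
    """Return the billable time in seconds for one distinct operation."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    # Durations under one minute are rounded up to the minimum;
    # longer durations are billed as measured (to the millisecond).
    return max(duration_seconds, MIN_BILLED_SECONDS)


print(billed_seconds(12))      # 60.0
print(billed_seconds(83.555))  # 83.555
```

The two calls mirror the examples in the text: a 12-second run is billed as one minute, and an 83.555-second run is billed as measured.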

CUH consumption rates by asset type

Table 3. CUH consumption rates by asset type
Asset type	Capacity type	Capacity units per hour
AutoAI experiment	8 vCPU and 32 GB RAM	20
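Decision Optimization training	2 vCPU and 8 GB RAM	6
Decision Optimization training	4 vCPU and 16 GB RAM	7
Decision Optimization training	8 vCPU and 32 GB RAM	9
Decision Optimization training	16 vCPU and 64 GB RAM	13
Decision Optimization deployments	2 vCPU and 8 GB RAM	30
Decision Optimization deployments	4 vCPU and 16 GB RAM	40
Decision Optimization deployments	8 vCPU and 32 GB RAM	50
Decision Optimization deployments	16 vCPU and 64 GB RAM	60
Machine Learning models (training, evaluating, or scoring)	1 vCPU and 4 GB RAM	0.5
Machine Learning models (training, evaluating, or scoring)	2 vCPU and 8 GB RAM	1
Machine Learning models (training, evaluating, or scoring)	4 vCPU and 16 GB RAM	2
Machine Learning models (training, evaluating, or scoring)	8 vCPU and 32 GB RAM	4
Machine Learning models (training, evaluating, or scoring)	16 vCPU and 64 GB RAM	8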
Foundation model tuning experiment (watsonx only)	NVIDIA A100 80GB GPU	43

CUH consumption by deployment and framework type

CUH consumption is calculated using these formulas:

Deployment type	Framework	CUH calculation
Online	AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny	deployment_active_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework
Online	Spark, PMML, Scikit-Learn, PyTorch, XGBoost	score_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework
Batch	all frameworks	job_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework

For example, consider a Decision Optimization batch deployment job that runs for 15 minutes (0.25 hours) on 2 nodes, each with 2 vCPU and 8 GB RAM. For Decision Optimization deployments, this capacity type has a rate of 30 capacity units per hour, so every time the job runs it consumes 0.25 * 2 * 30, which equals 15 CUH.
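The formulas in the table share the same shape: a duration in hours, multiplied by the number of nodes, multiplied by the CUH rate. A minimal sketch of that arithmetic follows; the function name `cuh_consumed` and its parameter names are illustrative and are not part of any IBM API.

```python
def cuh_consumed(duration_hours: float, nodes: int, cuh_rate: float) -> float:
    """Apply duration_in_hours * no_of_nodes * CUH_rate from the table above."""
    return duration_hours * nodes * cuh_rate


# Decision Optimization batch job from the example:
# 15 minutes, 2 nodes, rate of 30 for 2 vCPU and 8 GB RAM.
print(cuh_consumed(15 / 60, nodes=2, cuh_rate=30))  # 15.0
```

Which duration to pass in depends on the deployment type and framework: active duration, scoring duration, or job duration, as listed in the table.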

These tables show the capacity units per hour calculation for predefined machine learning environments, by usage type.

Capacity units per hour for training, evaluating, or scoring models

Capacity type	Capacity units per hour
Extra small: 1 vCPU and 4 GB RAM	0.5
Small: 2 vCPU and 8 GB RAM	1
Medium: 4 vCPU and 16 GB RAM	2
Large: 8 vCPU and 32 GB RAM	4
Extra large: 16 vCPU and 64 GB RAM	8

Capacity units per hour for AutoAI experiments

Capacity type	Capacity units per hour
8 vCPU and 32 GB RAM	20

Capacity units per hour for Decision Optimization experiments

These rates apply to Decision Optimization experiments that run in watsonx.ai Studio.

Capacity type	Capacity units per hour
Decision Optimization: 2 vCPU and 8 GB RAM	6
Decision Optimization: 4 vCPU and 16 GB RAM	7
Decision Optimization: 8 vCPU and 32 GB RAM	9
Decision Optimization: 16 vCPU and 64 GB RAM	13

Capacity units per hour for Decision Optimization in watsonx.ai Runtime

These rates apply to Decision Optimization models that are deployed and run from watsonx.ai Runtime.

Capacity type	Capacity units per hour
Decision Optimization: 2 vCPU and 8 GB RAM	30
Decision Optimization: 4 vCPU and 16 GB RAM	40
Decision Optimization: 8 vCPU and 32 GB RAM	50
Decision Optimization: 16 vCPU and 64 GB RAM	60
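For quick estimates, the rates in the preceding tables can be held in a lookup keyed by usage type and vCPU count. The sketch below copies the values from those tables; the usage-type labels (`ml_model`, `autoai`, `do_experiment`, `do_deployment`) and the function `estimate_cuh` are hypothetical names chosen for illustration.

```python
# Capacity units per hour, copied from the tables above.
# Keys are (usage type, vCPU count); the usage-type labels are illustrative.
CUH_RATES = {
    ("ml_model", 1): 0.5,
    ("ml_model", 2): 1,
    ("ml_model", 4): 2,
    ("ml_model", 8): 4,
    ("ml_model", 16): 8,
    ("autoai", 8): 20,
    ("do_experiment", 2): 6,
    ("do_experiment", 4): 7,
    ("do_experiment", 8): 9,
    ("do_experiment", 16): 13,
    ("do_deployment", 2): 30,
    ("do_deployment", 4): 40,
    ("do_deployment", 8): 50,
    ("do_deployment", 16): 60,
}


def estimate_cuh(usage: str, vcpus: int, hours: float, nodes: int = 1) -> float:
    """Estimate CUH as hours * nodes * rate for a predefined environment."""
    try:
        rate = CUH_RATES[(usage, vcpus)]
    except KeyError:
        raise ValueError(f"no rate listed for {usage!r} with {vcpus} vCPU") from None
    return hours * nodes * rate


# Three hours of model training on a Medium (4 vCPU) environment.
print(estimate_cuh("ml_model", 4, hours=3))  # 6
```

An estimate like this is only a starting point; as the tip near the top of this topic notes, testing your own models and deployments is the recommended way to analyze actual consumption.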

Monitoring resource usage

You can track resource usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH for an entire account. For more information, see Monitoring account resource usage.

You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the watsonx.ai Runtime service owner. For more information, see Monitoring resources.

Tracking CUH consumption for machine learning in a notebook

To calculate capacity unit hours in a notebook, use:

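# client is assumed to be an authenticated watsonx.ai Runtime API client
details = client.service_instance.get_details()
# "current" is divided by 3600*1000 to convert it to capacity unit hours
CUH = details["entity"]["usage"]["capacity_units"]["current"]/(3600*1000)
print(CUH)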

For example:

'capacity_units': {'current': 19773430}

19773430/(3600*1000)

returns approximately 5.49 CUH

For details, see the Service Instances section of the IBM watsonx.ai Runtime API documentation.
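The conversion can also be tried without a live service connection. The sketch below wraps the division in a hypothetical helper, `capacity_units_to_cuh`, and feeds it a hand-built dictionary that contains only the fields used above and the sample value from the example; a real response from `get_details()` contains more fields.

```python
def capacity_units_to_cuh(details: dict) -> float:
    """Convert the 'current' capacity units value to capacity unit hours."""
    current = details["entity"]["usage"]["capacity_units"]["current"]
    # Same divisor as in the notebook snippet above.
    return current / (3600 * 1000)


# Hand-built sample that mimics only the fields used here (assumed shape).
sample_details = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}

print(round(capacity_units_to_cuh(sample_details), 2))  # 5.49
```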
