Billing details for machine learning assets
Learn how usage for machine learning assets is measured in capacity unit hours (CUH).
watsonx.ai Runtime compute usage and pricing
watsonx.ai Runtime compute usage is measured as the number of capacity unit hours (CUH) consumed by an active machine learning instance. The number of capacity units consumed per hour depends on the computing requirements of your machine learning assets and models. For example, a model with a large, complex data set consumes more training resources than a model with a smaller, simpler data set. Scaling a deployment to support more concurrent users and requests also increases CUH consumption.
For all plans:
- The CUH consumption rate for training depends on the training tool, hardware specification, and runtime environment.
- The CUH consumption rate for deployment depends on the deployment type, hardware specification, and software specification.
- watsonx.ai Runtime limits the number of deployment jobs retained in each deployment space. If you exceed your limit, you cannot create new deployment jobs until you delete existing jobs or upgrade your plan. By default, job metadata is automatically deleted after 30 days. You can override this value when you create a job. See Managing jobs.
- Time to idle is how long a deployment is still considered active between scoring requests. If a deployment does not receive a scoring request within that time, it is treated as inactive, or idle, and billing stops for all frameworks other than SPSS.
- A plan allows for at least the stated rate limit, and the actual rate limit can be higher than the stated limit. For example, the Lite plan might process more than 2 requests per second without issuing an error. If you have a paid plan and believe you are reaching the rate limit in error, contact IBM Support for assistance.
- Compute time is calculated to the millisecond. However, there is a one-minute minimum for each distinct operation. That is, a training run that takes 12 seconds is billed as one minute toward the capacity unit hour quota, while a training run that takes 83.555 seconds is billed exactly as calculated.
- How online deployments consume capacity units depends on the framework. For some frameworks, CUH is charged for the number of hours the deployment asset is active in a deployment space. For example, an SPSS model in online deployment mode that runs 24 hours a day, seven days a week consumes CUH and is charged for that entire period, because an active online deployment of this kind has no idle time. For other frameworks, CUH is charged according to scoring duration. See the CUH consumption table for details on how CUH is calculated.
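The one-minute minimum described in the list above can be modeled as follows. This is an illustrative sketch only, assuming durations are measured in whole milliseconds; the function names are hypothetical and are not part of any watsonx.ai API.

```python
MS_PER_MINUTE = 60_000
MS_PER_HOUR = 3_600_000


def billable_ms(duration_ms: int) -> int:
    """Billable time for one distinct operation, in milliseconds.

    Hypothetical helper: time is counted to the millisecond, but any
    operation shorter than one minute is billed as a full minute.
    """
    if duration_ms < 0:
        raise ValueError("duration_ms must be non-negative")
    return max(duration_ms, MS_PER_MINUTE)


def billable_hours(duration_ms: int) -> float:
    """Billable time for one distinct operation, converted to hours."""
    return billable_ms(duration_ms) / MS_PER_HOUR


# The two training runs described in the list above.
print(billable_ms(12_000))  # 60000 (12 s is billed as one minute)
print(billable_ms(83_555))  # 83555 (83.555 s is billed as measured)
```

Multiplying `billable_hours` by the node count and the CUH rate for the capacity type gives the CUH charged for that operation.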
CUH consumption rates by asset type
Asset type | Capacity type | Capacity units per hour |
---|---|---|
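AutoAI experiment | 8 vCPU and 32 GB RAM | 20 |
Decision Optimization training | 2 vCPU and 8 GB RAM | 6 |
Decision Optimization training | 4 vCPU and 16 GB RAM | 7 |
Decision Optimization training | 8 vCPU and 32 GB RAM | 9 |
Decision Optimization training | 16 vCPU and 64 GB RAM | 13 |
Decision Optimization deployments | 2 vCPU and 8 GB RAM | 30 |
Decision Optimization deployments | 4 vCPU and 16 GB RAM | 40 |
Decision Optimization deployments | 8 vCPU and 32 GB RAM | 50 |
Decision Optimization deployments | 16 vCPU and 64 GB RAM | 60 |
Machine Learning models (training, evaluating, or scoring) | 1 vCPU and 4 GB RAM | 0.5 |
Machine Learning models (training, evaluating, or scoring) | 2 vCPU and 8 GB RAM | 1 |
Machine Learning models (training, evaluating, or scoring) | 4 vCPU and 16 GB RAM | 2 |
Machine Learning models (training, evaluating, or scoring) | 8 vCPU and 32 GB RAM | 4 |
Machine Learning models (training, evaluating, or scoring) | 16 vCPU and 64 GB RAM | 8 |
Foundation model tuning experiment (watsonx only) | NVIDIA A100 80GB GPU | 43 |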
CUH consumption by deployment and framework type
CUH consumption is calculated using these formulas:
Deployment type | Framework | CUH calculation |
---|---|---|
Online | AutoAI, AI function, SPSS, Scikit-Learn custom libraries, TensorFlow, RShiny | deployment_active_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
Online | Spark, PMML, Scikit-Learn, PyTorch, XGBoost | score_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
Batch | all frameworks | job_duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework |
For example, consider a Decision Optimization batch deployment job that runs for 15 minutes on 2 nodes with 2 vCPU and 8 GB RAM. The job duration is 15 minutes, or 0.25 hours, and the CUH rate for a Decision Optimization deployment with 2 vCPU and 8 GB RAM is 30. Every time the job runs, it consumes 0.25 * 2 * 30 = 15 CUH.
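All three formulas in the table are the same product; what differs is which duration is measured. The following sketch encodes the table as a lookup and reproduces the Decision Optimization example. The set and function names are hypothetical, and the framework lists are copied from the table, not from any watsonx.ai API.

```python
# Online frameworks billed for the whole time the deployment is active.
ACTIVE_DURATION_FRAMEWORKS = {
    "AutoAI", "AI function", "SPSS",
    "Scikit-Learn custom libraries", "TensorFlow", "RShiny",
}
# Online frameworks billed only for the time spent scoring.
SCORE_DURATION_FRAMEWORKS = {"Spark", "PMML", "Scikit-Learn", "PyTorch", "XGBoost"}


def duration_basis(deployment_type: str, framework: str) -> str:
    """Return which measured duration feeds the CUH formula."""
    if deployment_type == "batch":
        return "job_duration"  # batch jobs: all frameworks
    if deployment_type != "online":
        raise ValueError(f"unknown deployment type: {deployment_type}")
    if framework in ACTIVE_DURATION_FRAMEWORKS:
        return "deployment_active_duration"
    if framework in SCORE_DURATION_FRAMEWORKS:
        return "score_duration"
    raise ValueError(f"framework not listed in the table: {framework}")


def deployment_cuh(duration_in_hours: float, no_of_nodes: int, cuh_rate: float) -> float:
    """duration_in_hours * no_of_nodes * CUH_rate_for_capacity_type_framework"""
    return duration_in_hours * no_of_nodes * cuh_rate


# Decision Optimization batch job: 15 minutes on 2 nodes, 2 vCPU and 8 GB RAM (rate 30).
print(duration_basis("batch", "Decision Optimization"))  # job_duration
print(deployment_cuh(15 / 60, 2, 30))  # 15.0
```

Keeping the duration choice separate from the arithmetic makes it clear that an idle online SPSS deployment still accrues `deployment_active_duration`, while an idle XGBoost deployment accrues nothing until it scores.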
These tables show the capacity units per hour for predefined machine learning environments, by usage type.
Capacity units per hour for training, evaluating, or scoring models
Capacity type | Capacity units per hour |
---|---|
Extra small: 1 vCPU and 4 GB RAM | 0.5 |
Small: 2 vCPU and 8 GB RAM | 1 |
Medium: 4 vCPU and 16 GB RAM | 2 |
Large: 8 vCPU and 32 GB RAM | 4 |
Extra large: 16 vCPU and 64 GB RAM | 8 |
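As a quick arithmetic check, the table above can be held in a lookup keyed by size name. This is a minimal sketch with hypothetical names; it ignores the one-minute minimum per operation and assumes the duration is already in hours.

```python
# Capacity units per hour for training, evaluating, or scoring models.
ML_MODEL_CUH_RATES = {
    "Extra small": 0.5,  # 1 vCPU and 4 GB RAM
    "Small": 1.0,        # 2 vCPU and 8 GB RAM
    "Medium": 2.0,       # 4 vCPU and 16 GB RAM
    "Large": 4.0,        # 8 vCPU and 32 GB RAM
    "Extra large": 8.0,  # 16 vCPU and 64 GB RAM
}


def model_cuh(capacity_type: str, duration_in_hours: float, no_of_nodes: int = 1) -> float:
    """CUH for a model run on one of the predefined capacity types."""
    rate = ML_MODEL_CUH_RATES[capacity_type]
    return duration_in_hours * no_of_nodes * rate


print(model_cuh("Medium", 1.5))  # 3.0
```

Across these five sizes, the rate works out to 0.5 capacity units per vCPU per hour. That is an observation about this table, not a documented pricing rule.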
Capacity units per hour for AutoAI experiments
Capacity type | Capacity units per hour |
---|---|
8 vCPU and 32 GB RAM | 20 |
Capacity units per hour for Decision Optimization experiments
These rates apply to Decision Optimization experiments run in watsonx.ai Studio.
Capacity type | Capacity units per hour |
---|---|
Decision Optimization: 2 vCPU and 8 GB RAM | 6 |
Decision Optimization: 4 vCPU and 16 GB RAM | 7 |
Decision Optimization: 8 vCPU and 32 GB RAM | 9 |
Decision Optimization: 16 vCPU and 64 GB RAM | 13 |
Capacity units per hour for Decision Optimization in watsonx.ai Runtime
These rates apply to Decision Optimization models deployed and run from watsonx.ai Runtime.
Capacity type | Capacity units per hour |
---|---|
Decision Optimization: 2 vCPU and 8 GB RAM | 30 |
Decision Optimization: 4 vCPU and 16 GB RAM | 40 |
Decision Optimization: 8 vCPU and 32 GB RAM | 50 |
Decision Optimization: 16 vCPU and 64 GB RAM | 60 |
Monitoring resource usage
You can track resource usage for assets you own or collaborate on in a project or space. If you are an account owner or administrator, you can track CUH for an entire account. For more information, see Monitoring account resource usage.
You can track the runtime usage for an account on the Environment Runtimes page if you are the IBM Cloud account owner or administrator or the watsonx.ai Runtime service owner. For more information, see Monitoring resources.
Tracking CUH consumption for machine learning in a notebook
To calculate capacity unit hours in a notebook, use:
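# Assumes `client` is an authenticated watsonx.ai Runtime API client
details = client.service_instance.get_details()
# Divide the usage counter by 3600 * 1000 to convert it to CUH
cuh = details["entity"]["usage"]["capacity_units"]["current"] / (3600 * 1000)
print(cuh)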
For example, if the instance details contain:
'capacity_units': {'current': 19773430}
then 19773430 / (3600 * 1000) returns approximately 5.49 CUH.
For details, see the Service Instances section of the IBM watsonx.ai Runtime API documentation.
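To check the arithmetic without credentials, the conversion can be factored into a small function and run against the example value. This sketch assumes the dictionary returned by `get_details()` has the nesting used in the notebook snippet; the function name is hypothetical.

```python
MS_PER_HOUR = 3600 * 1000


def cuh_from_details(details: dict) -> float:
    """Convert the capacity_units counter in service instance details to CUH.

    Assumes the counter is reported in capacity-unit milliseconds, which is
    why the notebook snippet divides it by 3600 * 1000.
    """
    current = details["entity"]["usage"]["capacity_units"]["current"]
    return current / MS_PER_HOUR


# Stand-in for client.service_instance.get_details(), using the example value.
sample_details = {"entity": {"usage": {"capacity_units": {"current": 19773430}}}}
print(round(cuh_from_details(sample_details), 2))  # 5.49
```

In a live notebook, pass the result of `client.service_instance.get_details()` instead of `sample_details`.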