Tuning a foundation model programmatically
You can programmatically tune a set of foundation models in watsonx.ai to customize them for your use case.
Ways to develop
You can tune foundation models by using these programming methods:
Alternatively, you can use graphical tools from the watsonx.ai UI to tune foundation models. See Tuning Studio.
REST API
Prompt tuning is deprecated and will be removed in a future release.
Prompt tuning a foundation model by using the API is a complex task. The sample Python notebooks simplify the process. You can use a sample notebook as a template for writing your own notebooks for prompt tuning. See Tuning a foundation model programmatically.
Supported foundation models
See Choosing a foundation model to tune.
To get a list of foundation models that support prompt tuning programmatically, you can use the following request:
curl -X GET \
'https://{hostname}/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_prompt_tune_trainable'
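The response from this request can be filtered in a few lines of code. The following sketch assumes the documented foundation_model_specs response shape, where models are returned in a resources list with a model_id field; the sample payload is illustrative only, not real service output.

```python
# Sketch: extract model IDs from a foundation_model_specs response.
# Assumes the response carries a "resources" list whose entries include
# a "model_id" key; the sample payload below is invented for illustration.

def prompt_tunable_models(specs_response: dict) -> list[str]:
    """Return the model IDs listed in a foundation_model_specs response."""
    return [entry["model_id"] for entry in specs_response.get("resources", [])]

# Illustrative response fragment (not real output from the service):
sample = {
    "resources": [
        {"model_id": "ibm/granite-3-1-8b-base"},
        {"model_id": "google/flan-t5-xl"},
    ]
}

print(prompt_tunable_models(sample))
```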
Procedure
At a high level, prompt tuning a foundation model by using the API involves the following steps:
-
Create a task credential.
A task credential is an API key that is used to authenticate the long-running jobs that are started by steps in this procedure. You do not need to pass the task credential in the API request. However, a task credential that you created must exist in the credentials service for your user_id and account_id. See Creating task credentials.
-
Create a training data file to use for tuning the foundation model.
For more information about the training data file requirements, see Data formats for tuning foundation models.
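Prompt-tuning training data is typically a JSON or JSONL file of input and output pairs; see the data formats topic for the authoritative requirements. The following sketch writes such a file, assuming the {"input": ..., "output": ...} record shape; the example rows are invented.

```python
import json
import os
import tempfile

# Sketch: write prompt-tuning training data as JSONL (one JSON object
# per line). The {"input": ..., "output": ...} record shape is assumed
# from the documented tuning data formats; the example rows are invented.
examples = [
    {"input": "Classify the sentiment: I loved this film.", "output": "positive"},
    {"input": "Classify the sentiment: The plot made no sense.", "output": "negative"},
]

path = os.path.join(tempfile.gettempdir(), "train.jsonl")
with open(path, "w", encoding="utf-8") as f:
    for record in examples:
        f.write(json.dumps(record) + "\n")
```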
-
Upload your training data file.
You can choose to add the file by creating one of the following asset types:
-
Connection asset
Note: Currently, only a Cloud Object Storage connection type is supported for prompt-tuning training.
See Referencing files from the API.
You will use the connection ID and training data file details when you add the training_data_references section to the request.json file that you create in the next step.
-
Data asset
To create a data asset, use the Data and AI Common Core API to define a data asset.
You will use the asset ID and training data file details when you add the training_data_references section of the REST request that you create in the next step.
For more information about the supported ways to reference a training data file, see Data references.
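For a data asset, the training_data_references entry can be assembled as follows. The entry shape mirrors the sample request bodies elsewhere on this page; the asset ID and project ID below are placeholders.

```python
# Sketch: build a training_data_references entry for a data asset.
# The shape mirrors the sample request bodies on this page; the IDs
# below are placeholders, not real values.
ASSET_ID = "11111111-2222-3333-4444-555555555555"
PROJECT_ID = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

training_data_references = [
    {
        "type": "data_asset",
        "location": {
            "href": f"/v2/assets/{ASSET_ID}?project_id={PROJECT_ID}",
            "id": ASSET_ID,
        },
    }
]
```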
-
-
Use the watsonx.ai API to create a training experiment.
See Create a training.
You can specify parameters for the experiment in the TrainingResource payload. For more information about available parameters, see Parameters for tuning foundation models.
For the task_id, specify one of the tasks that are listed as supported for the foundation model in the response to the List the available foundation models method.
-
Save the tuned model to the repository service to generate an asset_id that points to the tuned model.
To save the tuned model, use the watsonx.ai Runtime (formerly Watson Machine Learning) API to create a new model.
-
Use the watsonx.ai API to create a deployment for the tuned model.
To inference a tuned model, you must use the inference endpoint that includes the unique ID of the deployment that hosts the tuned model. For more information, see the inference methods in the Deployments section.
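The inference endpoint for a deployed tuned model can be constructed from the deployment ID. This sketch assumes the /ml/v1/deployments/{id}/text/generation path from the watsonx.ai deployments inference methods; the region, version, and deployment ID values are placeholders.

```python
# Sketch: build the inference URL for a deployed tuned model.
# The /ml/v1/deployments/{id}/text/generation path is assumed from the
# watsonx.ai deployments inference methods; region, version date, and
# the deployment ID are placeholder values.
REGION = "us-south"
VERSION = "2025-02-20"
DEPLOYMENT_ID = "99999999-8888-7777-6666-555555555555"

inference_url = (
    f"https://{REGION}.ml.cloud.ibm.com/ml/v1/deployments/"
    f"{DEPLOYMENT_ID}/text/generation?version={VERSION}"
)
print(inference_url)
```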
Fine-tuning a foundation model by using the REST API
You can use the watsonx.ai REST API to fine-tune a foundation model with the following techniques:
- Full fine tuning
- Low-rank adaptation (LoRA) fine tuning
Supported foundation models
See Choosing a foundation model to tune.
To get a list of foundation models that support low-rank adaptation (LoRA) fine tuning programmatically, you can use the following request:
curl -X GET \
'https://{region}.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-02-20&filters=function_lora_fine_tune_trainable'
You can use LoRA only with non-quantized models.
Procedure
The high-level steps that you follow are mostly the same for each technique. The key differences, which are highlighted in this procedure, are the values to include in the request body for the fine-tuning training job.
-
Create a training data file to use for tuning the foundation model.
For more information about the training data file requirements, see Data formats for tuning foundation models.
-
Make your training data file available for the API to use.
You can do one of the following things:
- UI method
To upload your .json or .jsonl file, follow the steps in Adding files to reference from the API.
-
API method
Use the Data and AI Common Core API to create a data asset.
You will use the asset ID and training data file details when you add the training_data_references section of the request body that you create in the next step.
-
Use the watsonx.ai API to create a training experiment.
See Create a training.
Submit the POST request to this endpoint:
curl --request POST 'https://{region}.ml.cloud.ibm.com/ml/v1/fine_tunings?version=2025-02-14'
Customize the experiment by changing values for parameters in the TrainingResource payload. For more information, see these resources:
- Supported foundation models, see Choosing a model to tune.
- Changeable parameters, see Parameters for tuning foundation models.
Set auto_update_model to true to save the generated output as an asset that you can use when you deploy the tuned foundation model later. Otherwise, you must save the tuned model or adapters that are generated by the experiment to the repository service to generate an asset_id before you can use them in the deployment.
The following sample request body creates a LoRA fine-tuning experiment.
{
  "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58",
  "name": "my LoRA experiment",
  "auto_update_model": true,
  "tuned_model_name": "my-lora-tuned-model",
  "parameters": {
    "base_model": {
      "model_id": "ibm/granite-3-1-8b-base"
    },
    "task_id": "classification",
    "num_epochs": 10,
    "learning_rate": 0.00001,
    "batch_size": 5,
    "max_seq_length": 4096,
    "accumulate_steps": 1,
    "gpu": {
      "num": 1
    },
    "peft_parameters": {
      "type": "lora",
      "rank": 8,
      "lora_alpha": 32,
      "lora_dropout": 0.05,
      "target_modules": ["all-linear"]
    }
  },
  "results_reference": {
    "location": {
      "path": "fine_tuning/results"
    },
    "type": "container"
  },
  "training_data_references": [
    {
      "location": {
        "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
        "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
      },
      "type": "data_asset"
    }
  ]
}
The output of the request looks something like this:
{
  "entity": {
    "auto_update_model": true,
    "parameters": {
      "accumulate_steps": 1,
      "base_model": {
        "model_id": "ibm/granite-3-1-8b-base"
      },
      "batch_size": 5,
      "gpu": {
        "num": 4
      },
      "learning_rate": 0.00001,
      "max_seq_length": 1024,
      "num_epochs": 10,
      "peft_parameters": {
        "lora_alpha": 32,
        "lora_dropout": 0.05,
        "rank": 8,
        "target_modules": ["all-linear"],
        "type": "lora"
      },
      "response_template": "\n### Response:",
      "task_id": "classification",
      "verbalizer": "### Input: \n\n### Response: "
    },
    "results_reference": {
      "location": {
        "path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results",
        "notebooks_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/notebooks",
        "training": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1",
        "training_status": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/training-status.json",
        "assets_path": "/projects/4e34d515-c61f-4f18-92b4-758be78d0a58/assets/fine_tuning/results/2491b2d9-bf96-4d3f-9ea7-8604861471e1/assets"
      },
      "type": "container"
    },
    "status": {
      "state": "pending"
    },
    "training_data_references": [
      {
        "location": {
          "href": "/v2/assets/1e6591a2-c69d-4716-92e3-73e8c2270956?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58",
          "id": "1e6591a2-c69d-4716-92e3-73e8c2270956"
        },
        "type": "data_asset"
      }
    ],
    "tuned_model": {
      "name": "my-lora-tuned-model-2491b2d9-bf96-4d3f-9ea7-8604861471e1"
    }
  },
  "metadata": {
    "created_at": "2025-02-14T19:47:36.629Z",
    "id": "2491b2d9-bf96-4d3f-9ea7-8604861471e1",
    "modified_at": "2025-02-14T19:47:36.629Z",
    "name": "My LoRA experiment",
    "project_id": "4e34d515-c61f-4f18-92b4-758be78d0a58"
  }
}
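The same request body can be assembled programmatically before you submit it with your HTTP client and bearer token. This sketch reproduces the sample payload as a Python dict; the project ID and data asset ID arguments are placeholders.

```python
# Sketch: assemble the LoRA fine-tuning request body as a Python dict.
# Parameter values mirror the sample request on this page; the IDs
# passed in are placeholders. Submit the result as JSON in a POST to
# the /ml/v1/fine_tunings endpoint with your own HTTP client and token.
def build_lora_payload(project_id: str, data_asset_id: str) -> dict:
    return {
        "project_id": project_id,
        "name": "my LoRA experiment",
        "auto_update_model": True,
        "tuned_model_name": "my-lora-tuned-model",
        "parameters": {
            "base_model": {"model_id": "ibm/granite-3-1-8b-base"},
            "task_id": "classification",
            "num_epochs": 10,
            "learning_rate": 0.00001,
            "batch_size": 5,
            "max_seq_length": 4096,
            "accumulate_steps": 1,
            "gpu": {"num": 1},
            "peft_parameters": {
                "type": "lora",
                "rank": 8,
                "lora_alpha": 32,
                "lora_dropout": 0.05,
                "target_modules": ["all-linear"],
            },
        },
        "results_reference": {
            "location": {"path": "fine_tuning/results"},
            "type": "container",
        },
        "training_data_references": [
            {
                "location": {
                    "href": f"/v2/assets/{data_asset_id}?project_id={project_id}",
                    "id": data_asset_id,
                },
                "type": "data_asset",
            }
        ],
    }

payload = build_lora_payload("my-project-id", "my-data-asset-id")
```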
-
To check the status of a training job, you can use the following request.
Use the metadata.id value that is returned in the response to the POST request as the value of the ID path parameter in the request.
curl --request GET 'https://{region}.ml.cloud.ibm.com/ml/v1/fine_tunings/2491b2d9-bf96-4d3f-9ea7-8604861471e1?project_id=4e34d515-c61f-4f18-92b4-758be78d0a58&version=2025-02-14'
For the API reference, see Get fine tuning job.
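When you poll the GET endpoint, the job state is read from the entity.status.state path shown in the sample response above. The following sketch shows the decision logic only; the HTTP call, authentication, and sleep interval are omitted, and the terminal states other than completed are assumptions.

```python
# Sketch: decide whether a fine-tuning job is done from a GET response.
# The entity.status.state path matches the sample response on this page.
# "completed" means success; "failed" and "canceled" are assumed to be
# the other terminal states that should stop polling.
def job_state(get_response: dict) -> str:
    return get_response["entity"]["status"]["state"]

def is_finished(get_response: dict) -> bool:
    return job_state(get_response) in ("completed", "failed", "canceled")

# Illustrative response fragment (not real output from the service):
sample = {"entity": {"status": {"state": "pending"}}}
print(job_state(sample), is_finished(sample))
```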
The tuning experiment is finished when the state is completed.
If you included "auto_update_model": true in the request, the model asset ID of the tuned model or adapter is listed in the entity.tuned_model.id field of the response to the GET request. Make a note of the model asset ID.
-
Use the watsonx.ai API to deploy your tuned model.
To deploy your tuned model, you must complete the appropriate steps for the tuning method used.
-
Low-rank adaptation: Complete the following tasks:
-
Create a base foundation model asset.
The model asset defines metadata for the foundation model that will be used as the base model. See Creating the model asset.
-
Deploy the base foundation model.
You need a dedicated instance of the base foundation model that can be used at inference time. See Deploying the base model.
-
Deploy the low-rank adapter asset that was generated by the tuning experiment.
Deploy adapters that can adjust the base model weights at inference time to customize the output for the task. See Deploying the LoRA adapter model asset.
-
-
Full fine tuning: See Deploying fine-tuned models.
-
-
Inference the tuned foundation model.
To inference a tuned model, use an inference endpoint that includes the unique ID of the deployment that hosts the tuned model.
- Low-rank adaptation: See Inferencing deployed PEFT models.
- Full fine tuning: See Inferencing the deployed model.
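An inference request body for the deployment endpoint can be kept minimal. This sketch assumes the input and parameters fields of the watsonx.ai text generation API; the prompt and parameter values are illustrative.

```python
import json

# Sketch: an inference request body for the deployment text-generation
# endpoint. The "input"/"parameters" shape and the max_new_tokens and
# decoding_method fields are assumed from the watsonx.ai text generation
# API; the prompt and values are illustrative.
body = {
    "input": "Classify the sentiment: The acting was superb.",
    "parameters": {
        "max_new_tokens": 20,
        "decoding_method": "greedy",
    },
}

request_json = json.dumps(body)
print(request_json)
```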
Parent topic: Tuning foundation models