Parameters for tuning foundation models

Last updated: Mar 04, 2025

Tuning parameters configure the tuning experiments that you use to tune a foundation model.

Note: The parameters that you change when you tune a foundation model apply to the tuning experiment, not to the underlying foundation model.

Prompt tuning parameters

The following table describes the tuning parameters that you can customize.

Tuning parameter value description references
Parameter name	Description	Value options	Learn more
Initialization method	Specifies how to initialize the prompt vector.	Random, Text	Initializing the prompt
Initialization text	Text to use as the prompt for the first run of the experiment.	–	Initializing the prompt
Batch size	Number of labeled examples to process at one time.	1–16	Segmenting the training data
Accumulate steps	Number of batches to process before adjustments are made.	1–128	Segmenting the training data
Learning rate	Determines the size of the change to make each time the model is adjusted.	0.00001–0.5	Managing the learning rate
Number of epochs (number of training cycles)	Number of times to cycle through the training data.	1–50	Choosing the number of training cycles to complete
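If you prepare tuning settings in a script, a check like the following can catch out-of-range values before you start an experiment. This is a minimal Python sketch and is not part of any watsonx.ai API. The dictionary keys (batch_size, accumulate_steps, learning_rate, num_epochs, init_method, init_text) and the validate_tuning_params function are hypothetical names chosen for illustration, and the ranges are copied from the table above. The rule that the Text method needs initialization text is an assumption based on the Initialization method description.

```python
# Allowed value ranges copied from the parameter table.
# The key names are hypothetical, chosen only for this sketch.
ALLOWED_RANGES = {
    "batch_size": (1, 16),
    "accumulate_steps": (1, 128),
    "learning_rate": (0.00001, 0.5),
    "num_epochs": (1, 50),
}
ALLOWED_INIT_METHODS = {"random", "text"}


def validate_tuning_params(params: dict) -> list:
    """Return a list of problems found in params; an empty list means every value is in range."""
    problems = []
    for name, (low, high) in ALLOWED_RANGES.items():
        if name not in params:
            continue  # a missing value is left to the experiment's defaults
        value = params[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside the range {low}-{high}")
    method = params.get("init_method", "random")
    if method not in ALLOWED_INIT_METHODS:
        problems.append(f"init_method={method!r} must be 'random' or 'text'")
    # Assumption: Text initialization is only meaningful with initialization text.
    if method == "text" and not params.get("init_text"):
        problems.append("init_text is required when init_method is 'text'")
    return problems


print(validate_tuning_params({"batch_size": 32, "learning_rate": 0.3}))
# ['batch_size=32 is outside the range 1-16']
```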

Setting parameter values for prompt tuning

The best hyperparameter values to use for a prompt-tuning experiment differ based on your data and use case.

The following table captures the parameter values to use as a starting point for prompt tuning a third-party foundation model.

Tuning parameter values for third-party foundation models
Parameter name	Default value for flan-t5-xl-3b
Initialization method	Random
Initialization text	None
Batch size	16
Accumulate steps	16
Learning rate	0.3
Number of epochs (number of training cycles)	20

The default parameters that are used for prompt tuning the granite-13b-instruct-v2 foundation model are adjusted based on the type of task you want the tuned model to do.

The following table captures the parameter values to use as a starting point per supported task type for prompt tuning the granite-13b-instruct-v2 foundation model.

Tuning parameter values for the granite-13b-instruct-v2 foundation model
Parameter name	Default value for classification	Default value for generation	Default value for summarization
Batch size	8	16	8
Accumulate steps	32	16	1
Learning rate	0.0006	0.0002	0.0002
Number of epochs (number of training cycles)	20	20	40
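If you script your experiments, you might keep these starting points as configuration. The following minimal Python sketch records the values from the two tables above. The key names and the starting_params helper are hypothetical and do not correspond to a specific watsonx.ai client API, so map them to the parameter names that your tuning client expects.

```python
from typing import Optional

# Starting-point values from the flan-t5-xl-3b table (key names are hypothetical).
FLAN_T5_XL_3B_DEFAULTS = {
    "init_method": "random",
    "init_text": None,
    "batch_size": 16,
    "accumulate_steps": 16,
    "learning_rate": 0.3,
    "num_epochs": 20,
}

# Starting-point values per task type from the granite-13b-instruct-v2 table.
GRANITE_13B_INSTRUCT_V2_DEFAULTS = {
    "classification": {"batch_size": 8, "accumulate_steps": 32, "learning_rate": 0.0006, "num_epochs": 20},
    "generation": {"batch_size": 16, "accumulate_steps": 16, "learning_rate": 0.0002, "num_epochs": 20},
    "summarization": {"batch_size": 8, "accumulate_steps": 1, "learning_rate": 0.0002, "num_epochs": 40},
}


def starting_params(model: str, task: Optional[str] = None, **overrides) -> dict:
    """Return a copy of the documented starting values for a model (and task), with overrides applied."""
    if model == "flan-t5-xl-3b":
        params = dict(FLAN_T5_XL_3B_DEFAULTS)
    elif model == "granite-13b-instruct-v2":
        if task not in GRANITE_13B_INSTRUCT_V2_DEFAULTS:
            raise ValueError(f"unsupported task type for {model}: {task!r}")
        params = dict(GRANITE_13B_INSTRUCT_V2_DEFAULTS[task])
    else:
        raise ValueError(f"no documented starting values for {model}")
    params.update(overrides)  # the copy is changed, never the shared defaults
    return params


print(starting_params("granite-13b-instruct-v2", "summarization", num_epochs=30))
```

Overrides let you change one value, such as the number of epochs, while keeping the other documented starting points.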

Parameter descriptions

Segmenting the training data

When an experiment runs, it first breaks the training data into smaller batches and then trains on one batch at a time. Each batch must fit in GPU memory to be processed. To reduce the amount of GPU memory that is needed, you can use smaller batches and configure the tuning experiment to postpone making adjustments until more than one batch is processed. The experiment processes a batch and calculates its performance metrics, but it does not make adjustments immediately. Instead, it collects the performance information over some number of batches and evaluates the cumulative performance metrics before it makes an adjustment.
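The following toy Python sketch shows this accumulate-then-adjust pattern, which is commonly called gradient accumulation. It is a simplified assumption, not the service's implementation: instead of a prompt vector, it fits a single weight w in y = w * x with plain gradient descent, and a partial group of batches at the end of a pass through the data still triggers an adjustment. The function train_with_accumulation is hypothetical.

```python
def train_with_accumulation(examples, batch_size, accumulate_steps, learning_rate, num_epochs):
    """Fit y = w * x, adjusting w only after every accumulate_steps batches.

    A toy stand-in for the tuning loop: the real experiment adjusts a prompt vector.
    Returns the final weight and the number of adjustments made.
    """
    w = 0.0
    adjustments = 0
    for _ in range(num_epochs):
        accumulated_grad = 0.0
        pending = 0  # batches processed since the last adjustment
        for start in range(0, len(examples), batch_size):
            batch = examples[start:start + batch_size]
            # Gradient of the mean squared error on this batch; w is not changed yet.
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            accumulated_grad += grad
            pending += 1
            if pending == accumulate_steps:
                # One adjustment, based on the average gradient of the accumulated batches.
                w -= learning_rate * accumulated_grad / pending
                adjustments += 1
                accumulated_grad, pending = 0.0, 0
        if pending:
            # Assumption: leftover batches at the end of a pass still trigger an adjustment.
            w -= learning_rate * accumulated_grad / pending
            adjustments += 1
    return w, adjustments


# 100 examples that follow y = 3x exactly.
data = [(i / 100, 3 * i / 100) for i in range(1, 101)]
w, n = train_with_accumulation(data, batch_size=10, accumulate_steps=10, learning_rate=0.5, num_epochs=20)
print(n)  # 20: ten batches per pass, one adjustment per pass
```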

Use the following parameters to control how the training data is segmented:

Batch size: Number of labeled examples (also known as samples) to process at one time.

For example, for a data set with 1,000 examples and a batch size of 10, the data set is divided into 100 batches of 10 examples each.

If the training data set is small, specify a smaller batch size so that the data set is still divided into enough batches for the model to be adjusted a sufficient number of times.

Accumulate steps: Number of batches to process before adjustments are made.

For example, if the data set is divided into 100 batches and you set the accumulate steps value to 10, then adjustments are made 10 times instead of 100 times for each pass through the training data.
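The arithmetic in these examples can be sketched as a small Python function. The function name batching_plan is hypothetical, and the sketch assumes that a final partial batch, and a final partial group of accumulated batches, are still processed and still trigger an adjustment; how the experiment handles such leftovers is not specified here.

```python
import math


def batching_plan(num_examples: int, batch_size: int, accumulate_steps: int) -> tuple:
    """Return (batches per pass, adjustments per pass) through the training data.

    Assumes partial batches and partial accumulation groups are still used.
    """
    batches = math.ceil(num_examples / batch_size)
    adjustments = math.ceil(batches / accumulate_steps)
    return batches, adjustments


print(batching_plan(1000, 10, 1))   # (100, 100)
print(batching_plan(1000, 10, 10))  # (100, 10)
```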

Choosing the number of training cycles to complete

The Number of epochs parameter specifies the number of times to cycle through the complete training dataset.

For example, with a batch size of 10, an accumulate steps value of 1, and a data set with 1,000 examples, one epoch processes 100 batches and makes adjustments 100 times. If you set the number of epochs to 20, the training data is passed through the model 20 times, which means that a total of 2,000 batches are processed during the tuning experiment.
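The same counts can be written as a worked formula. Here N is the number of examples, B the batch size, A the accumulate steps value, and E the number of epochs; these symbols are introduced only for this illustration, and the ceilings assume that partial batches are still processed.

```latex
\begin{aligned}
\text{batches per epoch} &= \left\lceil \frac{N}{B} \right\rceil = \left\lceil \frac{1000}{10} \right\rceil = 100 \\
\text{total batches} &= E \cdot \left\lceil \frac{N}{B} \right\rceil = 20 \cdot 100 = 2000 \\
\text{total adjustments} &= E \cdot \left\lceil \frac{\lceil N/B \rceil}{A} \right\rceil = 20 \cdot \left\lceil \frac{100}{1} \right\rceil = 2000
\end{aligned}
```

With A = 10 instead, the same experiment still processes 2,000 batches but makes only 20 × 10 = 200 adjustments.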

The more epochs you specify and the larger your training data set, the longer it takes to tune a model. If you set the number of epochs too low, the model might not learn adequately. If you set the number of epochs too high, you can overfit the model to the data set. Overfitting describes the phenomenon in which a model is so closely tuned to its training data that it cannot generalize and apply what it learns when new data is introduced.

Managing the learning rate

The learning rate parameter determines the size of the change to make each time the model is adjusted. The higher the value, the greater the change. Setting the learning rate too low might prevent the model from learning adequately from the new data. Setting the learning rate too high might prevent the model from learning gradually enough to apply what it learns to new, unseen data.

Consider setting this parameter conservatively at first, and then changing it gradually as you experiment to find the best hyperparameters for the data set and foundation model that you are customizing.
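The following Python sketch illustrates both points under simplifying assumptions. update shows a single plain gradient-descent step, where the size of the change is the learning rate multiplied by the gradient; the optimizer that the tuning experiment actually uses is not described here. learning_rate_candidates generates a conservative-to-larger sequence of values to try, capped at the range from 0.00001 to 0.5 in the parameter table. Both function names and the growth factor are hypothetical choices for illustration.

```python
def update(value: float, gradient: float, learning_rate: float) -> float:
    """One plain gradient-descent step: the change is learning_rate * gradient."""
    return value - learning_rate * gradient


def learning_rate_candidates(start, factor, count, low=0.00001, high=0.5):
    """Learning rates to try, starting conservatively and growing by factor each time.

    Stops early as soon as a value falls outside the allowed range.
    """
    rates = []
    rate = start
    for _ in range(count):
        if not low <= rate <= high:
            break
        rates.append(rate)
        rate *= factor
    return rates


# For the same gradient, a larger learning rate makes a larger change.
print(update(1.0, 2.0, 0.0002), update(1.0, 2.0, 0.3))
# A gradual sweep, for example one experiment per value.
print(learning_rate_candidates(0.0001, 3, 5))
```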

Setting token limits

You can change the number of tokens that are allowed in the model input and output during a tuning experiment by setting the max_seq_length parameter. The maximum sequence length is the maximum combined number of input and output tokens that is allowed for each prompt.

The larger the number of allowed input and output tokens, the longer it takes to tune the model. Set this parameter to the smallest value that still represents your use case properly.

Create input and output examples in your training data that conform to the limit you plan to use for tuning. Examples that are longer than the specified maximum sequence length are truncated during the experiment. For example, if you set this parameter to 200 and the training data has an example input with 1,000 tokens, only the first 200 tokens of the example input are used.

Remember, the sequence length also includes the output tokens for each prompt, which means the setting controls the number of tokens that the model is allowed to generate as output during the tuning experiment.
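The following Python sketch shows how such a limit could apply to one training example, and how you might flag examples that exceed the limit before you tune. It rests on stated assumptions: the input tokens come before the output tokens and the combined sequence is cut to its first max_seq_length tokens; tokens are approximated by splitting on whitespace, whereas foundation models use their own subword tokenizers, so real counts differ; and each example is a dictionary with input and output fields. The function names are hypothetical.

```python
def count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): whitespace split. Real models use
    # subword tokenizers, so their token counts differ.
    return len(text.split())


def truncate_to_max_seq_length(input_tokens, output_tokens, max_seq_length):
    """Keep the first max_seq_length tokens of input + output (input first, by assumption)."""
    kept_input = input_tokens[:max_seq_length]
    remaining = max_seq_length - len(kept_input)
    kept_output = output_tokens[:remaining]  # empty when the input uses the whole budget
    return kept_input, kept_output


def examples_over_limit(examples, max_seq_length):
    """Return the indexes of examples whose input plus output exceeds max_seq_length."""
    return [
        i for i, example in enumerate(examples)
        if count_tokens(example["input"]) + count_tokens(example["output"]) > max_seq_length
    ]


# The case from the text: a 1,000-token input with max_seq_length set to 200.
kept_input, kept_output = truncate_to_max_seq_length(["tok"] * 1000, ["out"] * 20, 200)
print(len(kept_input), len(kept_output))  # 200 0
```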

Initializing the prompt

When you create a prompt-tuning experiment, you can choose whether to specify your own text to initialize the prompt vector or to let the experiment generate the initial values for you. The prompt vector consists of new tokens that are learned during tuning. These tokens start the training process either with random values or with values based on the embeddings of a vocabulary or an instruction that you specify as text. Studies show that as the size of the underlying model grows beyond 10 billion parameters, the choice of initialization method becomes less important.

The choice that you make when you create the tuning experiment customizes how the prompt is initialized.

Initialization method: Choose a method from the following options:

Text: The prompt vector is initialized from the initialization text that you specify.
Random: The experiment initializes the prompt vector with randomly chosen values.

Initialization text: The text to use to initialize the prompt vector when the initialization method is Text. Specify a task description or instructions similar to what you would use for zero-shot prompting.
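To make the two initialization methods concrete, the following toy Python sketch builds an initial prompt vector of learnable token embeddings. Everything in it is an assumption for illustration: the tiny embedding table stands in for the foundation model's real vocabulary embeddings, words stand in for tokens, and the Text method repeats or truncates the embeddings of the initialization text to fill the requested number of virtual tokens, which is one common approach rather than a documented detail of this service. The names init_prompt_vector and VOCAB_EMBEDDINGS are hypothetical.

```python
import random

EMBEDDING_DIM = 4  # toy size; real foundation models use far larger embeddings

# Hypothetical stand-in for the foundation model's vocabulary embedding table.
_rng = random.Random(0)
VOCAB_EMBEDDINGS = {
    word: [_rng.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)]
    for word in "classify the sentiment of this review as positive or negative".split()
}


def init_prompt_vector(num_virtual_tokens, method, init_text=None, seed=0):
    """Return num_virtual_tokens vectors that tuning then adjusts (the initial prompt vector)."""
    if method == "random":
        # Random: start every virtual token at randomly chosen values.
        rng = random.Random(seed)
        return [[rng.uniform(-0.5, 0.5) for _ in range(EMBEDDING_DIM)]
                for _ in range(num_virtual_tokens)]
    if method == "text":
        # Text: start from the embeddings of the known words in the initialization text.
        words = (init_text or "").lower().split()
        known = [VOCAB_EMBEDDINGS[w] for w in words if w in VOCAB_EMBEDDINGS]
        if not known:
            raise ValueError("init_text must contain at least one word in the vocabulary")
        # Repeat the text embeddings if the text is short, truncate if it is long,
        # and copy each vector so that tuning does not modify the embedding table.
        return [list(known[i % len(known)]) for i in range(num_virtual_tokens)]
    raise ValueError(f"unknown initialization method: {method!r}")


prompt = init_prompt_vector(8, "text", "Classify the sentiment of this review")
print(len(prompt), len(prompt[0]))  # 8 4
```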

Learn more

Data formats

