Deploying foundation models tuned with PEFT techniques
You can deploy base foundation models that are hosted by IBM and trained with parameter-efficient fine-tuning (PEFT) techniques such as low-rank adaptation (LoRA).
Before you can deploy a fine-tuned model, you must train the base model on your new dataset so that its parameters adapt to and specialize for the target domain. Fine-tuning the model with the LoRA technique creates an adapter, which is used for deployment.
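To illustrate why the adapter is a small artifact compared with the base model, here is a minimal sketch in Python. It assumes the standard LoRA formulation, in which a frozen weight matrix of shape d_out x d_in is supplemented by two trainable matrices of shapes d_out x rank and rank x d_in. The function names and the example dimensions are illustrative and do not come from watsonx.ai.

```python
def lora_adapter_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters in a LoRA adapter for one weight matrix.

    LoRA leaves the original d_out x d_in weight frozen and learns two
    small matrices, B (d_out x rank) and A (rank x d_in); the weight
    update is their product B @ A.
    """
    return rank * (d_out + d_in)


def full_matrix_params(d_out: int, d_in: int) -> int:
    """Parameters in the full (frozen) weight matrix."""
    return d_out * d_in


if __name__ == "__main__":
    # Illustrative sizes only; real models have many such matrices.
    d_out, d_in, rank = 4096, 4096, 8
    adapter = lora_adapter_params(d_out, d_in, rank)
    full = full_matrix_params(d_out, d_in)
    print(f"adapter: {adapter} parameters, full matrix: {full} parameters")
    print(f"adapter is {adapter / full:.4%} of the full matrix")
```

Because the adapter stores only the low-rank matrices, it cannot be served on its own, which is consistent with deploying the base foundation model first.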
After fine-tuning the model, deploy it in your production environment to get an endpoint that your application can use to run inference on the model. To deploy the LoRA adapter, start by deploying the base foundation model asset.
After deployment, send input data to the endpoint to make predictions or generate text that draws on the knowledge and adaptability the model gained during fine-tuning.
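As a rough illustration of calling a deployed model, the following Python sketch builds (but does not send) a text generation request for a deployment endpoint. The path `/ml/v1/deployments/{deployment_id}/text/generation`, the `version` date, and the `input` and `parameters.max_new_tokens` body fields are assumptions based on the general shape of the watsonx.ai REST API; verify them against the API reference for your release. The function name and the host are hypothetical.

```python
import json
import urllib.request


def build_generation_request(base_url, deployment_id, token, prompt,
                             max_new_tokens=100, version="2023-05-29"):
    """Build a POST request for text generation against a deployment.

    The endpoint path, version date, and body fields are assumptions;
    check the watsonx.ai API reference before relying on them.
    """
    url = (f"{base_url.rstrip('/')}/ml/v1/deployments/"
           f"{deployment_id}/text/generation?version={version}")
    body = {
        "input": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host, deployment ID, and token, for illustration only.
    req = build_generation_request(
        "https://example.invalid", "my-lora-deployment", "TOKEN",
        "Summarize the following ticket:", max_new_tokens=50,
    )
    print(req.get_method(), req.full_url)
    # To send the request: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to inspect the URL and body before any call is made.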
Procedure
To deploy a model that was fine-tuned with PEFT techniques, you must use a programmatic approach with the watsonx.ai REST API:
- Review requirements: Review the supported architectures and the hardware and software requirements for deploying fine-tuned models that are trained with a PEFT technique by using watsonx.ai.
- Optional: Create a customized hardware specification: If you have a GPU configuration that is different from the predefined GPU configurations available for deployment, create a customized hardware specification.
- Deploy the PEFT model: To deploy your PEFT model with the watsonx.ai REST API, follow these steps:
  - Create repository asset for base foundation model: Create a repository asset for the base foundation model that you want to fine-tune with a PEFT technique.
  - Deploy base foundation model: Create an online deployment for your base foundation model for a supported architecture.
  - Optional: Create repository asset for LoRA model asset: If the auto_update_model option was not enabled during training, create a repository asset for the LoRA adapters. For more information, see Tuning a foundation model.
  - Deploy the repository asset for LoRA or QLoRA model: Deploy the trained model with the REST API.
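The order of the deployment steps above can be sketched as a small planning function in Python. It only assembles the sequence of REST calls and makes no network requests. The paths `/ml/v4/models` and `/ml/v4/deployments` follow the general shape of the watsonx.ai REST API, but the asset `type` strings, the `enable_lora` and `base_deployment_id` fields, and all names and placeholders are assumptions for illustration; confirm the exact payloads in the API reference.

```python
MODELS_PATH = "/ml/v4/models"            # repository assets
DEPLOYMENTS_PATH = "/ml/v4/deployments"  # online deployments


def plan_peft_deployment(space_id: str, base_model_id: str,
                         auto_update_model: bool) -> list:
    """Return the ordered REST calls for deploying a LoRA-tuned model.

    Payload field names and type strings are assumptions, not confirmed
    API details. Placeholders in angle brackets stand for IDs that are
    returned by earlier steps.
    """
    steps = [
        {   # 1. Repository asset for the base foundation model
            "name": "create_base_model_asset",
            "method": "POST",
            "path": MODELS_PATH,
            "payload": {
                "type": "base_foundation_model_1.0",  # assumed type string
                "name": "base-fm-asset",
                "space_id": space_id,
                "foundation_model": {"model_id": base_model_id},
            },
        },
        {   # 2. Online deployment of the base foundation model
            "name": "deploy_base_model",
            "method": "POST",
            "path": DEPLOYMENTS_PATH,
            "payload": {
                "name": "base-fm-deployment",
                "space_id": space_id,
                "asset": {"id": "<id from create_base_model_asset>"},
                # assumed flag that lets the deployment serve adapters
                "online": {"parameters": {"foundation_model": {"enable_lora": True}}},
            },
        },
    ]
    if not auto_update_model:
        # 3. Optional: needed only when training did not create the asset
        steps.append({
            "name": "create_lora_adapter_asset",
            "method": "POST",
            "path": MODELS_PATH,
            "payload": {
                "type": "lora_adapter_1.0",  # assumed type string
                "name": "lora-adapter-asset",
                "space_id": space_id,
            },
        })
    steps.append({  # 4. Deployment of the adapter on top of the base deployment
        "name": "deploy_lora_adapter",
        "method": "POST",
        "path": DEPLOYMENTS_PATH,
        "payload": {
            "name": "lora-adapter-deployment",
            "space_id": space_id,
            "asset": {"id": "<id of the LoRA adapter asset>"},
            "base_deployment_id": "<id from deploy_base_model>",  # assumed field
            "online": {},
        },
    })
    return steps


if __name__ == "__main__":
    # Hypothetical space ID and model ID, for illustration only.
    for step in plan_peft_deployment("SPACE_ID", "BASE_MODEL_ID", auto_update_model=False):
        print(step["method"], step["path"], step["name"])
```

The adapter asset step is skipped when auto_update_model was enabled during training, because in that case the asset already exists.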
Parent topic: Deploying tuned models