Building an AutoAI model
AutoAI automatically prepares data, applies algorithms, and builds model pipelines that are best suited for your data and use case. Learn how to generate the model pipelines that you can save as machine learning models.
Follow these steps to upload data and have AutoAI create the best model for your data and use case.
- Collect your input data
- Open the AutoAI tool
- Specify details of your model and training data and start AutoAI
- View the results
Collect your input data
Collect and prepare your training data. For details on allowable data sources, see AutoAI overview.
If you are creating an experiment with a single training data source, you can optionally add a second data source to use as testing (holdout) data for validating the pipelines.
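To make the role of holdout data concrete, here is a minimal sketch of how training and holdout rows can be separated. The helper `split_holdout` is hypothetical and is not AutoAI's implementation. If a separate holdout source is supplied, every training row is used for training. Otherwise a fraction of the training rows is set aside. The 10% default mirrors the default described later in this topic, but the random shuffle and fixed seed are assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def split_holdout(
    training: Sequence[Row],
    holdout: Optional[Sequence[Row]] = None,
    holdout_fraction: float = 0.1,
    seed: int = 0,
) -> Tuple[List[Row], List[Row]]:
    """Return (train_rows, holdout_rows). Hypothetical helper, not AutoAI's API."""
    if holdout is not None:
        # A separate holdout source: train on all training rows and keep
        # the second source only for validating the pipelines.
        return list(training), list(holdout)
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    rows = list(training)
    # Assumption: a seeded shuffle so the split is reproducible.
    random.Random(seed).shuffle(rows)
    n_holdout = max(1, round(len(rows) * holdout_fraction))
    return rows[n_holdout:], rows[:n_holdout]
```

With 100 training rows and no separate holdout source, this returns 90 rows for training and 10 for testing.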
Open the AutoAI tool
AutoAI uses the default storage that is associated with your project to store your data and to save model results.
- Open your project.
- Click the Assets tab.
- Click New asset > Build machine learning models automatically.
Specify details of your experiment
- Specify a name and description for your experiment.
- Select a machine learning service instance and provide task credentials if prompted. Then click Create.
- Choose data from your project or upload it from your file system or from the asset browser, then click Continue. Click the preview icon to review your data. (Optional) Add a second file as holdout data for testing the trained pipelines.
- Choose the Column to predict for the data you want the experiment to predict.
  - Based on an analysis of a subset of the data set, AutoAI selects a default model type: binary classification if the target column has two possible values, multiclass classification if it has a discrete set of 3 or more values, or regression if it contains a continuous numeric variable. You can optionally override this selection.
    Note: The limit on values to classify is 200. Creating a classification experiment with many unique values in the prediction column is resource-intensive and affects the experiment's performance and training time. To maintain the quality of the experiment, avoid classification targets with many unique values.
  - AutoAI chooses a default metric for optimizing. For example, the default metric for a binary classification model is Accuracy.
  - By default, 10% of the training data is held out to test the performance of the model.
- (Optional) Click Experiment settings to view or customize options for your AutoAI run. For details on experiment settings, see Configuring a classification or regression experiment.
- Click Run Experiment to begin model pipeline creation.
  An infographic shows the pipelines as they are created for your data. How long this phase takes depends on the size of your data set. A notification message tells you whether processing will be brief or will take longer. You can work in other parts of the product while the pipelines build.
  Hover over nodes in the infographic to see the factors that pipelines share and the properties that make each pipeline unique. For a guide to the data in the infographic, click the Legend tab in the information panel. For a different view of pipeline creation, click the Experiment details tab of the notification pane, then click Switch views to see the progress map. In either view, click a pipeline node to view the associated pipeline in the leaderboard.
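The model-type rule described in the steps above can be sketched as a small function. This is a hypothetical heuristic named `suggest_prediction_type`, not AutoAI's actual selection logic. AutoAI analyzes only a subset of the data, while this sketch inspects every value it is given. What counts as "continuous" (any fractional value, or more distinct numbers than the 200-value classification limit) is an assumption chosen for illustration.

```python
from numbers import Real
from typing import Any, Iterable

MAX_CLASSES = 200  # documented limit on values to classify


def suggest_prediction_type(target: Iterable[Any]) -> str:
    """Suggest 'binary', 'multiclass', or 'regression'. Hypothetical heuristic."""
    values = [v for v in target if v is not None]  # ignore missing values
    distinct = set(values)
    if len(distinct) < 2:
        raise ValueError("target column needs at least two distinct values")
    if len(distinct) == 2:
        return "binary"  # exactly two possible values

    numeric = all(isinstance(v, Real) and not isinstance(v, bool) for v in values)
    if numeric:
        # Assumption: fractional values, or more distinct numbers than the
        # classification limit, indicate a continuous numeric target.
        if len(distinct) > MAX_CLASSES or any(
            not float(v).is_integer() for v in distinct
        ):
            return "regression"
        return "multiclass"

    if len(distinct) > MAX_CLASSES:
        raise ValueError(
            f"{len(distinct)} classes exceeds the limit of {MAX_CLASSES}"
        )
    return "multiclass"  # discrete set of 3 or more values
```

For example, a column of "yes"/"no" values yields "binary", a column of three color names yields "multiclass", and a column of prices such as 1.5 and 2.25 yields "regression". As in AutoAI, the suggestion is only a default that you can override.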
View the results
When the pipeline generation process completes, you can view the ranked model candidates and evaluate them before you save a pipeline as a model.
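Ranking candidates amounts to ordering them by a metric score. The sketch below assumes that each pipeline has a single score for the experiment's optimization metric and that the ranking sorts on that score. `PipelineResult` and `rank_pipelines` are hypothetical names, not part of AutoAI. Higher is better for metrics such as Accuracy; for error metrics such as RMSE, pass `higher_is_better=False`.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PipelineResult:
    name: str     # hypothetical pipeline identifier
    score: float  # value of the optimization metric for this pipeline


def rank_pipelines(
    results: List[PipelineResult], higher_is_better: bool = True
) -> List[PipelineResult]:
    """Return pipelines ordered best-first by their metric score."""
    # sorted() is stable, so pipelines with equal scores keep their input order.
    return sorted(results, key=lambda r: r.score, reverse=higher_is_better)
```

Because `sorted` returns a new list, the original results are left unchanged, and the first element of the returned list corresponds to the top of the leaderboard.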
Next steps
- Watch this video to see how to build a binary classification model.
- Watch this video to see how to build a multiclass classification model.