Building this continuous evaluation system at scale can be challenging. Several factors contribute to the difficulty, including getting access to production data, provisioning computational resources, standardizing the model evaluation process, and guaranteeing its reproducibility. To simplify and accelerate the process of defining and running ML evaluations, Vertex AI Model Evaluation enables you to iteratively assess and compare model performance at scale.
With Vertex AI Model Evaluation, you define a test dataset, a model, and an evaluation configuration as inputs, and it returns model performance metrics, whether you train your model in a notebook, run a training job, or use an ML pipeline on Vertex AI.
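To make that input and output contract concrete, here is a minimal sketch in plain Python. It is not the Vertex AI API: `EvaluationConfig`, `evaluate`, `threshold_model`, and the toy dataset are hypothetical names and values chosen only for illustration, and the metrics are computed locally for a binary classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class EvaluationConfig:
    """Hypothetical evaluation settings (not a Vertex AI type)."""
    positive_label: int = 1


def evaluate(
    model: Callable[[Sequence[float]], int],
    test_dataset: List[Tuple[Sequence[float], int]],
    config: EvaluationConfig,
) -> Dict[str, float]:
    """Run the model on every test example and return classification metrics."""
    if not test_dataset:
        raise ValueError("test_dataset must contain at least one example")

    tp = fp = fn = correct = 0
    for features, label in test_dataset:
        prediction = model(features)
        if prediction == label:
            correct += 1
        if prediction == config.positive_label:
            if label == config.positive_label:
                tp += 1  # true positive
            else:
                fp += 1  # false positive
        elif label == config.positive_label:
            fn += 1  # false negative

    # Fall back to 0.0 when a metric is undefined (no positives predicted/present).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": correct / len(test_dataset),
        "precision": precision,
        "recall": recall,
    }


def threshold_model(features: Sequence[float]) -> int:
    """Toy stand-in for a trained model: predicts 1 when the first feature is >= 0.5."""
    return 1 if features[0] >= 0.5 else 0


# Made-up test examples as (features, label) pairs, for illustration only.
toy_test_dataset = [([0.9], 1), ([0.2], 0), ([0.7], 0), ([0.4], 1)]
metrics = evaluate(threshold_model, toy_test_dataset, EvaluationConfig())
print(metrics)
```

The same three inputs (model, test dataset, configuration) go in regardless of where the model was trained, which is the property the managed service is described as providing.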
Vertex AI Model Evaluation is integrated with the following products:
- Vertex AI Model Registry, which provides a new view for accessing the different evaluation jobs and the metrics they produce after the model training job completes.
- Model Builder SDK, which introduces a new evaluate method to get classification, regression, and forecasting metrics for a model trained locally.
- Managed Pipelines, with a new evaluation component to generate and visualize metric results within the Vertex AI Pipelines Console.
Now that you know the new features of Vertex AI Model Evaluation, let’s see how you can leverage them to improve your model quality at scale.
Evaluate the performance of different models in Vertex AI Model Registry
As the decision maker who has to promote the model to production, you need to govern the model launching process.
To release the model, you need to easily retrieve, visualize, and compare the offline and online performance and explainability metrics of the trained models.
Thanks to the integration between Vertex AI Model Registry and Vertex AI Model Evaluation, you can now view all historical evaluations of each model (BQML, AutoML, and custom models). For each model version, the Vertex AI Model Registry console shows classification, regression, or forecasting metrics, depending on the type of model.
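The kind of comparison this view supports can be sketched in plain Python. This is only an illustration under stated assumptions: `best_version` is a hypothetical helper, not part of the Vertex AI SDK, and the version IDs and metric values are made-up placeholders rather than real evaluation results.

```python
from typing import Dict


def best_version(
    evaluations: Dict[str, Dict[str, float]],
    metric: str,
    higher_is_better: bool = True,
) -> str:
    """Return the version ID whose evaluation scores best on the given metric."""
    # Only consider versions that actually report the requested metric.
    candidates = {
        version: metrics[metric]
        for version, metrics in evaluations.items()
        if metric in metrics
    }
    if not candidates:
        raise ValueError(f"no version reports the metric {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=candidates.get)


# Placeholder metrics per model version (illustrative values only).
evaluations = {
    "1": {"precision": 0.81, "log_loss": 0.42},
    "2": {"precision": 0.86, "log_loss": 0.35},
    "3": {"precision": 0.88, "log_loss": 0.39},
}

best_by_precision = best_version(evaluations, "precision")
best_by_log_loss = best_version(evaluations, "log_loss", higher_is_better=False)
print(best_by_precision, best_by_log_loss)
```

Different metrics can favor different versions, which is why having every historical evaluation side by side helps when deciding which version to promote.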