Applicable only to custom training with Datasets that have DataItems and
Annotations.
Cloud Storage URI that points to a YAML file describing the annotation
schema. The schema is defined as an OpenAPI 3.0.2 Schema
Object.
The schema files that can be used here are found in
gs://google-cloud-aiplatform/schema/dataset/annotation/. Note that the
chosen schema must be consistent with the
metadata of the Dataset specified by
dataset_id.
Only Annotations that both match this schema and belong to DataItems not
ignored by the split method are used, in the training, validation, or test
role, depending on the role of the DataItem they are on.
When used in conjunction with annotations_filter, the Annotations used
for training are filtered by both annotations_filter and
annotation_schema_uri.
Applicable only to Datasets that have DataItems and Annotations.
A filter on Annotations of the Dataset. Only Annotations that both
match this filter and belong to DataItems not ignored by the split method
are used, in the training, validation, or test role, depending on
the role of the DataItem they are on (for auto-assigned DataItems, that role
is decided by Vertex AI). A filter with the same syntax as the one used in
ListAnnotations may be used, but note that
here it filters across all Annotations of the Dataset, not just within
a single DataItem.
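
The selection rule described here can be pictured with a small, purely illustrative Python model. The DataItem and Annotation classes, the role strings, and the label predicate are hypothetical stand-ins and not Vertex AI types or filter syntax. The sketch only assumes that a DataItem ignored by the split method has no role.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class DataItem:
    name: str
    # "training", "validation", "test", or None when ignored by the split.
    role: Optional[str]


@dataclass
class Annotation:
    data_item: DataItem
    label: str


def select_annotations(
    annotations: List[Annotation],
    matches_filter: Callable[[Annotation], bool],
) -> Dict[str, List[Annotation]]:
    """Group matching annotations by the role of their DataItem, dataset-wide."""
    selected: Dict[str, List[Annotation]] = {
        "training": [],
        "validation": [],
        "test": [],
    }
    for annotation in annotations:
        role = annotation.data_item.role
        if role is None:
            continue  # DataItem ignored by the split method
        if not matches_filter(annotation):
            continue  # Annotation rejected by the filter
        selected[role].append(annotation)
    return selected


items = [DataItem("a", "training"), DataItem("b", "test"), DataItem("c", None)]
annotations = [
    Annotation(items[0], "cat"),
    Annotation(items[0], "dog"),
    Annotation(items[1], "cat"),
    Annotation(items[2], "cat"),
]
result = select_annotations(annotations, lambda a: a.label == "cat")
```

Note that the predicate runs over every Annotation of the Dataset, so a single DataItem can contribute some Annotations and have others dropped.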
Only applicable to custom training with a tabular Dataset that has a BigQuery
source.
The BigQuery project location where the training data is to be written.
In the given project, a new dataset is created with the name
dataset_
where timestamp is in YYYY_MM_DDThh_mm_ss_sssZ format. All training
input data is written into that dataset. Three tables are created in the
dataset: training, validation, and test.
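
As a rough illustration of the timestamp layout only, the following sketch formats a UTC time as YYYY_MM_DDThh_mm_ss_sssZ. The function name is hypothetical, and the code assumes that "sss" means milliseconds; it does not reproduce the full dataset name, and it is not how Vertex AI itself generates the value.

```python
from datetime import datetime, timezone


def bigquery_style_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY_MM_DDThh_mm_ss_sssZ in UTC."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    utc = moment.astimezone(timezone.utc)
    millis = utc.microsecond // 1000  # assumption: "sss" is milliseconds
    return f"{utc:%Y_%m_%dT%H_%M_%S}_{millis:03d}Z"


example = bigquery_style_timestamp(
    datetime(2025, 6, 12, 8, 30, 5, 123000, tzinfo=timezone.utc)
)
print(example)  # 2025_06_12T08_30_05_123Z
```

Underscores replace the usual ISO-8601 separators, presumably because BigQuery dataset names allow only letters, digits, and underscores.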
Required. The ID of the Dataset, in the same Project and Location, whose data
will be used to train the Model. The Dataset must use a schema compatible with
the Model being trained; what counts as compatible should be described in the
TrainingPipeline's [training_task_definition]
[google.cloud.aiplatform.v1.TrainingPipeline.training_task_definition].
For tabular Datasets, all their data is exported to training, to pick
and choose from.
The Cloud Storage location where the training data is to be
written. In the given directory, a new directory is created with the
name:
dataset---
where timestamp is in YYYY-MM-DDThh:mm:ss.sssZ ISO-8601 format.
All training input data is written into that directory.
The Vertex AI environment variables representing Cloud Storage
data URIs use the Cloud Storage wildcard
format to support sharded data, e.g. "gs://.../training-*.jsonl".
AIP_DATA_FORMAT = "jsonl" for non-tabular data, "csv" for tabular data.
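
A training script has to expand such a wildcard URI into concrete shards and parse them according to the data format. The sketch below shows one way to do both in plain Python, without a Cloud Storage client. The helper names, the example-bucket URIs, and the shard file names are hypothetical. It also assumes that "*" does not match "/" and that CSV shards start with a header row, neither of which is stated above.

```python
import csv
import io
import json
import re
from typing import Dict, List


def wildcard_to_regex(pattern: str):
    """Translate a pattern whose only wildcard is '*' into a compiled regex.

    Assumption: '*' matches any run of characters except '/'.
    """
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("[^/]*".join(literal_parts))


def matching_shards(pattern: str, object_uris: List[str]) -> List[str]:
    """Return the URIs that match the wildcard pattern, in sorted order."""
    regex = wildcard_to_regex(pattern)
    return sorted(uri for uri in object_uris if regex.fullmatch(uri))


def parse_records(text: str, data_format: str) -> List[Dict]:
    """Parse shard contents according to an AIP_DATA_FORMAT-style value."""
    if data_format == "jsonl":
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if data_format == "csv":
        # Assumption: the first row of a CSV shard is a header.
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"unsupported data format: {data_format!r}")


# Hypothetical object listing; a real job would list the bucket instead.
uris = [
    "gs://example-bucket/out/training-00000.jsonl",
    "gs://example-bucket/out/training-00001.jsonl",
    "gs://example-bucket/out/validation-00000.jsonl",
]
shards = matching_shards("gs://example-bucket/out/training-*.jsonl", uris)
records = parse_records('{"x": 1}\n{"x": 2}\n', "jsonl")
```

Inside a training container, the format string would typically come from os.environ["AIP_DATA_FORMAT"] rather than a literal.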
Only applicable to Datasets that have SavedQueries.
The ID of a SavedQuery (annotation set) under the Dataset specified by
dataset_id used for filtering Annotations for training.
Only Annotations that are associated with this SavedQuery are used in
training. When used in conjunction with
annotations_filter, the Annotations used for training are filtered by
both saved_query_id and annotations_filter.
Only one of saved_query_id and annotation_schema_uri should be
specified, as both of them represent the same thing: problem type.
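
A client can check the "only one of" rule before submitting a request. The following is a minimal sketch over a plain dictionary that reuses the field names from this page. The helper name and the ID values are hypothetical, and whether the service itself rejects a request that sets both fields is not stated here.

```python
from typing import Any, Mapping


def validate_annotation_selection(config: Mapping[str, Any]) -> None:
    """Raise ValueError if both saved_query_id and annotation_schema_uri are set."""
    has_saved_query = bool(config.get("saved_query_id"))
    has_schema_uri = bool(config.get("annotation_schema_uri"))
    if has_saved_query and has_schema_uri:
        raise ValueError(
            "set only one of saved_query_id and annotation_schema_uri"
        )


config = {
    "dataset_id": "1234567890",      # hypothetical ID
    "saved_query_id": "9876543210",  # hypothetical ID
    "annotations_filter": "",        # optional; left empty in this sketch
}
validate_annotation_selection(config)  # passes: only saved_query_id is set
```

Combining saved_query_id with annotations_filter stays valid in this check, matching the description above.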
Last updated 2025-06-12 UTC.