Get started with the OpenTelemetry Collector

This document describes how to set up the OpenTelemetry Collector to scrape standard Prometheus metrics and report those metrics to Google Cloud Managed Service for Prometheus. The OpenTelemetry Collector is an agent that you can deploy yourself and configure to export to Managed Service for Prometheus. The setup is similar to running Managed Service for Prometheus with self-deployed collection.

You might choose the OpenTelemetry Collector over self-deployed collection for the following reasons:

  • The OpenTelemetry Collector lets you route your telemetry to multiple backends by configuring different exporters in your pipeline.
  • The Collector also supports signals from metrics, logs, and traces, so you can use it to handle all three signal types in a single agent.
  • OpenTelemetry's vendor-agnostic data format (the OpenTelemetry Protocol, or OTLP) supports a strong ecosystem of libraries and pluggable Collector components. This gives you a range of customization options for receiving, processing, and exporting your data.

The trade-off for these benefits is that running an OpenTelemetry Collector requires you to manage your own deployment and maintenance. Which approach you choose depends on your specific needs, but in this document we offer recommended guidelines for configuring the OpenTelemetry Collector with Managed Service for Prometheus as the backend.

Before you begin

This section describes the setup needed for the tasks described in this document.

Set up projects and tools

To use Google Cloud Managed Service for Prometheus, you need the following resources:

  • A Google Cloud project with the Cloud Monitoring API enabled.

    • If you don't have a Google Cloud project, then do the following:

      1. In the Google Cloud console, go to the New Project page.

      2. In the Project Name field, enter a name for your project, and then click Create.

      3. Go to the Billing page.

      4. If the project you just created isn't already selected at the top of the page, then select it.

      5. You are prompted to choose an existing payments profile or to create a new one.

      The Monitoring API is enabled by default for new projects.

    • If you already have a Google Cloud project, then make sure the Monitoring API is enabled:

      1. Go to the APIs & services page.

      2. Select your project.

      3. Click Enable APIs and services.

      4. Search for "Monitoring".

      5. In the search results, click "Cloud Monitoring API".

      6. If "API enabled" isn't displayed, then click the Enable button.

  • A Kubernetes cluster. If you don't have a Kubernetes cluster, then follow the instructions in the GKE quickstart, or create one as sketched below.
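
    For example, the following command creates a GKE Autopilot cluster by using the gcloud CLI. This is a minimal sketch rather than a replacement for the quickstart; the cluster name is a placeholder, and us-central1 is an assumed location:

      gcloud container clusters create-auto CLUSTER_NAME \
        --location=us-central1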

You also need the following command-line tools:

  • gcloud
  • kubectl

The gcloud and kubectl tools are part of the Google Cloud CLI. For information about installing them, see Managing Google Cloud CLI components. To see which gcloud CLI components you have installed, run the following command:

gcloud components list
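
If kubectl is missing from the list, you can install it as a gcloud CLI component:

gcloud components install kubectl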

Set up your environment

To avoid repeatedly entering your project ID or cluster name, perform the following configuration:

  • Configure the command-line tools as follows:

    • Configure the gcloud CLI to refer to the ID of your Google Cloud project:

      gcloud config set project PROJECT_ID
      
    • Configure the kubectl CLI to use your cluster:

      kubectl config set-cluster CLUSTER_NAME
      

    For more information about these tools, see the gcloud CLI documentation and the kubectl reference documentation.
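
    On GKE, instead of running kubectl config set-cluster directly, you can fetch the cluster's credentials with the gcloud CLI, which configures kubectl for you. A sketch, assuming your cluster's name and its location:

      gcloud container clusters get-credentials CLUSTER_NAME \
        --location=LOCATION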

Set up a namespace

Create the NAMESPACE_NAME Kubernetes namespace for resources you create as part of the example application:

kubectl create ns NAMESPACE_NAME

Verify service account credentials

If your Kubernetes cluster has Workload Identity Federation for GKE enabled, you can skip this section.

When running on GKE, Managed Service for Prometheus automatically retrieves credentials from the environment based on the Compute Engine default service account. The default service account has the necessary permissions, monitoring.metricWriter and monitoring.viewer, by default. If you don't use Workload Identity Federation for GKE and you have previously removed either of those roles from the default node service account, you must re-add the missing permissions before continuing.

Configure a service account for Workload Identity Federation for GKE

If your Kubernetes cluster doesn't have Workload Identity Federation for GKE enabled, you can skip this section.

Managed Service for Prometheus captures metric data by using the Cloud Monitoring API. If your cluster uses Workload Identity Federation for GKE, you must grant your Kubernetes service account permission to the Monitoring API. This section describes the following:

  • Creating a dedicated Google Cloud service account, gmp-test-sa
  • Binding the Google Cloud service account to the default Kubernetes service account in a test namespace, NAMESPACE_NAME
  • Granting the necessary permission to the Google Cloud service account

Create and bind the service account

This step appears in several places in the Managed Service for Prometheus documentation. If you have already performed it as part of a prior task, you don't need to repeat it. Skip ahead to the section Authorize the service account.

The following sequence of commands creates the gmp-test-sa service account and binds it to the default Kubernetes service account in the NAMESPACE_NAME namespace:

gcloud config set project PROJECT_ID \
&&
gcloud iam service-accounts create gmp-test-sa \
&&
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[NAMESPACE_NAME/default]" \
  gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
&&
kubectl annotate serviceaccount \
  --namespace NAMESPACE_NAME \
  default \
  iam.gke.io/gcp-service-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com

If you are using a different GKE namespace or service account, adjust the commands accordingly.
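
To spot-check the setup, you can inspect both sides of the binding. The following commands are a quick check, assuming the names used above:

# Confirm the Workload Identity User binding on the Google Cloud service account.
gcloud iam service-accounts get-iam-policy gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com

# Confirm the iam.gke.io/gcp-service-account annotation on the Kubernetes service account.
kubectl -n NAMESPACE_NAME describe serviceaccount default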

Authorize the service account

Groups of related permissions are collected into roles, and you grant roles to a principal, in this example the Google Cloud service account. For more information about Monitoring roles, see Access control.

The following command grants the Google Cloud service account, gmp-test-sa, the Monitoring API roles it needs to write metric data.

If you have already granted the Google Cloud service account a specific role as part of a prior task, you don't need to do it again.

gcloud projects add-iam-policy-binding PROJECT_ID \
  --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role=roles/monitoring.metricWriter
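
To verify that the role was granted, you can list the project's IAM bindings filtered to the service account:

gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com"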

Debug your Workload Identity Federation for GKE configuration

If you are having trouble getting Workload Identity Federation for GKE to work, see the documentation on verifying your Workload Identity Federation for GKE setup and the Workload Identity Federation for GKE troubleshooting guide.

Because typos and partial copy-pastes are the most common sources of errors when configuring Workload Identity Federation for GKE, we strongly recommend using the editable variables and clickable copy-paste icons embedded in the code samples in these instructions.

Workload Identity Federation for GKE in production environments

The example described in this document binds the Google Cloud service account to the default Kubernetes service account and grants the Google Cloud service account all the permissions it needs to use the Monitoring API.

In a production environment, you might want to use a finer-grained approach, with a service account for each component, each granted minimal permissions. For more information on configuring service accounts for workload-identity management, see Use Workload Identity Federation for GKE.

Set up the OpenTelemetry Collector

This section walks you through setting up and using the OpenTelemetry Collector to scrape metrics from an example application and send the data to Google Cloud Managed Service for Prometheus. For detailed configuration information, see the following sections:

  • Scrape Prometheus metrics
  • Add processors
  • Configure the googlemanagedprometheus exporter

The OpenTelemetry Collector is analogous to the Managed Service for Prometheus agent binary. The OpenTelemetry community regularly publishes releases, including source code, binaries, and container images.

You can either deploy these artifacts on VMs or Kubernetes clusters using the best-practice defaults, or use the Collector builder to build your own Collector consisting of only the components you need. To build a Collector for use with Managed Service for Prometheus, you need the following components:

  • The Managed Service for Prometheus exporter, which writes your metrics to Managed Service for Prometheus.
  • A receiver to scrape your metrics. This document assumes that you use the OpenTelemetry Prometheus receiver, but the Managed Service for Prometheus exporter is compatible with any OpenTelemetry metrics receiver.
  • Processors to batch and mark up your metrics with the important resource identifiers, depending on your environment.

You enable these components by using a config file that you pass to the Collector with the --config flag.
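
For example, if you run a prebuilt Collector binary directly on a VM, the invocation might look like the following. This is a sketch; the binary name depends on the distribution you downloaded or built:

./otelcol-contrib --config=config.yaml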

The following sections describe how to configure each of these components in more detail. This document covers how to run the Collector on GKE and elsewhere.

Configure and deploy the Collector

Whether you run your collection on Google Cloud or in another environment, you can configure the OpenTelemetry Collector to export to Managed Service for Prometheus. The biggest difference is in how you configure the Collector. In non-Google Cloud environments, metric data might need additional formatting to be compatible with Managed Service for Prometheus. On Google Cloud, however, the Collector can automatically detect much of this formatting.

Run the OpenTelemetry Collector on GKE

You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector on GKE:

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

# Note that the googlemanagedprometheus exporter block is intentionally blank
exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resourcedetection, transform]
      exporters: [googlemanagedprometheus]

The preceding config uses the Prometheus receiver and the Managed Service for Prometheus exporter to scrape the metrics endpoints on Kubernetes Pods and export those metrics to Managed Service for Prometheus. The processors in the pipeline format and batch the data.

For more details on what each part of this config does, along with configurations for different platforms, see the detailed sections below on scraping Prometheus metrics and adding processors.

When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.

You can modify this config based on your environment, provider, and the metrics you want to scrape, but the example config is a recommended starting point for running on GKE.

Run the OpenTelemetry Collector outside Google Cloud

Running the OpenTelemetry Collector outside Google Cloud, such as on-premises or on other cloud providers, is similar to running the Collector on GKE. However, the metrics you scrape are less likely to automatically include data that best formats it for Managed Service for Prometheus. Therefore, you must take extra care to configure the collector to format the metrics so they are compatible with Managed Service for Prometheus.

You can copy the following config into a file called config.yaml to set up the OpenTelemetry Collector for deployment on a non-GKE Kubernetes cluster:

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'SCRAPE_JOB_NAME'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
          action: keep
          regex: prom-example
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

processors:
  resource:
    attributes:
    - key: "cluster"
      value: "CLUSTER_NAME"
      action: upsert
    - key: "namespace"
      value: "NAMESPACE_NAME"
      action: upsert
    - key: "location"
      value: "REGION"
      action: upsert

  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

exporters:
  googlemanagedprometheus:
    project: "PROJECT_ID"

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [batch, memory_limiter, resource, transform]
      exporters: [googlemanagedprometheus]

This config does the following:

  • Scrapes metrics from Kubernetes Pods by using the Prometheus receiver, as in the GKE example.
  • Manually sets the cluster, namespace, and location resource attributes by using the resource processor, because these values can't be detected automatically outside Google Cloud.
  • Renames metric labels that collide with reserved resource labels by using the transform processor.
  • Batches metrics and limits memory usage by using the batch and memory_limiter processors.
  • Exports the metrics to the project PROJECT_ID by using the Managed Service for Prometheus exporter.

When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. For more information, see Scrape Prometheus metrics.

For best practices on configuring the Collector on other clouds, see Amazon EKS and Azure AKS.

Deploy the example application

The example application emits the example_requests_total counter metric and the example_random_numbers histogram metric (among others) on its metrics port. The manifest for the example defines three replicas.

To deploy the example application, run the following command:

kubectl -n NAMESPACE_NAME apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/prometheus-engine/v0.15.3/examples/example-app.yaml

Create your Collector config as a ConfigMap

After you have created your config and placed it in a file called config.yaml, use that file to create a Kubernetes ConfigMap. When the Collector is deployed, it mounts the ConfigMap and loads the file.

To create a ConfigMap named otel-config from your config file, use the following command:

kubectl -n NAMESPACE_NAME create configmap otel-config --from-file config.yaml
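
You can confirm that the ConfigMap was created with your config by printing it back out:

kubectl -n NAMESPACE_NAME describe configmap otel-config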

Deploy the Collector

Create a file called collector-deployment.yaml with the following contents:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: NAMESPACE_NAME:prometheus-test
rules:
- apiGroups: [""]
  resources:
  - pods
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: NAMESPACE_NAME:prometheus-test
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: NAMESPACE_NAME:prometheus-test
subjects:
- kind: ServiceAccount
  namespace: NAMESPACE_NAME
  name: default
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: otel-collector
spec:
  replicas: 1
  selector:
    matchLabels:
      app: otel-collector
  template:
    metadata:
      labels:
        app: otel-collector
    spec:
      containers:
      - name: otel-collector
        image: otel/opentelemetry-collector-contrib:0.106.0
        args:
        - --config
        - /etc/otel/config.yaml
        - --feature-gates=exporter.googlemanagedprometheus.intToDouble
        volumeMounts:
        - mountPath: /etc/otel/
          name: otel-config
      volumes:
      - name: otel-config
        configMap:
          name: otel-config

Create the Collector deployment in your Kubernetes cluster by running the following command:

kubectl -n NAMESPACE_NAME create -f collector-deployment.yaml

After the Pod starts, it scrapes the example application and reports metrics to Managed Service for Prometheus.
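
To verify that the Collector is scraping and exporting without errors, check the Pod status and the Collector's logs; for example:

kubectl -n NAMESPACE_NAME get pods
kubectl -n NAMESPACE_NAME logs deployment/otel-collector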

For information about querying your data, see Query using Cloud Monitoring or Query using Grafana.

Provide credentials explicitly

When running on GKE, the OpenTelemetry Collector automatically retrieves credentials from the environment based on the node's service account. In non-GKE Kubernetes clusters, credentials must be explicitly provided to the OpenTelemetry Collector by using flags or the GOOGLE_APPLICATION_CREDENTIALS environment variable.

  1. Set the context to your target project:

    gcloud config set project PROJECT_ID
    
  2. Create a service account:

    gcloud iam service-accounts create gmp-test-sa
    

    This step creates the service account that you might have already created in the Workload Identity Federation for GKE instructions.

  3. Grant the required permissions to the service account:

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=serviceAccount:gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com \
      --role=roles/monitoring.metricWriter
    

  4. Create and download a key for the service account:

    gcloud iam service-accounts keys create gmp-test-sa-key.json \
      --iam-account=gmp-test-sa@PROJECT_ID.iam.gserviceaccount.com
    
  5. Add the key file as a secret to your non-GKE cluster:

    kubectl -n NAMESPACE_NAME create secret generic gmp-test-sa \
      --from-file=key.json=gmp-test-sa-key.json
    

  6. Open the OpenTelemetry Deployment resource for editing:

    kubectl -n NAMESPACE_NAME edit deployment otel-collector
    
  7. Add the following text to the resource (the env, volumeMounts, and volumes entries are the new additions):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      namespace: NAMESPACE_NAME
      name: otel-collector
    spec:
      template:
        spec:
          containers:
          - name: otel-collector
            env:
            - name: "GOOGLE_APPLICATION_CREDENTIALS"
              value: "/gmp/key.json"
    ...
            volumeMounts:
            - name: gmp-sa
              mountPath: /gmp
              readOnly: true
    ...
          volumes:
          - name: gmp-sa
            secret:
              secretName: gmp-test-sa
    ...
    

  8. Save the file and close the editor. After the change is applied, the Pods are re-created and start authenticating to the metric backend with the specified service account.

Scrape Prometheus metrics

This section and subsequent sections provide additional customization information for the OpenTelemetry Collector. This information might be helpful in certain situations, but none of it is required to run the example described in Set up the OpenTelemetry Collector.

If your applications already expose Prometheus endpoints, the OpenTelemetry Collector can scrape those endpoints by using the same scrape-config format you would use with any standard Prometheus configuration. To do this, enable the Prometheus receiver in your Collector config.

A Prometheus receiver config for Kubernetes Pods might look like the following:

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'kubernetes-pods'
        kubernetes_sd_configs:
        - role: pod
        relabel_configs:
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
          action: replace
          regex: (.+):(?:\d+);(\d+)
          replacement: $$1:$$2
          target_label: __address__
        - action: labelmap
          regex: __meta_kubernetes_pod_label_(.+)

service:
  pipelines:
    metrics:
      receivers: [prometheus]

This is a service-discovery-based scrape config that you can modify as needed to scrape your applications.
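
For example, if your application isn't discoverable through the Kubernetes API, you can replace service discovery with a static target list. The following receiver config is a sketch; the job name, host, and port are placeholders:

receivers:
  prometheus:
    config:
      scrape_configs:
      - job_name: 'static-example'
        static_configs:
        - targets: ['localhost:8080']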

When using an existing Prometheus configuration with the OpenTelemetry Collector's prometheus receiver, replace any $ characters with $$ to avoid triggering environment variable substitution. This is especially important to do for the replacement value within your relabel_configs section. For example, if you have the following relabel_config section:

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $1:$2
  target_label: __address__

Then rewrite it to be:

- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
  action: replace
  regex: (.+):(?:\d+);(\d+)
  replacement: $$1:$$2
  target_label: __address__


For more information, see the OpenTelemetry documentation.

Next, we strongly recommend that you use processors to format your metrics. In many cases, processors must be used to properly format your metrics.

Add processors

OpenTelemetry processors modify telemetry data before it is exported. You can use the following processors to ensure that your metrics are written in a format compatible with Managed Service for Prometheus.

Detect resource attributes

The Managed Service for Prometheus exporter for OpenTelemetry uses the prometheus_target monitored resource to uniquely identify time series data points. The exporter parses the required monitored-resource fields from resource attributes on the metric data points. The fields and the attributes from which the values are scraped are:

  • project_id: auto-detected by Application Default Credentials, gcp.project.id, or project in exporter config (see configuring the exporter)
  • location: location, cloud.availability_zone, cloud.region
  • cluster: cluster, k8s.cluster.name
  • namespace: namespace, k8s.namespace.name
  • job: service.name + service.namespace
  • instance: service.instance.id

Failure to set these labels to unique values can result in "duplicate timeseries" errors when exporting to Managed Service for Prometheus. In many cases, values can be automatically detected for these labels, but in some cases, you might have to map them yourself. The rest of this section describes these scenarios.

The Prometheus receiver automatically sets the service.name attribute based on the job_name in the scrape config, and service.instance.id attribute based on the scrape target's instance. The receiver also sets k8s.namespace.name when using role: pod in the scrape config.

When possible, populate the other attributes automatically by using the resource detection processor. However, depending on your environment, some attributes might not be automatically detectable. In this case, you can use other processors to either manually insert these values or parse them from metric labels. The following sections illustrate configurations for detecting resources on various platforms.

GKE

When running OpenTelemetry on GKE, you need to enable the resource-detection processor to fill out the resource labels. Be sure that your metrics don't already contain any of the reserved resource labels. If this is unavoidable, see Avoid resource attribute collisions by renaming attributes.

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

This section can be copied directly into your config file, replacing the processors section if it already exists.

Amazon EKS

The EKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:

processors:
  resourcedetection:
    detectors: [eks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-eks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert

You can also convert these values from metric labels by using the groupbyattrs processor; see Move metric labels to resource labels.

Azure AKS

The AKS resource detector does not automatically fill in the cluster or namespace attributes. You can provide these values manually by using the resource processor, as shown in the following example:

processors:
  resourcedetection:
    detectors: [aks]
    timeout: 10s

  resource:
    attributes:
    - key: "cluster"
      value: "my-eks-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert

You can also convert these values from metric labels by using the groupbyattrs processor; see Move metric labels to resource labels.

On-premises and non-cloud environments

With on-premises or non-cloud environments, you probably can't detect any of the necessary resource attributes automatically. In this case, you can emit these labels in your metrics and move them to resource attributes (see Move metric labels to resource labels), or manually set all of the resource attributes as shown in the following example:

processors:
  resource:
    attributes:
    - key: "cluster"
      value: "my-on-prem-cluster"
      action: upsert
    - key: "namespace"
      value: "my-app"
      action: upsert
    - key: "location"
      value: "us-east-1"
      action: upsert

Create your collector config as a ConfigMap describes how to use the config. That section assumes you have put your config in a file called config.yaml.

The project_id resource attribute can still be automatically set when running the Collector with Application Default Credentials. If your Collector does not have access to Application Default Credentials, see Setting project_id.

Alternatively, you can manually set the resource attributes you need in an environment variable, OTEL_RESOURCE_ATTRIBUTES, with a comma-separated list of key-value pairs, for example:

export OTEL_RESOURCE_ATTRIBUTES="cluster=my-cluster,namespace=my-app,location=us-east-1"

Then use the env resource detector processor to set the resource attributes:

processors:
  resourcedetection:
    detectors: [env]
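
As with the other processors, the env detector takes effect only if it is listed in your pipeline. A minimal sketch of the corresponding service section:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]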

Avoid resource attribute collisions by renaming attributes

If your metrics already contain labels that collide with the required resource attributes (such as location, cluster, or namespace), rename them to avoid the collision. The Prometheus convention is to add the prefix exported_ to the label name. To add this prefix, use the transform processor.

The following processors config renames any potential collisions and resolves any conflicting keys from the metric:

processors:
  transform:
    # "location", "cluster", "namespace", "job", "instance", and "project_id" are reserved, and
    # metrics containing these labels will be rejected.  Prefix them with exported_ to prevent this.
    metric_statements:
    - context: datapoint
      statements:
      - set(attributes["exported_location"], attributes["location"])
      - delete_key(attributes, "location")
      - set(attributes["exported_cluster"], attributes["cluster"])
      - delete_key(attributes, "cluster")
      - set(attributes["exported_namespace"], attributes["namespace"])
      - delete_key(attributes, "namespace")
      - set(attributes["exported_job"], attributes["job"])
      - delete_key(attributes, "job")
      - set(attributes["exported_instance"], attributes["instance"])
      - delete_key(attributes, "instance")
      - set(attributes["exported_project_id"], attributes["project_id"])
      - delete_key(attributes, "project_id")

Move metric labels to resource labels

In some cases, your metrics might intentionally report labels such as namespace because your exporter is monitoring multiple namespaces, for example, when running the kube-state-metrics exporter.

In this scenario, these labels can be moved to resource attributes using the groupbyattrs processor:

processors:
  groupbyattrs:
    keys:
    - namespace
    - cluster
    - location

In the previous example, given a metric with the labels namespace, cluster, or location, those labels will be converted to the matching resource attributes.
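
For example, in the kube-state-metrics scenario above, a series such as kube_pod_info{namespace="my-app"} would have its namespace label promoted to the namespace resource attribute. The following sketch shows how the processor might slot into a pipeline, placed before any processors that read the resulting resource attributes:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [groupbyattrs, batch, memory_limiter]
      exporters: [googlemanagedprometheus]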

Limit API requests and memory usage

Two other processors, the batch processor and the memory limiter processor, let you limit the resource consumption of your collector.

Batch processing

Batching requests lets you define how many data points to send in a single request. Note that Cloud Monitoring has a limit of 200 time series per request. Enable the batch processor by using the following settings:

processors:
  batch:
    # batch metrics before sending to reduce API usage
    send_batch_max_size: 200
    send_batch_size: 200
    timeout: 5s

Memory limiting

We recommend enabling the memory-limiter processor to prevent your collector from crashing at times of high throughput. Enable the processor by using the following settings:

processors:
  memory_limiter:
    # drop metrics if memory usage gets too high
    check_interval: 1s
    limit_percentage: 65
    spike_limit_percentage: 20

Configure the googlemanagedprometheus exporter

By default, using the googlemanagedprometheus exporter on GKE requires no additional configuration. For many use cases you only need to enable it with an empty block in the exporters section:

exporters:
  googlemanagedprometheus:

However, the exporter does provide some optional configuration settings. The following sections describe the other configuration settings.

Setting project_id

To associate your time series with a Google Cloud project, the prometheus_target monitored resource must have project_id set.

When running OpenTelemetry on Google Cloud, the Managed Service for Prometheus exporter defaults to setting this value based on the Application Default Credentials it finds. If no credentials are available, or you want to override the default project, you have two options:

  • Set project in the exporter config
  • Add a gcp.project.id resource attribute to your metrics.

We strongly recommend using the default (unset) value for project_id rather than explicitly setting it, when possible.

Set project in the exporter config

The following config excerpt sends metrics to Managed Service for Prometheus in the Google Cloud project MY_PROJECT:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    project: MY_PROJECT

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]

The only change from previous examples is the new line project: MY_PROJECT. This setting is useful if you know that every metric coming through this Collector should be sent to MY_PROJECT.

Set gcp.project.id resource attribute

You can set project association on a per-metric basis by adding a gcp.project.id resource attribute to your metrics. Set the value of the attribute to the name of the project the metric should be associated with.

For example, if your metric already has a label project, this label can be moved to a resource attribute and renamed to gcp.project.id by using processors in the Collector config, as shown in the following example:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

  groupbyattrs:
    keys:
    - project

  resource:
    attributes:
    - key: "gcp.project.id"
      from_attribute: "project"
      action: upsert

exporters:
  googlemanagedprometheus:

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection, groupbyattrs, resource]
      exporters: [googlemanagedprometheus]

Setting client options

The googlemanagedprometheus exporter uses gRPC clients for Managed Service for Prometheus. Therefore, optional settings are available for configuring the gRPC client:

  • compression: Enables gzip compression for gRPC requests, which is useful for minimizing data transfer fees when sending data from other clouds to Managed Service for Prometheus (valid values: gzip).
  • user_agent: Overrides the user-agent string sent on requests to Cloud Monitoring; only applies to metrics. Defaults to the build and version number of your OpenTelemetry Collector, for example, opentelemetry-collector-contrib 0.106.0.
  • endpoint: Sets the endpoint to which metric data is going to be sent.
  • use_insecure: If true, uses gRPC as the communication transport. Has an effect only when the endpoint value is not "".
  • grpc_pool_size: Sets the size of the connection pool in the gRPC client.
  • prefix: Configures the prefix of metrics sent to Managed Service for Prometheus. Defaults to prometheus.googleapis.com. Don't change this prefix; doing so causes metrics to not be queryable with PromQL in the Cloud Monitoring UI.

In most cases, you don't need to change these values from their defaults. However, you can change them to accommodate special circumstances.

All of these settings are set under a metric block in the googlemanagedprometheus exporter section, as shown in the following example:

receivers:
  prometheus:
    config:
    ...

processors:
  resourcedetection:
    detectors: [gcp]
    timeout: 10s

exporters:
  googlemanagedprometheus:
    metric:
      compression: gzip
      user_agent: opentelemetry-collector-contrib 0.106.0
      endpoint: ""
      use_insecure: false
      grpc_pool_size: 1
      prefix: prometheus.googleapis.com

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection]
      exporters: [googlemanagedprometheus]

What's next