This document describes the Application Programming Interface (API) for CXR Foundation when deployed as an HTTPS service endpoint, referred to as the service in this document.
Overview
The serving source code for CXR Foundation can be built and hosted on any API management system, but it's specifically designed to take advantage of Vertex AI prediction endpoints. Therefore, it conforms to Vertex AI's required API signature and implements a predict method.
The service is designed to support micro batching, which is not to be confused with batch jobs. For every chest x-ray image, text prompt, or image and text prompt pair in the request, if processing succeeds, the service returns a dictionary of embedding vectors in the corresponding order. Refer to the sections on API request, response, and micro batching for details.
You can provide chest x-ray images to the service either directly within the request (inlined) or by providing a reference to their location. Inlining images in the request is not recommended for large-scale production use; see the Inlined images section for details. When you use storage links, the service expects corresponding OAuth 2.0 bearer tokens so it can retrieve the data on your behalf. For detailed information on constructing API requests and the different ways to provide image data, refer to the API request section.
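For example, you can mint a short-lived bearer token with the google-auth Python library. This is a minimal sketch that assumes the google-auth package is installed and Application Default Credentials are configured in your environment:

import google.auth
import google.auth.transport.requests

# Mint a short-lived OAuth 2.0 access token from Application Default
# Credentials; the cloud-platform scope covers GCS and DICOM store reads.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())
bearer_token = credentials.token  # Use as the "bearer_token" instance field.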
When given DICOM images from a DICOM store, the service expects the underlying DICOM storage system to conform to the HAI-DEF DICOM store requirements. Furthermore, the service expects the chest x-ray images to meet the more detailed requirements described in the DICOM requirements section.
To invoke the service, consult the Request section, compose a valid request JSON, and send a POST request to your endpoint. If you haven't already deployed CXR Foundation as an endpoint, the easiest way is through Model Garden.

The following script shows a sample cURL command that you can use to invoke the service. Set LOCATION, PROJECT_ID, and ENDPOINT_ID to target your endpoint:
LOCATION="your endpoint location"
PROJECT_ID="your project ID"
ENDPOINT_ID="your endpoint ID"
REQUEST_JSON="path/to/your/request.json"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict" \
-d "@${REQUEST_JSON}"
Request
An API request can include multiple instances, each conforming to this schema. Note that this schema is based on the Vertex AI PredictSchemata standard and is a partial OpenAPI specification. The complete JSON request has the following structure:
{
  "instances": [
    {...},
    {...}
  ]
}
The service offers flexible input options for analyzing chest x-ray images and related text prompts. You can process images alone, text prompts alone, or images and text prompts together. While the text prompt is always inlined in the request, you have two ways to provide the images to the service:

- Directly within the HTTPS request: include the image data as base64-encoded bytes in the input_bytes JSON field; see the Inlined images section.
- Indirectly via storage links: provide links to images stored in GCS using the gcs_uri JSON field, or use dicomweb_uri to point to DICOM images stored in a DICOM store; see the DICOM requirements section.
To illustrate these methods, the following example JSON request shows input_bytes, gcs_uri, and dicomweb_uri all in one request. In a real-world scenario, you'll typically use only one of these options for all images within a single request:
{
  "instances": [
    {
      "input_bytes": "your base64-encoded image bytes"
    },
    {
      "gcs_uri": "gs://your-bucket/path/to/image.png",
      "bearer_token": "your-bearer-token"
    },
    {
      "dicomweb_uri": "https://dicomweb-store.uri/studies/1.2.3.4.5.6.7.8.9/series/1.2.3.4.5.6.7.8.10/instances/1.2.3.4.5.6.7.8.11",
      "bearer_token": "your-bearer-token"
    }
  ]
}
The following JSON example demonstrates how to encode images, text prompts, or combinations of both in a single request to the service. Keep in mind that this is a simplified illustration; actual requests in production environments typically follow a more structured and consistent format.
{
  "instances": [
    {
      "prompt_query": "airspace opacity"
    },
    {
      "gcs_uri": "gs://your-bucket/path/to/image.png",
      "bearer_token": "your-bearer-token"
    },
    {
      "dicomweb_uri": "https://dicomweb-store.uri/studies/1.2.3.4.5.6.7.8.9/series/1.2.3.4.5.6.7.8.10/instances/1.2.3.4.5.6.7.8.11",
      "bearer_token": "your-bearer-token",
      "prompt_query": "airspace opacity"
    }
  ]
}
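You can also assemble instances programmatically. The following sketch builds the instances list from optional fields; the helper name is illustrative, not part of the API:

def build_instance(input_bytes=None, gcs_uri=None, dicomweb_uri=None,
                   bearer_token=None, prompt_query=None) -> dict:
    """Assembles one request instance, omitting any unset fields."""
    instance = {
        "input_bytes": input_bytes,
        "gcs_uri": gcs_uri,
        "dicomweb_uri": dicomweb_uri,
        "bearer_token": bearer_token,
        "prompt_query": prompt_query,
    }
    return {k: v for k, v in instance.items() if v is not None}

request_json = {
    "instances": [
        build_instance(prompt_query="airspace opacity"),
        build_instance(gcs_uri="gs://your-bucket/path/to/image.png",
                       bearer_token="your-bearer-token"),
    ]
}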
Inlined images
You can inline images in the API request as a base64-encoded string in the input_bytes JSON field. However, keep in mind that most API management systems enforce a limit on the maximum size of request payloads. When CXR Foundation is hosted as a Vertex AI prediction endpoint, Vertex AI quotas apply.

To optimize the request size, you should compress the images using common image compression codecs. If you require lossless compression, use PNG encoding; if lossy compression is acceptable, use JPEG encoding.

The following code snippet converts a compressed JPEG image file from the local file system into a base64-encoded string:
import base64

def encode_file_bytes(file_path: str) -> str:
    """Reads a file and returns its contents as a base64-encoded string."""
    with open(file_path, 'rb') as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
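For example, the returned string can be dropped directly into a request instance (the file path here is illustrative):

instance = {"input_bytes": encode_file_bytes("path/to/chest_xray.jpg")}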
The following snippet converts uncompressed image bytes into the lossless PNG format and then into a base64-encoded string:

import base64
import io

import numpy as np
import PIL.Image

def convert_uncompressed_image_bytes_to_base64(image: np.ndarray) -> str:
    """Converts an uncompressed image array to a base64-encoded PNG string."""
    with io.BytesIO() as compressed_img_bytes:
        with PIL.Image.fromarray(image) as pil_image:
            pil_image.save(compressed_img_bytes, 'png')
        return base64.b64encode(compressed_img_bytes.getvalue()).decode('utf-8')
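For instance, an 8-bit grayscale pixel array can be converted and inlined as follows; the zero-filled array is a stand-in for your actual pixel data:

# Stand-in for real grayscale pixel data.
image = np.zeros((1024, 1024), dtype=np.uint8)
instance = {"input_bytes": convert_uncompressed_image_bytes_to_base64(image)}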
Response
An API response can include multiple predictions that correspond to the order of the instances in the request. Each prediction conforms to this schema. Note that this schema is based on the Vertex AI PredictSchemata standard and is a partial OpenAPI specification. The complete JSON response has the following structure:
{
  "predictions": [
    {...},
    {...}
  ],
  "deployedModelId": "model-id",
  "model": "model",
  "modelVersionId": "version-id",
  "modelDisplayName": "model-display-name",
  "metadata": {...}
}
Each request instance can independently succeed or fail. When an instance succeeds, the corresponding prediction JSON includes the embeddings dictionary; when it fails, the prediction includes an error field instead. Here is an example of a response to a request with two instances, where the first succeeded and the second failed:
{
  "predictions": [
    {
      "contrastive_img_emb": [[0.1, 0.2], [0.3, 0.4]],
      "general_img_emb": [[0.1, 0.2], [0.3, 0.4]],
      "contrastive_txt_emb": [0.1, 0.2, 0.3, 0.4]
    },
    {
      "error": {
        "description": "Some actionable text."
      }
    }
  ],
  "deployedModelId": "model-id",
  "model": "model",
  "modelVersionId": "version-id",
  "modelDisplayName": "model-display-name",
  "metadata": {...}
}
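Because instances succeed or fail independently, iterate over the predictions and branch on the presence of the error field. A minimal sketch, assuming response_json holds the parsed response body:

def collect_embeddings(response_json: dict) -> list:
    """Returns embedding dictionaries, with None for failed instances."""
    results = []
    for i, prediction in enumerate(response_json["predictions"]):
        if "error" in prediction:
            print(f"Instance {i} failed: {prediction['error']['description']}")
            results.append(None)
        else:
            results.append(prediction)
    return results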
The structure of a successful prediction depends on the content of the corresponding request instance. The following table summarizes this relationship. Note that the table shows only the use of input_bytes; you can provide images through gcs_uri or dicomweb_uri as well:
| Request instance | Response prediction |
|---|---|
| {"input_bytes": "..."} | {"contrastive_img_emb": [[...]], "general_img_emb": [[...]]} |
| {"prompt_query": "..."} | {"contrastive_txt_emb": [...]} |
| {"input_bytes": "...", "prompt_query": "..."} | {"contrastive_img_emb": [[...]], "general_img_emb": [[...]], "contrastive_txt_emb": [...]} |
Micro batching
The API request supports micro batching. You can request embeddings for multiple images, text prompts, or both by using different instances within the same JSON request:
{
  "instances": [
    {...},
    {...}
  ]
}
Keep in mind that the total number of embeddings that you can request in one API call is capped by the service at a fixed limit. A link to the service configuration is coming soon.
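Until the limit is published, you can cap the batch size client-side. The following sketch splits a long instance list into separate requests; MAX_INSTANCES_PER_REQUEST is a placeholder value, not the documented limit:

# Hypothetical client-side cap; replace with the service's actual limit.
MAX_INSTANCES_PER_REQUEST = 16

def chunk_instances(instances: list) -> list:
    """Splits a long instance list into separate micro-batch request bodies."""
    return [
        {"instances": instances[i:i + MAX_INSTANCES_PER_REQUEST]}
        for i in range(0, len(instances), MAX_INSTANCES_PER_REQUEST)
    ]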
DICOM requirements
To ensure compatibility with the service, chest x-ray images in DICOM format must either come from a DICOM storage system that's compatible with the HAI-DEF DICOM store requirements or from DICOM binary files with the .dcm file extension in a GCS bucket, and they must include the following required tags:
| Tag | Name | Note |
|---|---|---|
| (0002,0010) | TransferSyntaxUID | |
| (0008,0008) | ImageType | |
| (0008,0016) | SOPClassUID | |
| (0008,0018) | SOPInstanceUID | |
| (0020,000E) | SeriesInstanceUID | |
| (0020,000D) | StudyInstanceUID | |
| (0028,0002) | SamplesPerPixel | Must be set to 1 for monochrome or grayscale imaging. Refer to Image Pixel Module Attributes for details. |
| (0028,0010) | Rows | |
| (0028,0011) | Columns | |
| (0028,0100) | BitsAllocated | |
| (0028,0102) | HighBit | |
| (0028,0103) | PixelRepresentation | |
| (0028,0004) | PhotometricInterpretation | |
| (0028,3000) | ModalityLUTSequence | If this tag is not present, the service falls back on WindowCenter and WindowWidth. |
| (0028,1050) | WindowCenter | If ModalityLUTSequence is not present, this tag and WindowWidth are required. |
| (0028,1051) | WindowWidth | If ModalityLUTSequence is not present, this tag and WindowCenter are required. |
| (7FE0,0010) | PixelData | |
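Before sending DICOM files to the service, you can verify the required tags locally with the pydicom library. This is a minimal sketch; the keyword list mirrors the table above, and the windowing tags are checked only when ModalityLUTSequence is absent:

import pydicom

# Keywords of the required tags from the table above. TransferSyntaxUID
# lives in the file meta information and is checked separately below.
REQUIRED_KEYWORDS = [
    "ImageType", "SOPClassUID", "SOPInstanceUID", "SeriesInstanceUID",
    "StudyInstanceUID", "SamplesPerPixel", "Rows", "Columns",
    "BitsAllocated", "HighBit", "PixelRepresentation",
    "PhotometricInterpretation", "PixelData",
]

def find_missing_tags(path: str) -> list:
    """Returns keywords of required DICOM tags missing from a .dcm file."""
    ds = pydicom.dcmread(path)
    missing = [kw for kw in REQUIRED_KEYWORDS if kw not in ds]
    if "TransferSyntaxUID" not in ds.file_meta:
        missing.append("TransferSyntaxUID")
    # WindowCenter/WindowWidth are required only without a modality LUT.
    if "ModalityLUTSequence" not in ds:
        missing.extend(kw for kw in ("WindowCenter", "WindowWidth") if kw not in ds)
    return missing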
For DICOM images from a DICOM store, the service relies on the Transfer Syntax UID to transcode the image pixels. The following table lists the transfer syntaxes that the service can transcode. If the service can't transcode the images itself, it falls back on the capabilities of the underlying DICOM storage system. If the DICOMs are from a Google Cloud DICOM store, this document on supported transfer syntaxes for transcoding applies.
| Supported UID | Name |
|---|---|
| 1.2.840.10008.1.2.4.50 (Recommended) | JPEG Baseline (Process 1): Default Transfer Syntax for Lossy JPEG 8-bit Image Compression |
| 1.2.840.10008.1.2.4.90 | JPEG 2000 Image Compression (Lossless Only) |
| 1.2.840.10008.1.2.4.91 | JPEG 2000 Image Compression |
| 1.2.840.10008.1.2.1 | Uncompressed |
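A companion check for the transfer syntax, again using pydicom; the UID set is copied from the table above:

# Transfer syntaxes the service can transcode, per the table above.
SUPPORTED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (Process 1), recommended
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 (Lossless Only)
    "1.2.840.10008.1.2.4.91",  # JPEG 2000
    "1.2.840.10008.1.2.1",     # Uncompressed
}

def is_supported_transfer_syntax(path: str) -> bool:
    """Checks whether a DICOM file's transfer syntax is in the supported set."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return str(ds.file_meta.TransferSyntaxUID) in SUPPORTED_TRANSFER_SYNTAXES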