This document describes the Application Programming Interface (API) for CXR Foundation when deployed as an HTTPS service endpoint, referred to as the service in this document.
Overview
The serving source code for CXR Foundation can be built and hosted on any API management system, but it's specifically designed to take advantage of Vertex AI prediction endpoints. Therefore, it conforms to Vertex AI's required API signature and implements a predict method.
The service is designed to support micro batching, which is not to be confused with batch jobs. For every chest x-ray image, text prompt, or image and text prompt pair in the request, if processing succeeds, the service returns a dictionary of embedding vectors in the corresponding order. Refer to the sections on API request, response, and micro batching for details.
You can provide chest x-ray images to the service either directly within the request (inlined) or by providing a reference to their location. Inlining images in the request is not recommended for large-scale production use; see the Inlined images section for details. When you use storage links, the service expects corresponding OAuth 2.0 bearer tokens so it can retrieve the data on your behalf. For detailed information on constructing API requests and the different ways to provide image data, refer to the API request section.
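For example, you can mint a short-lived bearer token with the google-auth Python library. This is a minimal sketch that assumes the google-auth package is installed and Application Default Credentials are configured in your environment:

import google.auth
import google.auth.transport.requests

# Mint a short-lived OAuth 2.0 access token from Application Default
# Credentials; the cloud-platform scope covers GCS and DICOM store reads.
credentials, _ = google.auth.default(
    scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
credentials.refresh(google.auth.transport.requests.Request())
bearer_token = credentials.token  # Use as the "bearer_token" instance field.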
When given DICOM images from a DICOM store, the service expects the underlying DICOM storage system to conform to the HAI-DEF DICOM store requirements. Furthermore, the service expects the chest x-ray images to meet the more detailed requirements described in the DICOM requirements section.
To invoke the service, consult the Request section, compose a valid request JSON, and send a POST request to your endpoint. If you haven't already deployed CXR Foundation as an endpoint, the easiest way is through Model Garden.

The following script shows a sample cURL command that you can use to invoke the service. Set LOCATION, PROJECT_ID, and ENDPOINT_ID to target your endpoint:
LOCATION="your endpoint location"
PROJECT_ID="your project ID"
ENDPOINT_ID="your endpoint ID"
REQUEST_JSON="path/to/your/request.json"
curl \
-X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
"https://${LOCATION}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION}/endpoints/${ENDPOINT_ID}:predict" \
-d "@${REQUEST_JSON}"
Request
An API request can include multiple instances, each conforming to this schema. Note that this schema is based on the Vertex AI PredictSchemata standard and is a partial OpenAPI specification. The complete JSON request has the following structure:
{
  "instances": [
    {...},
    {...}
  ]
}
The service offers flexible input options for analyzing chest x-ray images and related text prompts. You can process images alone, text prompts alone, or images and text prompts together. While the text prompt is always inlined in the request, you have two ways to provide the images to the service:

- Directly within the HTTPS request: include the image data as base64-encoded bytes in the input_bytes JSON field; see the Inlined images section.
- Indirectly via storage links: provide links to images stored in GCS using the gcs_uri JSON field, or use dicomweb_uri to point to DICOM images stored in a DICOM store; see the DICOM requirements section.
To illustrate these methods, the following example JSON request shows input_bytes, gcs_uri, and dicomweb_uri all in one request. In a real-world scenario, you'll typically use only one of these options for all images within a single request:
{
  "instances": [
    {
      "input_bytes": "your base64-encoded image bytes"
    },
    {
      "gcs_uri": "gs://your-bucket/path/to/image.png",
      "bearer_token": "your-bearer-token"
    },
    {
      "dicomweb_uri": "https://dicomweb-store.uri/studies/1.2.3.4.5.6.7.8.9/series/1.2.3.4.5.6.7.8.10/instances/1.2.3.4.5.6.7.8.11",
      "bearer_token": "your-bearer-token"
    }
  ]
}
The following JSON example demonstrates how to encode images, text prompts, or combinations of both in a single request to the service. Keep in mind that this is a simplified illustration; actual requests in production environments typically follow a more structured and consistent format.
{
  "instances": [
    {
      "prompt_query": "airspace opacity"
    },
    {
      "gcs_uri": "gs://your-bucket/path/to/image.png",
      "bearer_token": "your-bearer-token"
    },
    {
      "dicomweb_uri": "https://dicomweb-store.uri/studies/1.2.3.4.5.6.7.8.9/series/1.2.3.4.5.6.7.8.10/instances/1.2.3.4.5.6.7.8.11",
      "bearer_token": "your-bearer-token",
      "prompt_query": "airspace opacity"
    }
  ]
}
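You can also assemble instances programmatically. The following sketch builds the instances list from optional fields; the helper name is illustrative, not part of the API:

def build_instance(input_bytes=None, gcs_uri=None, dicomweb_uri=None,
                   bearer_token=None, prompt_query=None) -> dict:
    """Assembles one request instance, omitting any unset fields."""
    instance = {
        "input_bytes": input_bytes,
        "gcs_uri": gcs_uri,
        "dicomweb_uri": dicomweb_uri,
        "bearer_token": bearer_token,
        "prompt_query": prompt_query,
    }
    return {k: v for k, v in instance.items() if v is not None}

request_json = {
    "instances": [
        build_instance(prompt_query="airspace opacity"),
        build_instance(gcs_uri="gs://your-bucket/path/to/image.png",
                       bearer_token="your-bearer-token"),
    ]
}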
Inlined images
You can inline images in the API request as a base64-encoded string in the input_bytes JSON field. However, keep in mind that most API management systems enforce a limit on the maximum size of request payloads. When CXR Foundation is hosted as a Vertex AI prediction endpoint, Vertex AI quotas apply.

To optimize the request size, you should compress the images using common image compression codecs. If you require lossless compression, use PNG encoding; if lossy compression is acceptable, use JPEG encoding.

The following code snippet converts a compressed JPEG image file from the local file system into a base64-encoded string:
import base64

def encode_file_bytes(file_path: str) -> str:
    """Reads a file and returns its contents as a base64-encoded string."""
    with open(file_path, 'rb') as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
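For example, the returned string can be dropped directly into a request instance (the file path here is illustrative):

instance = {"input_bytes": encode_file_bytes("path/to/chest_xray.jpg")}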
The following snippet converts uncompressed image bytes into the lossless PNG format and then into a base64-encoded string:

import base64
import io

import numpy as np
import PIL.Image

def convert_uncompressed_image_bytes_to_base64(image: np.ndarray) -> str:
    """Converts an uncompressed image array to a base64-encoded PNG string."""
    with io.BytesIO() as compressed_img_bytes:
        with PIL.Image.fromarray(image) as pil_image:
            pil_image.save(compressed_img_bytes, 'png')
        return base64.b64encode(compressed_img_bytes.getvalue()).decode('utf-8')
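For instance, an 8-bit grayscale pixel array can be converted and inlined as follows; the zero-filled array is a stand-in for your actual pixel data:

# Stand-in for real grayscale pixel data.
image = np.zeros((1024, 1024), dtype=np.uint8)
instance = {"input_bytes": convert_uncompressed_image_bytes_to_base64(image)}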
Response
An API response can include multiple predictions that correspond to the order of the instances in the request. Each prediction conforms to this schema. Note that this schema is based on the Vertex AI PredictSchemata standard and is a partial OpenAPI specification. The complete JSON response has the following structure:
{
  "predictions": [
    {...},
    {...}
  ],
  "deployedModelId": "model-id",
  "model": "model",
  "modelVersionId": "version-id",
  "modelDisplayName": "model-display-name",
  "metadata": {...}
}
Each request instance can independently succeed or fail. When an instance succeeds, the corresponding prediction JSON includes the embeddings dictionary; when it fails, the prediction includes an error field instead. Here is an example of a response to a request with two instances, where the first succeeded and the second failed:
{
  "predictions": [
    {
      "contrastive_img_emb": [[0.1, 0.2], [0.3, 0.4]],
      "general_img_emb": [[0.1, 0.2], [0.3, 0.4]],
      "contrastive_txt_emb": [0.1, 0.2, 0.3, 0.4]
    },
    {
      "error": {
        "description": "Some actionable text."
      }
    }
  ],
  "deployedModelId": "model-id",
  "model": "model",
  "modelVersionId": "version-id",
  "modelDisplayName": "model-display-name",
  "metadata": {...}
}
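Because instances succeed or fail independently, iterate over the predictions and branch on the presence of the error field. A minimal sketch, assuming response_json holds the parsed response body:

def collect_embeddings(response_json: dict) -> list:
    """Returns embedding dictionaries, with None for failed instances."""
    results = []
    for i, prediction in enumerate(response_json["predictions"]):
        if "error" in prediction:
            print(f"Instance {i} failed: {prediction['error']['description']}")
            results.append(None)
        else:
            results.append(prediction)
    return results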
The structure of a successful prediction depends on the content of the corresponding request instance. The following table summarizes this relationship. Note that the table shows only the use of input_bytes; you can provide images through gcs_uri or dicomweb_uri as well:
| Request instance | Response prediction |
|---|---|
| {"input_bytes": "..."} | {"contrastive_img_emb": [[...]], "general_img_emb": [[...]]} |
| {"prompt_query": "..."} | {"contrastive_txt_emb": [...]} |
| {"input_bytes": "...", "prompt_query": "..."} | {"contrastive_img_emb": [[...]], "general_img_emb": [[...]], "contrastive_txt_emb": [...]} |
Micro batching
The API request supports micro batching. You can request embeddings for multiple images, text prompts, or both by using different instances within the same JSON request:
{
  "instances": [
    {...},
    {...}
  ]
}
Keep in mind that the total number of embeddings that you can request in one API call is capped by the service at a fixed limit. A link to the service configuration is coming soon.
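Until the limit is published, you can cap the batch size client-side. The following sketch splits a long instance list into separate requests; MAX_INSTANCES_PER_REQUEST is a placeholder value, not the documented limit:

# Hypothetical client-side cap; replace with the service's actual limit.
MAX_INSTANCES_PER_REQUEST = 16

def chunk_instances(instances: list) -> list:
    """Splits a long instance list into separate micro-batch request bodies."""
    return [
        {"instances": instances[i:i + MAX_INSTANCES_PER_REQUEST]}
        for i in range(0, len(instances), MAX_INSTANCES_PER_REQUEST)
    ]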
DICOM requirements
To ensure compatibility with the service, chest x-ray images in DICOM format must either come from a DICOM storage system that's compatible with the HAI-DEF DICOM store requirements or from DICOM binary files with the .dcm file extension in a GCS bucket, and they must include the following required tags:
| Tag | Name | Note |
|---|---|---|
| (0002,0010) | TransferSyntaxUID | |
| (0008,0008) | ImageType | |
| (0008,0016) | SOPClassUID | |
| (0008,0018) | SOPInstanceUID | |
| (0020,000E) | SeriesInstanceUID | |
| (0020,000D) | StudyInstanceUID | |
| (0028,0002) | SamplesPerPixel | Must be set to 1 for monochrome or grayscale imaging. Refer to Image Pixel Module Attributes for details. |
| (0028,0010) | Rows | |
| (0028,0011) | Columns | |
| (0028,0100) | BitsAllocated | |
| (0028,0102) | HighBit | |
| (0028,0103) | PixelRepresentation | |
| (0028,0004) | PhotometricInterpretation | |
| (0028,3000) | ModalityLUTSequence | If this tag is not present, the service falls back on WindowCenter and WindowWidth. |
| (0028,1050) | WindowCenter | If ModalityLUTSequence is not present, this tag and WindowWidth are required. |
| (0028,1051) | WindowWidth | If ModalityLUTSequence is not present, this tag and WindowCenter are required. |
| (7FE0,0010) | PixelData | |
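Before sending DICOM files to the service, you can verify the required tags locally with the pydicom library. This is a minimal sketch; the keyword list mirrors the table above, and the windowing tags are checked only when ModalityLUTSequence is absent:

import pydicom

# Keywords of the required tags from the table above. TransferSyntaxUID
# lives in the file meta information and is checked separately below.
REQUIRED_KEYWORDS = [
    "ImageType", "SOPClassUID", "SOPInstanceUID", "SeriesInstanceUID",
    "StudyInstanceUID", "SamplesPerPixel", "Rows", "Columns",
    "BitsAllocated", "HighBit", "PixelRepresentation",
    "PhotometricInterpretation", "PixelData",
]

def find_missing_tags(path: str) -> list:
    """Returns keywords of required DICOM tags missing from a .dcm file."""
    ds = pydicom.dcmread(path)
    missing = [kw for kw in REQUIRED_KEYWORDS if kw not in ds]
    if "TransferSyntaxUID" not in ds.file_meta:
        missing.append("TransferSyntaxUID")
    # WindowCenter/WindowWidth are required only without a modality LUT.
    if "ModalityLUTSequence" not in ds:
        missing.extend(kw for kw in ("WindowCenter", "WindowWidth") if kw not in ds)
    return missing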
For DICOM images from a DICOM store, the service relies on the Transfer Syntax UID to transcode the image pixels. The following table lists the transfer syntaxes that the service can transcode. If the service can't transcode the images itself, it falls back on the capabilities of the underlying DICOM storage system. If the DICOMs are from a Google Cloud DICOM store, this document on supported transfer syntaxes for transcoding applies.
| Supported UID | Name |
|---|---|
| 1.2.840.10008.1.2.4.50 (Recommended) | JPEG Baseline (Process 1): Default Transfer Syntax for Lossy JPEG 8-bit Image Compression |
| 1.2.840.10008.1.2.4.90 | JPEG 2000 Image Compression (Lossless Only) |
| 1.2.840.10008.1.2.4.91 | JPEG 2000 Image Compression |
| 1.2.840.10008.1.2.1 | Uncompressed |
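A companion check for the transfer syntax, again using pydicom; the UID set is copied from the table above:

# Transfer syntaxes the service can transcode, per the table above.
SUPPORTED_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2.4.50",  # JPEG Baseline (Process 1), recommended
    "1.2.840.10008.1.2.4.90",  # JPEG 2000 (Lossless Only)
    "1.2.840.10008.1.2.4.91",  # JPEG 2000
    "1.2.840.10008.1.2.1",     # Uncompressed
}

def is_supported_transfer_syntax(path: str) -> bool:
    """Checks whether a DICOM file's transfer syntax is in the supported set."""
    ds = pydicom.dcmread(path, stop_before_pixels=True)
    return str(ds.file_meta.TransferSyntaxUID) in SUPPORTED_TRANSFER_SYNTAXES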