Troubleshoot Cluster Linking on Confluent Platform
This page describes common errors you may encounter when creating cluster links, and how to address them.
Task error codes for cluster link and mirror topic commands are also listed.
The command to create a cluster link can fail with one of the following error codes.
A customized error message will also be returned, to provide specific context about
the clusters being used.
Meaning: The destination cluster (or, for a source-initiated link, the source cluster)
could not resolve the provided bootstrap server to an IP address. In other words, there was
no DNS entry for this bootstrap server in the cluster’s DNS.
Solutions:
If the provided source cluster bootstrap server is a URL with a FQDN that is not Confluent Cloud
(for example, kafka.example.com:9092), then the DNS is not properly configured for the
destination cluster to resolve that FQDN to an IP address.
If the destination cluster is a Confluent Cloud cluster, this error is likely because your FQDN relies
on your private DNS; there is no way to provide Confluent Cloud with access to your private DNS. In this
case, you have two options:
Use a FQDN that is registered in a public DNS. This is the best way to resolve this error.
You can register the FQDN with a public DNS, pointing to the appropriate IP address of your
broker, proxy, or load balancer (internal IPs can be registered in a public DNS), so that Confluent Cloud
can resolve this FQDN to an IP that it can reach.
Alternatively, use an IP address (such as 10.10.4.24:9092) for the
bootstrap server instead of a host name. Your source cluster must have an IP address
advertised listener on every broker before you can use Cluster Linking with an IP address.
If the destination cluster is a Confluent Platform cluster, then you can attempt to fix the destination cluster’s DNS.
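As a quick way to reproduce the resolution failure described above, the following minimal sketch checks whether a bootstrap server host name resolves to any IP address. The function names are illustrative and not part of any Confluent tooling. The result only reflects the DNS visible to the machine that runs it, so run it from a host that uses the same DNS as the destination cluster (for a Confluent Cloud destination, a host that relies on public DNS only). IPv6 literals in bracket notation are not handled.

```python
import socket


def split_host_port(bootstrap_server: str) -> tuple[str, int]:
    """Split a 'host:port' string into (host, port)."""
    host, _, port = bootstrap_server.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {bootstrap_server!r}")
    return host, int(port)


def resolve_bootstrap(bootstrap_server: str) -> list[str]:
    """Return the sorted IP addresses the host resolves to, or [] if resolution fails."""
    host, port = split_host_port(bootstrap_server)
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        # No DNS entry is visible from this machine for the given host.
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    server = "kafka.example.com:9092"  # example value from the text above
    addresses = resolve_bootstrap(server)
    if addresses:
        print(f"{server} resolves to {', '.join(addresses)}")
    else:
        print(f"{server} does not resolve from this machine")
```

An empty result from a machine on public DNS suggests that the FQDN is only registered in a private DNS.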
Ensure the route to your source cluster allows traffic over port 9092.
If the provided source cluster bootstrap server is for Confluent Cloud (for example, pkc-lmn0p.us-west-2.aws.confluent.cloud:9092),
then contact Confluent Support.
If the source cluster does not have Internet networking or Transit Gateway networking, make sure that the two Confluent Cloud clusters exist in the same Confluent Cloud network, region,
and cloud provider. Cluster Linking is not supported between private Confluent Cloud clusters in different Confluent Cloud networks, other than with Transit Gateway.
If the source cluster is a Confluent Platform or Kafka cluster, and the destination cluster is a Confluent Cloud cluster:
Your Confluent Cloud cluster cannot reach your source cluster and you must resolve this.
Providing network access for Confluent Cloud to reach your source cluster is your responsibility.
If you are intending to access a source cluster over the internet, validate that all of your source cluster’s
brokers have public internet IP addresses. You can verify this by consuming from your cluster (for example, with
kafka-console-consumer) on a machine that does not have VPN or other private access to your cluster.
For the consumer configuration, enter the bootstrap servers and security configuration that you provided to the cluster link.
If you are intending to access a source cluster over private networking (that is, using private IP addresses), this error indicates
that you have not properly set up the networking between Confluent Cloud and the source cluster. Double check that your networking meets these requirements:
The Confluent Cloud cluster must have a networking type that is either VPC Peered, VNet Peered, or Transit Gateway.
You should test that the machines that host your source cluster brokers have connectivity to the Confluent Cloud cluster,
as described in Test connectivity to Confluent Cloud. Under some circumstances, you can also test connectivity from Confluent Cloud VPC or VNet
to your source cluster using a tool like the AWS VPC Reachability Analyzer.
If the source cluster is in a cloud VPC or VNET and the Confluent Cloud cluster’s networking type is VPC Peered or VNET Peered,
the Confluent Cloud VPC must be peered to the VPC that hosts the source cluster.
If the source cluster is in a cloud VPC and the Confluent Cloud cluster’s networking type is Transit Gateway,
the Transit Gateway routing must be configured to allow the Confluent Cloud VPC to reach the VPC that hosts the source cluster,
and vice versa.
If the source cluster is not hosted in a public cloud (for instance, if it is in an on-premises datacenter), make sure that you
are using AWS Transit Gateway or GCP Route Import to provide connectivity between your cluster’s host machines and Confluent Cloud.
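Before running a full consumer test, it can help to confirm basic TCP reachability of each broker. The following is a minimal sketch with illustrative function names, using only the Python standard library. It shows whether a TCP connection can be opened from the machine that runs it; it does not validate authentication or advertised listeners, so a successful result does not replace the kafka-console-consumer check described above.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, unreachable routes, and DNS failures.
        return False


def check_brokers(brokers: list[str], timeout_seconds: float = 5.0) -> dict[str, bool]:
    """Map each 'host:port' broker address to whether it is reachable over TCP."""
    results = {}
    for broker in brokers:
        host, _, port = broker.rpartition(":")
        results[broker] = can_connect(host, int(port), timeout_seconds)
    return results


if __name__ == "__main__":
    # Replace with the advertised addresses of every broker in the source cluster.
    for broker, reachable in check_brokers(["10.10.4.24:9092"]).items():
        print(f"{broker}: {'reachable' if reachable else 'NOT reachable'}")
```

Run the check from a machine whose network position resembles that of the destination cluster, for example a host without VPN access when testing public internet connectivity.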
Meaning: The source cluster security credential provided to the cluster link was not able to authenticate with the source cluster.
Solutions:
Confirm the security configuration that you assigned your cluster link.
For a Confluent Cloud source cluster, confirm that your link configuration has these properties:
If using API keys:
- security.protocol=SASL_SSL
- sasl.mechanism=PLAIN
- sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username='<cluster API key>' password='<cluster API secret>';
If using OAuth:
- same parameters as your consumers use to authenticate with OAuth on the source cluster
For a Confluent Platform or Kafka source cluster, verify that the cluster link principal used in your link
configuration is using an authentication mechanism that is enabled on the source cluster.
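For the API key case, a small script can catch formatting mistakes in the link configuration before you create the link. The sketch below is illustrative only: the function names are hypothetical, and the checks cover just the three properties listed above for a Confluent Cloud source cluster. It does not verify that the key and secret are valid.

```python
EXPECTED = {"security.protocol": "SASL_SSL", "sasl.mechanism": "PLAIN"}


def build_link_config(api_key: str, api_secret: str) -> dict[str, str]:
    """Build the three link properties used with a Confluent Cloud API key."""
    jaas = (
        "org.apache.kafka.common.security.plain.PlainLoginModule required "
        f"username='{api_key}' password='{api_secret}';"
    )
    return {**EXPECTED, "sasl.jaas.config": jaas}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only; the JAAS value itself contains '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_config_problems(props: dict[str, str]) -> list[str]:
    """Return human-readable problems found in the link security properties."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = props.get(key)
        if actual != expected:
            problems.append(f"{key} should be {expected}, found {actual}")
    jaas = props.get("sasl.jaas.config", "")
    # The module name, the 'required' flag, and the trailing ';' are all needed.
    if "PlainLoginModule required " not in jaas or not jaas.endswith(";"):
        problems.append("sasl.jaas.config is missing or malformed")
    return problems


if __name__ == "__main__":
    config = build_link_config("<cluster API key>", "<cluster API secret>")
    print("\n".join(f"{key}={value}" for key, value in config.items()))
    print(find_config_problems(config) or "no formatting problems found")
```

A common mistake this catches is missing whitespace between the login module name, the required flag, and the username and password options.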
Meaning: Your source cluster bootstrap server points to a private internal endpoint or port that Cluster Linking is not allowed to access in Confluent Cloud.
Solution: Verify that your bootstrap server uses port 9092.
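The port check can be automated with a short helper. This is a minimal sketch with a hypothetical function name; it assumes the bootstrap server value is a comma-separated list of host:port entries.

```python
def uses_port_9092(bootstrap_servers: str) -> bool:
    """Return True if every comma-separated host:port entry uses port 9092."""
    entries = [entry.strip() for entry in bootstrap_servers.split(",") if entry.strip()]
    # rpartition(":")[2] is the text after the last colon, that is, the port.
    return bool(entries) and all(entry.rpartition(":")[2] == "9092" for entry in entries)


if __name__ == "__main__":
    print(uses_port_9092("pkc-lmn0p.us-west-2.aws.confluent.cloud:9092"))
```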
After a cluster link has been created, it can go into a state labeled UNAVAILABLE,
which can be seen when listing the cluster link in the CLI or REST API, or through the
link-count metric in the Metrics API in Confluent Cloud. This state means that the destination cluster does not have
connectivity to the source cluster and cannot copy data from the source cluster.
To resolve this error, first, get more details about the error by describing the cluster link
in the CLI or REST API. That response returns an error code and a dynamic, customized error message.
Then, troubleshoot the error using the other methods described on this page.
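If you script this check, the logic reduces to reading the link state and, when it is UNAVAILABLE, surfacing the error code and message. The sketch below works on an already-parsed description (for example, JSON output loaded into a dictionary). The field names link_state, link_error, and link_error_message are assumptions made for illustration; confirm them against the actual output of your CLI or REST API version.

```python
def summarize_link(description: dict) -> tuple[bool, str]:
    """Return (needs_troubleshooting, detail) for a parsed cluster link description.

    The keys read here are assumed field names, not a confirmed schema.
    """
    state = description.get("link_state", "UNKNOWN")
    if state != "UNAVAILABLE":
        return False, f"link state is {state}"
    code = description.get("link_error", "UNKNOWN")
    message = description.get("link_error_message", "")
    detail = f"{code}: {message}" if message else code
    return True, detail


if __name__ == "__main__":
    example = {
        "link_state": "UNAVAILABLE",
        "link_error": "AUTHENTICATION_ERROR",
        "link_error_message": "example message returned by the describe call",
    }
    print(summarize_link(example))
```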
The following error codes are shared by cluster link and mirror topic commands to indicate problems with the link tasks or the mirror transition tasks.
With the exception of INTERNAL_ERROR, these are all user-created errors.
If the error code is INTERNAL_ERROR, Confluent has been alerted
and is working on fixing the issue. Otherwise, the error is customer-created and requires
customer action to resolve, so you can and should set up alerts for it.
Task Status: Description
UNKNOWN: Error cause cannot be determined.
INTERNAL_ERROR: System error caused by Confluent software. This type of error automatically alerts Confluent, and a resolution is in progress.
AUTHENTICATION_ERROR: Authentication credentials are not properly configured.
AUTHORIZATION_ERROR: Authorization credentials are not properly configured.
BROKER_AUTHENTICATION_ERROR: Authentication credentials on the broker are not properly configured (Confluent Platform).
BROKER_AUTHORIZATION_ERROR: Authorization credentials on the broker are not properly configured (Confluent Platform).
MISCONFIGURATION_ERROR: A misconfiguration is causing errors.
REMOTE_LINK_NOT_FOUND_ERROR: The remote link was unexpectedly not found.
LINK_NOT_FOUND_ERROR: The cluster link cannot be found.
CONSUMER_GROUP_IN_USE_ERROR: The consumer group is active on the destination, causing offsets to not be synced.
SECURITY_DISABLED_ERROR: No authorizer is configured on the source cluster.
TOPIC_EXISTS_ERROR: A topic exists on the destination unexpectedly.
POLICY_VIOLATION_ERROR: The topic transition violates a policy.
LINK_COORDINATOR_NOT_ENABLED_ERROR: Cluster link is not enabled.
ACL_LIMIT_EXCEEDED: The ACL limit on the link has been exceeded.
REMOTE_MIRROR_NOT_FOUND_ERROR: Remote mirror topic is not available.
UNKNOWN_TOPIC_OR_PARTITION_ERROR: Either a topic or partition was unexpectedly not found.
INVALID_TOPIC: An InvalidTopicException was encountered from the destination cluster. This error would occur, for example, if the auto-create mirror task tries to create a topic on the destination cluster and the topic name is invalid. See the error message for more details.
SUPPRESSED_ERRORS: Some errors were suppressed because too many were encountered.
INVALID_REQUEST_ERROR: An InvalidRequestException was encountered. See the error message for more details.
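The rule stated above (INTERNAL_ERROR is handled by Confluent, every other code needs customer action) can be encoded directly when you decide which task errors to alert on. This is a minimal sketch; the function names and the "confluent" and "customer" labels are illustrative.

```python
def who_must_act(task_error_code: str) -> str:
    """Return 'confluent' for INTERNAL_ERROR and 'customer' for any other code."""
    if task_error_code == "INTERNAL_ERROR":
        return "confluent"
    return "customer"


def codes_to_alert_on(task_error_codes: list[str]) -> list[str]:
    """Return the distinct codes that require customer action, sorted by name."""
    return sorted({code for code in task_error_codes if who_must_act(code) == "customer"})


if __name__ == "__main__":
    observed = ["INTERNAL_ERROR", "AUTHENTICATION_ERROR", "TOPIC_EXISTS_ERROR"]
    print(codes_to_alert_on(observed))
```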
On Confluent Cloud and Confluent Platform, starting with Confluent Platform 7.5.0, the mirror describe command on a failed mirror topic returns
the cause of the failure. In the case of a failed mirror topic, you have the following choices to remediate
(with the repair option on Confluent Platform available in versions 7.6.0 and later):
Failover or delete the mirror topic.
Contact Confluent Support to repair the failed mirror topic for a subset of the failures.
Failure causes that can be repaired include the following:
UNSUPPORTED_MESSAGE_FORMAT: Source leader epoch went backwards, source topic may have been recreated.
RECORD_TOO_LARGE: Truncation below high watermark. This can be caused by unclean source leader election or other errors such as inability to detect source topic recreation.
TRUNCATION_BELOW_HIGH_WATERMARK
Mirror topics can be transitioned in various ways (promote, failover, reverse, and so on). Both Confluent Cloud and Confluent Platform provide metrics and APIs that can help you find solutions when things go wrong.
To learn more, see the following topics: