Error classes in PySpark#
This is a list of common, named error classes returned by PySpark, as defined in error-conditions.json.
When writing PySpark errors, developers must use an error class from this list. If no appropriate error class is available, add a new one to the list. For more information, please refer to Contributing Error and Exception.
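As an illustration, the sketch below shows how these error classes surface to user code. It assumes PySpark 3.4 or later, where exceptions raised by PySpark derive from pyspark.errors.PySparkException and expose getErrorClass() and getMessageParameters() (in newer releases, getCondition() supersedes getErrorClass()):

    from pyspark.sql import SparkSession
    from pyspark.errors import PySparkTypeError

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(3)

    try:
        df.select(123)  # not a Column or a column name
    except PySparkTypeError as e:
        # The error class names an entry in error-conditions.json,
        # e.g. NOT_COLUMN_OR_STR; the parameters fill its template.
        print(e.getErrorClass())
        print(e.getMessageParameters())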
APPLICATION_NAME_NOT_SET#
An application name must be set in your configuration.
ARGUMENT_REQUIRED#
Argument
ARROW_LEGACY_IPC_FORMAT#
Arrow legacy IPC format is not supported in PySpark; please unset ARROW_PRE_0_15_IPC_FORMAT.
ATTRIBUTE_NOT_CALLABLE#
Attribute
ATTRIBUTE_NOT_SUPPORTED#
Attribute
AXIS_LENGTH_MISMATCH#
Length mismatch: Expected axis has
BROADCAST_VARIABLE_NOT_LOADED#
Broadcast variable
CALL_BEFORE_INITIALIZE#
Not supported to call
CANNOT_ACCEPT_OBJECT_IN_TYPE#
CANNOT_ACCESS_TO_DUNDER#
Dunder (double underscore) attributes are for internal use only.
CANNOT_APPLY_IN_FOR_COLUMN#
Cannot apply ‘in’ operator against a column: please use ‘contains’ in a string column or ‘array_contains’ function for an array column.
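For example, a sketch of the rewrite this message asks for (the column names are illustrative):

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([("abc", ["a", "b"])], ["name", "tags"])

    # "a" in df.name                                  # raises CANNOT_APPLY_IN_FOR_COLUMN
    df.filter(df.name.contains("a")).show()           # string column: use contains
    df.filter(F.array_contains(df.tags, "a")).show()  # array column: use array_contains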
CANNOT_BE_EMPTY#
At least one
CANNOT_BE_NONE#
Argument
CANNOT_CONFIGURE_SPARK_CONNECT#
Spark Connect server cannot be configured: Existing [
CANNOT_CONFIGURE_SPARK_CONNECT_MASTER#
Spark Connect server and Spark master cannot be configured together: Spark master [
CANNOT_CONVERT_COLUMN_INTO_BOOL#
Cannot convert column into bool: please use ‘&’ for ‘and’, ‘|’ for ‘or’, ‘~’ for ‘not’ when building DataFrame boolean expressions.
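A short sketch of the operator substitutions the message describes:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.range(10)

    # df.filter(df.id > 2 and df.id < 8)   # raises CANNOT_CONVERT_COLUMN_INTO_BOOL
    df.filter((df.id > 2) & (df.id < 8))   # 'and' -> &
    df.filter((df.id < 2) | (df.id > 8))   # 'or'  -> |
    df.filter(~(df.id == 5))               # 'not' -> ~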
CANNOT_CONVERT_TYPE#
Cannot convert
CANNOT_DETERMINE_TYPE#
Some types cannot be determined after inference.
CANNOT_GET_BATCH_ID#
Could not get batch id from
CANNOT_INFER_ARRAY_ELEMENT_TYPE#
Cannot infer the element data type; a non-empty list starting with a non-None value is required.
CANNOT_INFER_EMPTY_SCHEMA#
Cannot infer schema from an empty dataset.
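Both this error and CANNOT_INFER_ARRAY_ELEMENT_TYPE above are avoided by passing an explicit schema instead of relying on inference; a minimal sketch:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # spark.createDataFrame([])                 # raises CANNOT_INFER_EMPTY_SCHEMA
    df = spark.createDataFrame([], schema="id INT, name STRING")  # explicit DDL schema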
CANNOT_INFER_SCHEMA_FOR_TYPE#
Can not infer schema for type:
CANNOT_INFER_TYPE_FOR_FIELD#
Unable to infer the type of the field
CANNOT_MERGE_TYPE#
Can not merge type
CANNOT_OPEN_SOCKET#
Can not open socket:
CANNOT_PARSE_DATATYPE#
Unable to parse datatype.
CANNOT_PROVIDE_METADATA#
Metadata can only be provided for a single column.
CANNOT_REGISTER_UDTF#
Cannot register the UDTF ‘
CANNOT_SET_TOGETHER#
CANNOT_SPECIFY_RETURN_TYPE_FOR_UDF#
returnType can not be specified when
CANNOT_WITHOUT#
Cannot
CLASSIC_OPERATION_NOT_SUPPORTED_ON_DF#
Calling property or member ‘
COLLATION_INVALID_PROVIDER#
The value
COLUMN_IN_LIST#
CONNECT_URL_ALREADY_DEFINED#
Only one Spark Connect client URL can be set; however, got a different URL [
CONNECT_URL_NOT_SET#
Cannot create a Spark Connect session because the Spark Connect remote URL has not been set. Please define the remote URL by setting either the ‘spark.remote’ option or the ‘SPARK_REMOTE’ environment variable.
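The two ways of supplying the remote URL that the message mentions; the address below is a placeholder:

    import os
    from pyspark.sql import SparkSession

    # Option 1: environment variable, read when the session is created.
    os.environ["SPARK_REMOTE"] = "sc://localhost:15002"

    # Option 2: the builder's remote() option.
    spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()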
CONTEXT_ONLY_VALID_ON_DRIVER#
It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that runs on workers. For more information, see SPARK-5063.
DATA_SOURCE_INVALID_RETURN_TYPE#
Unsupported return type (‘
DATA_SOURCE_RETURN_SCHEMA_MISMATCH#
Return schema mismatch in the result from ‘read’ method. Expected:
DATA_SOURCE_TYPE_MISMATCH#
Expected
DIFFERENT_PANDAS_DATAFRAME#
DataFrames are not almost equal:
Left:
DIFFERENT_PANDAS_INDEX#
Indices are not almost equal:
Left:
DIFFERENT_PANDAS_MULTIINDEX#
MultiIndices are not almost equal:
Left:
DIFFERENT_PANDAS_SERIES#
Series are not almost equal:
Left:
DIFFERENT_ROWS#
DIFFERENT_SCHEMA#
Schemas do not match.
--- actual
+++ expected
DISALLOWED_TYPE_FOR_CONTAINER#
Argument
DUPLICATED_ARTIFACT#
Duplicate Artifact:
DUPLICATED_FIELD_NAME_IN_ARROW_STRUCT#
Duplicated field names in Arrow Struct are not allowed, got
ERROR_OCCURRED_WHILE_CALLING#
An error occurred while calling
FIELD_DATA_TYPE_UNACCEPTABLE#
FIELD_DATA_TYPE_UNACCEPTABLE_WITH_NAME#
FIELD_NOT_NULLABLE#
Field is not nullable, but got None.
FIELD_NOT_NULLABLE_WITH_NAME#
FIELD_STRUCT_LENGTH_MISMATCH#
Length of object (
FIELD_STRUCT_LENGTH_MISMATCH_WITH_NAME#
FIELD_TYPE_MISMATCH#
FIELD_TYPE_MISMATCH_WITH_NAME#
HIGHER_ORDER_FUNCTION_SHOULD_RETURN_COLUMN#
Function
INCORRECT_CONF_FOR_PROFILE#
The spark.python.profile or spark.python.profile.memory configuration must be set to true to enable Python profiling.
INDEX_NOT_POSITIVE#
Index must be positive, got ‘
INDEX_OUT_OF_RANGE#
INVALID_ARROW_UDTF_RETURN_TYPE#
The return type of the arrow-optimized Python UDTF should be of type ‘pandas.DataFrame’, but the ‘
INVALID_BROADCAST_OPERATION#
Broadcast can only be
INVALID_CALL_ON_UNRESOLVED_OBJECT#
Invalid call to
INVALID_CONNECT_URL#
Invalid URL for Spark Connect:
INVALID_INTERVAL_CASTING#
Interval
INVALID_ITEM_FOR_CONTAINER#
All items in
INVALID_JSON_DATA_TYPE_FOR_COLLATIONS#
Collations can only be applied to string types, but the JSON data type is
INVALID_MULTIPLE_ARGUMENT_CONDITIONS#
[{arg_names}] cannot be
INVALID_NDARRAY_DIMENSION#
NumPy array input should be of
INVALID_NUMBER_OF_DATAFRAMES_IN_GROUP#
Invalid number of dataframes in group
INVALID_PANDAS_UDF#
Invalid function:
INVALID_PANDAS_UDF_TYPE#
INVALID_RETURN_TYPE_FOR_ARROW_UDF#
Grouped and Cogrouped map Arrow UDF should return StructType for
INVALID_RETURN_TYPE_FOR_PANDAS_UDF#
Pandas UDF should return StructType for
INVALID_SESSION_UUID_ID#
Parameter value
INVALID_TIMEOUT_TIMESTAMP#
Timeout timestamp (
INVALID_TYPE#
Argument
INVALID_TYPENAME_CALL#
StructField does not have typeName. Use typeName on its type explicitly instead.
INVALID_TYPE_DF_EQUALITY_ARG#
Expected type
INVALID_UDF_EVAL_TYPE#
Eval type for UDF must be
INVALID_UDTF_BOTH_RETURN_TYPE_AND_ANALYZE#
The UDTF ‘
INVALID_UDTF_EVAL_TYPE#
The eval type for the UDTF ‘
INVALID_UDTF_HANDLER_TYPE#
The UDTF is invalid. The function handler must be a class, but got ‘
INVALID_UDTF_NO_EVAL#
The UDTF ‘
INVALID_UDTF_RETURN_TYPE#
The UDTF ‘
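The UDTF entries above boil down to a few structural requirements. A minimal sketch of a UDTF that satisfies them (a class handler, an eval method, and a declared return type), assuming PySpark 3.5+:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import lit, udtf

    spark = SparkSession.builder.getOrCreate()

    @udtf(returnType="num INT, doubled INT")  # declare a return type, or provide an 'analyze' static method
    class DoubleIt:                           # the handler must be a class (INVALID_UDTF_HANDLER_TYPE)
        def eval(self, x: int):               # 'eval' is required (INVALID_UDTF_NO_EVAL)
            yield x, x * 2                    # each row must be a tuple/list/dict (UDTF_INVALID_OUTPUT_ROW_TYPE)

    DoubleIt(lit(7)).show()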
INVALID_WHEN_USAGE#
when() can only be applied on a Column previously generated by when(), and cannot be applied once otherwise() is applied.
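A sketch of valid and invalid chains:

    from pyspark.sql import functions as F

    # Valid: when() starts the chain, and otherwise() (if present) ends it.
    label = (
        F.when(F.col("age") < 18, "minor")
         .when(F.col("age") < 65, "adult")
         .otherwise("senior")
    )

    # F.when(F.col("age") < 18, "m").otherwise("a").when(...)  # raises INVALID_WHEN_USAGE
    # F.col("age").when(F.col("age") < 18, "m")                # raises INVALID_WHEN_USAGE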
INVALID_WINDOW_BOUND_TYPE#
Invalid window bound type:
JAVA_GATEWAY_EXITED#
Java gateway process exited before sending its port number.
JVM_ATTRIBUTE_NOT_SUPPORTED#
Attribute
KEY_NOT_EXISTS#
Key
KEY_VALUE_PAIR_REQUIRED#
A key-value pair or a list of pairs is required.
LENGTH_SHOULD_BE_THE_SAME#
MALFORMED_VARIANT#
Variant binary is malformed. Please check the data source is valid.
MASTER_URL_INVALID#
Master must either be yarn or start with spark, k8s, or local.
MASTER_URL_NOT_SET#
A master URL must be set in your configuration.
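A typical fix for this error (and for APPLICATION_NAME_NOT_SET above) when building a context by hand; both values are placeholders:

    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[2]").setAppName("my-app")
    sc = SparkContext(conf=conf)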
MEMORY_PROFILE_INVALID_SOURCE#
Memory profiler can only be used on editors with line numbers.
MISSING_LIBRARY_FOR_PROFILER#
Install the ‘memory_profiler’ library in the cluster to enable memory profiling.
MISSING_VALID_PLAN#
Argument to
MIXED_TYPE_REPLACEMENT#
Mixed type replacements are not supported.
NEGATIVE_VALUE#
Value for
NOT_BOOL#
Argument
NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_LIST_OR_STR_OR_TUPLE#
Argument
NOT_BOOL_OR_DICT_OR_FLOAT_OR_INT_OR_STR#
Argument
NOT_BOOL_OR_FLOAT_OR_INT#
Argument
NOT_BOOL_OR_FLOAT_OR_INT_OR_LIST_OR_NONE_OR_STR_OR_TUPLE#
Argument
NOT_BOOL_OR_FLOAT_OR_INT_OR_STR#
Argument
NOT_BOOL_OR_LIST#
Argument
NOT_BOOL_OR_STR#
Argument
NOT_CALLABLE#
Argument
NOT_COLUMN#
Argument
NOT_COLUMN_OR_DATATYPE_OR_STR#
Argument
NOT_COLUMN_OR_FLOAT_OR_INT_OR_LIST_OR_STR#
Argument
NOT_COLUMN_OR_INT#
Argument
NOT_COLUMN_OR_INT_OR_LIST_OR_STR_OR_TUPLE#
Argument
NOT_COLUMN_OR_INT_OR_STR#
Argument
NOT_COLUMN_OR_LIST_OR_STR#
Argument
NOT_COLUMN_OR_STR#
Argument
NOT_COLUMN_OR_STR_OR_STRUCT#
Argument
NOT_DATAFRAME#
Argument
NOT_DATATYPE_OR_STR#
Argument
NOT_DICT#
Argument
NOT_EXPRESSION#
Argument
NOT_FLOAT_OR_INT#
Argument
NOT_FLOAT_OR_INT_OR_LIST_OR_STR#
Argument
NOT_IMPLEMENTED#
NOT_INT#
Argument
NOT_INT_OR_SLICE_OR_STR#
Argument
NOT_IN_BARRIER_STAGE#
It is not in a barrier stage.
NOT_ITERABLE#
NOT_LIST#
Argument
NOT_LIST_OF_COLUMN#
Argument
NOT_LIST_OF_COLUMN_OR_STR#
Argument
NOT_LIST_OF_FLOAT_OR_INT#
Argument
NOT_LIST_OF_STR#
Argument
NOT_LIST_OR_NONE_OR_STRUCT#
Argument
NOT_LIST_OR_STR_OR_TUPLE#
Argument
NOT_LIST_OR_TUPLE#
Argument
NOT_NUMERIC_COLUMNS#
Numeric aggregation function can only be applied on numeric columns, got
NOT_OBSERVATION_OR_STR#
Argument
NOT_SAME_TYPE#
Argument
NOT_STR#
Argument
NOT_STRUCT#
Argument
NOT_STR_OR_LIST_OF_RDD#
Argument
NOT_STR_OR_STRUCT#
Argument
NOT_WINDOWSPEC#
Argument
NO_ACTIVE_EXCEPTION#
No active exception.
NO_ACTIVE_OR_DEFAULT_SESSION#
No active or default Spark session found. Please create a new Spark session before running the code.
NO_ACTIVE_SESSION#
No active Spark session found. Please create a new Spark session before running the code.
NO_OBSERVE_BEFORE_GET#
DataFrame.observe must be called before get.
NO_SCHEMA_AND_DRIVER_DEFAULT_SCHEME#
Only allows
ONLY_ALLOWED_FOR_SINGLE_COLUMN#
Argument
ONLY_ALLOW_SINGLE_TRIGGER#
Only a single trigger is allowed.
ONLY_SUPPORTED_WITH_SPARK_CONNECT#
PACKAGE_NOT_INSTALLED#
PANDAS_API_ON_SPARK_FAIL_ON_ANSI_MODE#
Pandas API on Spark does not work properly in ANSI mode. Please set the Spark config ‘spark.sql.ansi.enabled’ to false. Alternatively, set the pandas-on-Spark option ‘compute.fail_on_ansi_mode’ to False to force it to work, although this can cause unexpected behavior.
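The two settings the message refers to, applied programmatically (assuming an existing session):

    import pyspark.pandas as ps
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    spark.conf.set("spark.sql.ansi.enabled", False)    # preferred: disable ANSI mode
    ps.set_option("compute.fail_on_ansi_mode", False)  # alternative: proceed despite ANSI mode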
PANDAS_UDF_OUTPUT_EXCEEDS_INPUT_ROWS#
The Pandas SCALAR_ITER UDF outputs more rows than input rows.
PIPE_FUNCTION_EXITED#
Pipe function
PLOT_INVALID_TYPE_COLUMN#
Column
PLOT_NOT_NUMERIC_COLUMN_ARGUMENT#
Argument
PYTHON_HASH_SEED_NOT_SET#
String hash randomness must be disabled via PYTHONHASHSEED.
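One common way to satisfy this check is to pin PYTHONHASHSEED on the executors through configuration; the seed value is arbitrary but must be fixed, and the master and app name below are placeholders:

    from pyspark import SparkConf, SparkContext

    conf = (
        SparkConf()
        .setMaster("local[2]")
        .setAppName("hashseed-demo")
        .set("spark.executorEnv.PYTHONHASHSEED", "0")  # fixed seed for worker processes
    )
    sc = SparkContext(conf=conf)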
PYTHON_STREAMING_DATA_SOURCE_RUNTIME_ERROR#
Failed when running Python streaming data source:
PYTHON_VERSION_MISMATCH#
Python in worker has different version:
RDD_TRANSFORM_ONLY_VALID_ON_DRIVER#
It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation. RDD transformations and actions can only be invoked by the driver, not inside of other transformations; for example, rdd1.map(lambda x: rdd2.values.count() * x) is invalid because the values transformation and count action cannot be performed inside of the rdd1.map transformation. For more information, see SPARK-5063.
READ_ONLY#
RESPONSE_ALREADY_RECEIVED#
The server reported OPERATION_NOT_FOUND, but responses were already received from it.
RESULT_COLUMNS_MISMATCH_FOR_ARROW_UDF#
Column names of the returned pyarrow.Table do not match specified schema.
RESULT_COLUMNS_MISMATCH_FOR_PANDAS_UDF#
Column names of the returned pandas.DataFrame do not match specified schema.
RESULT_LENGTH_MISMATCH_FOR_PANDAS_UDF#
Number of columns of the returned pandas.DataFrame doesn’t match specified schema. Expected:
RESULT_LENGTH_MISMATCH_FOR_SCALAR_ITER_PANDAS_UDF#
The length of the output in a scalar iterator pandas UDF should be the same as the input’s; however, the length of the output was
RESULT_TYPE_MISMATCH_FOR_ARROW_UDF#
Columns do not match in their data type:
RETRIES_EXCEEDED#
The maximum number of retries has been exceeded.
REUSE_OBSERVATION#
An Observation can be used with a DataFrame only once.
SCHEMA_MISMATCH_FOR_PANDAS_UDF#
Result vector from pandas_udf was not the required length: expected
SESSION_ALREADY_EXIST#
Cannot start a remote Spark session because there is a regular Spark session already running.
SESSION_NEED_CONN_STR_OR_BUILDER#
Either a connection string or a channelBuilder (mutually exclusive) is needed to create a new SparkSession.
SESSION_NOT_SAME#
Both Datasets must belong to the same SparkSession.
SESSION_OR_CONTEXT_EXISTS#
There should not be an existing Spark Session or Spark Context.
SESSION_OR_CONTEXT_NOT_EXISTS#
SparkContext or SparkSession should be created first.
SLICE_WITH_STEP#
Slice with step is not supported.
STATE_NOT_EXISTS#
State is either not defined or has already been removed.
STOP_ITERATION_OCCURRED#
Caught StopIteration thrown from user’s code; failing the task:
STOP_ITERATION_OCCURRED_FROM_SCALAR_ITER_PANDAS_UDF#
The pandas iterator UDF should exhaust the input iterator.
STREAMING_CONNECT_SERIALIZATION_ERROR#
Cannot serialize the function
TEST_CLASS_NOT_COMPILED#
TOO_MANY_VALUES#
Expected
TYPE_HINT_SHOULD_BE_SPECIFIED#
Type hints for
UDF_RETURN_TYPE#
Return type of the user-defined function should be
UDTF_ARROW_TYPE_CAST_ERROR#
Cannot convert the output value of the column ‘
UDTF_CONSTRUCTOR_INVALID_IMPLEMENTS_ANALYZE_METHOD#
Failed to evaluate the user-defined table function ‘
UDTF_CONSTRUCTOR_INVALID_NO_ANALYZE_METHOD#
Failed to evaluate the user-defined table function ‘
UDTF_EVAL_METHOD_ARGUMENTS_DO_NOT_MATCH_SIGNATURE#
Failed to evaluate the user-defined table function ‘
UDTF_EXEC_ERROR#
User defined table function encountered an error in the ‘
UDTF_INVALID_OUTPUT_ROW_TYPE#
The type of an individual output row in the ‘
UDTF_RETURN_NOT_ITERABLE#
The return value of the ‘
UDTF_RETURN_SCHEMA_MISMATCH#
The number of columns in the result does not match the specified schema. Expected column count:
UDTF_RETURN_TYPE_MISMATCH#
Mismatch in return type for the UDTF ‘
UDTF_SERIALIZATION_ERROR#
Cannot serialize the UDTF ‘
UNEXPECTED_RESPONSE_FROM_SERVER#
Unexpected response from iterator server.
UNEXPECTED_TUPLE_WITH_STRUCT#
Unexpected tuple
UNKNOWN_EXPLAIN_MODE#
Unknown explain mode: ‘
UNKNOWN_INTERRUPT_TYPE#
Unknown interrupt type: ‘
UNKNOWN_RESPONSE#
Unknown response:
UNKNOWN_VALUE_FOR#
Unknown value for .
UNSUPPORTED_DATA_TYPE#
Unsupported DataType
UNSUPPORTED_DATA_TYPE_FOR_ARROW#
Single data type
UNSUPPORTED_DATA_TYPE_FOR_ARROW_CONVERSION#
UNSUPPORTED_DATA_TYPE_FOR_ARROW_VERSION#
UNSUPPORTED_JOIN_TYPE#
Unsupported join type: ‘
UNSUPPORTED_LITERAL#
Unsupported Literal ‘
UNSUPPORTED_LOCAL_CONNECTION_STRING#
Creating new SparkSessions with local connection string is not supported.
UNSUPPORTED_NUMPY_ARRAY_SCALAR#
The type of array scalar ‘
UNSUPPORTED_OPERATION#
UNSUPPORTED_PACKAGE_VERSION#
UNSUPPORTED_PARAM_TYPE_FOR_HIGHER_ORDER_FUNCTION#
Function
UNSUPPORTED_PIE_PLOT_PARAM#
Pie plot requires either a y column or subplots=True.
UNSUPPORTED_PLOT_BACKEND#
UNSUPPORTED_PLOT_BACKEND_PARAM#
UNSUPPORTED_SIGNATURE#
Unsupported signature:
UNSUPPORTED_WITH_ARROW_OPTIMIZATION#
VALUE_ALLOWED#
Value for
VALUE_NOT_ACCESSIBLE#
Value
VALUE_NOT_ALLOWED#
Value for
VALUE_NOT_ANY_OR_ALL#
Value for
VALUE_NOT_BETWEEN#
Value for
VALUE_NOT_NON_EMPTY_STR#
Value for
VALUE_NOT_PEARSON#
Value for
VALUE_NOT_PLAIN_COLUMN_REFERENCE#
Value
VALUE_NOT_POSITIVE#
Value for
VALUE_NOT_TRUE#
Value for
VALUE_OUT_OF_BOUNDS#
Value for
WRONG_NUM_ARGS_FOR_HIGHER_ORDER_FUNCTION#
Function
WRONG_NUM_COLUMNS#
Function
ZERO_INDEX#
Index must be non-zero.