sparkdq.exceptions

exception CheckConfigurationError

Bases: Exception

Base class for all configuration-related errors in data quality checks.

Raised when a check’s definition or setup is invalid and cannot be used to construct or apply the check logic, regardless of the actual dataset.

exception InvalidCheckConfigurationError

Bases: CheckConfigurationError

Raised when a check’s configuration is logically invalid.

Examples include setting a minimum value greater than the maximum, or supplying conflicting or incomplete configuration parameters.

In most cases, this error is raised during static validation of a config object.
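The static validation described above can be sketched roughly as follows. The exception classes are local stand-ins that only mirror the documented hierarchy, and `RangeConfig` with its `min_value` and `max_value` fields is a hypothetical config object, not part of sparkdq.

```python
from dataclasses import dataclass


# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class CheckConfigurationError(Exception):
    pass


class InvalidCheckConfigurationError(CheckConfigurationError):
    pass


@dataclass
class RangeConfig:
    """Hypothetical config object with a lower and an upper bound."""

    min_value: float
    max_value: float

    def __post_init__(self) -> None:
        # Static validation: no dataset is needed to detect this problem.
        if self.min_value > self.max_value:
            raise InvalidCheckConfigurationError(
                f"min_value ({self.min_value}) must not exceed "
                f"max_value ({self.max_value})"
            )


def try_build(min_value: float, max_value: float) -> str:
    """Build a config and report whether validation accepted it."""
    try:
        RangeConfig(min_value, max_value)
        return "ok"
    except CheckConfigurationError as exc:
        # Catching the base class also covers InvalidCheckConfigurationError.
        return f"rejected: {exc}"
```

Because the error is raised while the config object is constructed, a caller would see it before any DataFrame is touched.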

exception InvalidSQLExpressionError(expression: str, error_message: str)

Bases: RuntimeCheckConfigurationError

Raised when a SQL expression is syntactically or semantically invalid.

This error indicates that the provided SQL expression cannot be parsed or executed by PySpark, typically due to syntax errors, invalid function calls, or other structural issues that prevent the expression from being evaluated.

Examples include:

Malformed syntax: "age + "
Unbalanced parentheses: "upper(name"
Invalid function calls: "invalid_function(column)"
Dangerous operations: "DROP TABLE users"

This exception is used to ensure consistent error handling for SQL expression validation in data quality checks.

exception InvalidSeverityLevelError(value: str)

Bases: CheckConfigurationError

Raised when a provided severity level is not recognized by the framework.

This error typically occurs when parsing string-based severity inputs (e.g. from JSON or YAML configuration files) that do not match the allowed levels defined in the Severity enum.

Examples include:

normalize_severity("fatal")
loading a config with severity="urgent"

This exception is used to ensure consistent error handling and reporting for configuration-related issues.
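A minimal sketch of how string-based severity parsing could raise this error is shown below. All names are local stand-ins: the members of `Severity` (`CRITICAL`, `WARNING`) are assumed for illustration, and this `normalize_severity` is a simplified re-implementation, not the framework's own function.

```python
from enum import Enum


# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class CheckConfigurationError(Exception):
    pass


class InvalidSeverityLevelError(CheckConfigurationError):
    def __init__(self, value: str) -> None:
        self.value = value
        super().__init__(f"Unrecognized severity level: {value!r}")


class Severity(Enum):
    # Assumed members, chosen only for this example.
    CRITICAL = "critical"
    WARNING = "warning"


def normalize_severity(value: str) -> Severity:
    """Map a raw string (e.g. from JSON or YAML) to a Severity member."""
    try:
        return Severity(value.strip().lower())
    except ValueError:
        # Enum lookup failed: report the original input unchanged.
        raise InvalidSeverityLevelError(value) from None
```

Keeping the rejected input on the exception (`value`) lets a caller report exactly which configuration entry was wrong.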

exception MissingCheckSetError

Bases: RuntimeCheckConfigurationError

Raised when a data quality engine is executed without an assigned CheckSet.

This error indicates that the engine was not properly configured before use. Users must assign a CheckSet instance via engine.set_check_set(…) before calling any validation method such as run_batch.

This is typically a programming error and should be caught early in testing.
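The guard described above might look like the following sketch. `SketchEngine` and the empty `CheckSet` class are hypothetical stand-ins, and `run_batch` here takes a plain list rather than a DataFrame so the example stays self-contained.

```python
from typing import Optional


# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class RuntimeCheckConfigurationError(Exception):
    pass


class MissingCheckSetError(RuntimeCheckConfigurationError):
    pass


class CheckSet:
    """Placeholder for a collection of checks."""


class SketchEngine:
    """Hypothetical engine that refuses to run without a CheckSet."""

    def __init__(self) -> None:
        self._check_set: Optional[CheckSet] = None

    def set_check_set(self, check_set: CheckSet) -> None:
        self._check_set = check_set

    def run_batch(self, rows: list) -> list:
        if self._check_set is None:
            raise MissingCheckSetError(
                "No CheckSet assigned; call set_check_set() before run_batch()."
            )
        # A real engine would apply the checks here; the sketch returns rows as-is.
        return rows
```

A unit test that calls `run_batch` on a freshly constructed engine is one way to catch this mistake early.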

exception MissingCheckTypeError

Bases: CheckConfigurationError

Raised when a configuration dictionary is missing the required 'check' field.

This field is mandatory for the framework to identify which check type should be instantiated via the CheckFactory.

This exception is typically raised during early parsing or validation of configuration sources (e.g. JSON, YAML).
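As an illustration of this early parsing step, the sketch below rejects a configuration dictionary that has no 'check' key. The classes are local stand-ins, and `read_check_type` is a hypothetical helper, not a function from the framework.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class CheckConfigurationError(Exception):
    pass


class MissingCheckTypeError(CheckConfigurationError):
    pass


def read_check_type(config: dict) -> str:
    """Return the value of the mandatory 'check' field."""
    if "check" not in config:
        raise MissingCheckTypeError(
            "Configuration is missing the required 'check' field."
        )
    return config["check"]
```

The returned string is what a factory would then use to decide which check type to instantiate.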

exception MissingColumnError(column: str, available: list[str])

Bases: RuntimeCheckConfigurationError

Raised when a required column is not present in the DataFrame at runtime.

This typically indicates a misconfiguration in the check setup, where a column was referenced that does not exist in the current dataset.
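The runtime column check can be sketched as below. The exception mirrors the documented constructor signature, but its message format and the `require_column` helper are assumptions made for this example.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class RuntimeCheckConfigurationError(Exception):
    pass


class MissingColumnError(RuntimeCheckConfigurationError):
    def __init__(self, column: str, available: list[str]) -> None:
        self.column = column
        self.available = available
        super().__init__(
            f"Column {column!r} not found; available columns: {available}"
        )


def require_column(column: str, available: list[str]) -> None:
    """Raise if `column` is not among the dataset's column names."""
    # With PySpark, `available` would typically be `df.columns`.
    if column not in available:
        raise MissingColumnError(column, available)
```

Passing the list of available columns along with the missing name makes typos in a check configuration easier to spot.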

exception MissingReferenceDatasetError(name: str)

Bases: RuntimeCheckConfigurationError

Raised when a requested reference dataset is not available in the current validation context.

exception RuntimeCheckConfigurationError

Bases: Exception

Base class for runtime configuration errors in data quality checks.

Raised when a check fails due to a configuration issue that can only be detected when the check is executed against actual data, such as referencing a non-existent column.
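Since both base classes derive directly from Exception, they are siblings: catching CheckConfigurationError does not catch runtime configuration errors. The sketch below illustrates this with local stand-in classes that copy only the inheritance relationships documented on this page; `classify` is a hypothetical helper.

```python
# Local stand-ins mirroring the documented hierarchy (not imported from sparkdq).
class CheckConfigurationError(Exception):
    pass


class RuntimeCheckConfigurationError(Exception):
    pass


class InvalidCheckConfigurationError(CheckConfigurationError):
    pass


class MissingCheckSetError(RuntimeCheckConfigurationError):
    pass


def classify(exc: Exception) -> str:
    """Label an exception by which documented base class it belongs to."""
    if isinstance(exc, CheckConfigurationError):
        return "static"
    if isinstance(exc, RuntimeCheckConfigurationError):
        return "runtime"
    return "other"
```

Code that wants to handle every configuration problem in one place would therefore need to catch both base classes, for example with `except (CheckConfigurationError, RuntimeCheckConfigurationError)`.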