sparkdq.checks


This module serves as the dynamic entry point for the sparkdq.checks subpackage.

It recursively traverses all submodules within the sparkdq.checks package to identify and register all classes that inherit from either BaseRowCheckConfig or BaseAggregateCheckConfig.

Each valid check configuration class is added to the module’s global namespace and included in __all__, ensuring it becomes part of the public API.

This approach eliminates the need for manual imports and __all__ maintenance, so the package stays easy to extend as new check classes are added. It also lets developers and users import any check configuration directly from the sparkdq.checks namespace.

This mechanism incurs a small runtime overhead during the initial import of sparkdq.checks, but improves maintainability and scalability significantly in large modular frameworks.
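
The discovery step described above can be approximated with the standard library. The sketch below is not the package's actual implementation: the two base classes are empty stand-ins, `collect_check_configs` is a hypothetical helper, and a throwaway module object replaces the real submodules, which a package would typically enumerate with `pkgutil.walk_packages` and load with `importlib.import_module`.

```python
import inspect
import types


class BaseRowCheckConfig:
    """Stand-in for the real row-level base class (assumption)."""


class BaseAggregateCheckConfig:
    """Stand-in for the real aggregate-level base class (assumption)."""


BASES = (BaseRowCheckConfig, BaseAggregateCheckConfig)


def collect_check_configs(modules):
    """Map class name -> class for every subclass of BASES found in modules."""
    found = {}
    for module in modules:
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Skip the base classes themselves; keep only their subclasses.
            if issubclass(obj, BASES) and obj not in BASES:
                found[name] = obj
    return found


class DemoRowCheckConfig(BaseRowCheckConfig):
    """Hypothetical check configuration used only for this demo."""


class Unrelated:
    """A class that must not be registered."""


# Simulated submodule standing in for one file of the package.
demo_module = types.ModuleType("demo_checks")
demo_module.DemoRowCheckConfig = DemoRowCheckConfig
demo_module.Unrelated = Unrelated
demo_module.BaseRowCheckConfig = BaseRowCheckConfig

registry = collect_check_configs([demo_module])
exported_names = sorted(registry)  # the list that would be assigned to __all__
```

Only `DemoRowCheckConfig` ends up in `registry`; the base class re-exported by the module and the unrelated class are filtered out.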

class ColumnGreaterThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)

Bases: BaseRowCheckConfig

Declarative configuration model for the ColumnGreaterThanCheck.

This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly greater than (or greater than or equal to, if inclusive=True) the result of evaluating limit.

Null values in either column or the limit result are treated as invalid and will fail the check.
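
The pass/fail rule for a single row can be written out in plain Python. This is only an illustration of the semantics stated above; `passes_greater_than` is a hypothetical helper, and the real check evaluates limit as a Spark SQL expression rather than receiving a precomputed value.

```python
from typing import Optional


def passes_greater_than(
    value: Optional[float], limit: Optional[float], inclusive: bool = False
) -> bool:
    """Return True when the row satisfies the comparison."""
    # A null on either side is treated as invalid, so the row fails.
    if value is None or limit is None:
        return False
    return value >= limit if inclusive else value > limit
```

With the default inclusive=False, a row where both sides are equal fails; with inclusive=True it passes.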

Examples

>>> # Simple column comparison
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="end-after-start",
...     column="end_time",
...     limit="start_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="price-with-margin",
...     column="selling_price",
...     limit="cost_price * 1.2",
...     inclusive=False
... )
>>> # Complex conditional expression
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END",
...     inclusive=True
... )
column

Column expected to contain greater (or equal) values.

Type:

str

limit

The column or a Spark SQL expression expected to contain smaller values.

Type:

str

inclusive

If True, validates column >= limit. If False, requires strict inequality (>). Defaults to False.

Type:

bool

check_class

alias of ColumnGreaterThanCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ColumnLessThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)

Bases: BaseRowCheckConfig

Declarative configuration model for the ColumnLessThanCheck.

This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly less than (or less than or equal to, if inclusive=True) the result of evaluating limit.

Null values in either column or the limit result are treated as invalid and will fail the check.

Examples

>>> # Simple column comparison
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="start-before-end",
...     column="start_time",
...     limit="end_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="price-with-margin",
...     column="cost_price",
...     limit="selling_price * 0.8",
...     inclusive=True
... )
>>> # Complex conditional expression
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END",
...     inclusive=False
... )
column

Column expected to contain smaller (or equal) values.

Type:

str

limit

The column or a Spark SQL expression expected to contain greater values.

Type:

str

inclusive

If True, validates column <= limit. If False, requires strict inequality (<). Defaults to False.

Type:

bool

check_class

alias of ColumnLessThanCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ColumnPresenceCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, required_columns: list[str])

Bases: BaseAggregateCheckConfig

Configuration schema for column presence validation checks.

Defines the parameters required for configuring checks that enforce required column presence in dataset schemas. This configuration enables declarative check definition through external configuration sources.

required_columns

Column names that must be present in the dataset schema.

Type:

list[str]

check_class

alias of ColumnPresenceCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ColumnsAreCompleteCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])

Bases: BaseAggregateCheckConfig

Declarative configuration model for the ColumnsAreCompleteCheck.

This configuration defines a completeness requirement for multiple columns. The check fails if any of the specified columns contain null values.
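
In plain Python terms, the rule amounts to the following. The list-of-dicts row representation and the `incomplete_columns` helper are assumptions made for illustration; the real check runs against a Spark DataFrame.

```python
def incomplete_columns(rows, columns):
    """Return the configured columns that contain at least one null (None)."""
    # A key missing from a row is treated like a null in this sketch.
    return [c for c in columns if any(row.get(c) is None for row in rows)]


rows = [
    {"id": 1, "name": "a", "email": "a@example.com"},
    {"id": 2, "name": None, "email": "b@example.com"},
]
failing = incomplete_columns(rows, ["id", "name"])
check_passed = not failing
```

Here `failing` lists only the name column, so the check as a whole fails.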

columns

List of required columns that must be fully populated.

Type:

List[str]

check_class

alias of ColumnsAreCompleteCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class CompletenessRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)

Bases: BaseAggregateCheckConfig

Declarative configuration for CompletenessRatioCheck.

column

Column name to assess.

Type:

str

min_ratio

Minimum allowed non-null ratio (between 0.0 and 1.0).

Type:

float

check_class

alias of CompletenessRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_threshold() → CompletenessRatioCheckConfig

Ensures the min_ratio is between 0 and 1 (inclusive).
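
A rough sketch of both halves, the threshold validation and the ratio it is compared against, is given below. The helper names are hypothetical, `ValueError` stands in for whatever exception the library raises, and the handling of an empty column is an assumption because this reference does not state it.

```python
def validate_min_ratio(min_ratio: float) -> float:
    """Accept min_ratio only when it lies in the closed interval [0, 1]."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError(f"min_ratio must be between 0 and 1, got {min_ratio}")
    return min_ratio


def completeness_ratio(values) -> float:
    """Fraction of entries that are not None."""
    values = list(values)
    if not values:
        # Assumption: an empty column is rejected instead of guessing a ratio.
        raise ValueError("cannot compute a ratio for an empty column")
    return sum(v is not None for v in values) / len(values)


ratio = completeness_ratio([1, None, 3, 4])
passed = ratio >= validate_min_ratio(0.7)
```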

class DateBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))

Bases: BaseRowCheckConfig

Configuration schema for date range validation checks.

Defines the parameters and validation rules for configuring checks that enforce date range constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.

columns

Date column names that must fall within the specified range.

Type:

List[str]

min_value

Minimum acceptable date in ISO format (YYYY-MM-DD).

Type:

str

max_value

Maximum acceptable date in ISO format (YYYY-MM-DD).

Type:

str

inclusive

Inclusivity settings for minimum and maximum boundaries respectively.

Type:

tuple[bool, bool]

check_class

alias of DateBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_between_values() → DateBetweenCheckConfig

Validate the logical consistency of the configured date range parameters.

Ensures that the minimum and maximum date parameters form a valid temporal range and that both values represent valid dates. This validation prevents configuration errors that would result in impossible validation conditions.

Returns:

The validated configuration instance.

Return type:

DateBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – When the date range parameters are logically inconsistent or contain invalid date values.
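
The kind of validation described here might look like the sketch below. The exception class is a local stand-in with the same name as the library's, `validate_date_range` is a hypothetical function, and rejecting only min_value > max_value (while allowing equal bounds) is an assumption.

```python
from datetime import date


class InvalidCheckConfigurationError(ValueError):
    """Local stand-in for the library's exception type (assumption)."""


def validate_date_range(min_value: str, max_value: str) -> tuple[date, date]:
    """Parse both ISO dates and make sure they form a valid range."""
    try:
        lower = date.fromisoformat(min_value)
        upper = date.fromisoformat(max_value)
    except ValueError as exc:
        raise InvalidCheckConfigurationError(f"invalid date value: {exc}") from exc
    if lower > upper:
        raise InvalidCheckConfigurationError(
            f"min_value {min_value} is after max_value {max_value}"
        )
    return lower, upper
```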

class DateMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)

Bases: BaseRowCheckConfig

Configuration schema for maximum date validation checks.

Defines the parameters required for configuring checks that enforce maximum date boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.

columns

Date column names that must remain within the maximum threshold.

Type:

List[str]

max_value

Maximum acceptable date in ISO format (YYYY-MM-DD).

Type:

str

inclusive

Whether the maximum date threshold includes the boundary date itself.

Type:

bool

check_class

alias of DateMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class DateMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)

Bases: BaseRowCheckConfig

Configuration schema for minimum date validation checks.

Defines the parameters required for configuring checks that enforce minimum date boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.

columns

Date column names that must meet minimum threshold requirements.

Type:

List[str]

min_value

Minimum acceptable date in ISO format (YYYY-MM-DD).

Type:

str

inclusive

Whether the minimum date threshold includes the boundary date itself.

Type:

bool

check_class

alias of DateMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class DistinctRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)

Bases: BaseAggregateCheckConfig

Declarative configuration model for DistinctRatioCheck.

column

The column to evaluate for distinctness.

Type:

str

min_ratio

Minimum required ratio of distinct values (between 0 and 1).

Type:

float

check_class

alias of DistinctRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ExactlyOneNotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])

Bases: BaseRowCheckConfig

Configuration schema for mutual exclusivity validation checks.

Defines the parameters required for configuring checks that enforce exactly-one constraints among related columns. This configuration enables declarative check definition through external configuration sources.
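
For a single row, an exactly-one constraint reduces to counting the non-null entries. The helper below is hypothetical and works on a plain dict instead of a Spark row.

```python
def exactly_one_not_null(row: dict, columns: list) -> bool:
    """True when exactly one of the listed columns holds a non-null value."""
    return sum(row.get(c) is not None for c in columns) == 1


contact_columns = ["email", "phone", "postal_address"]
```

A row with only an email passes; a row with both email and phone, or with none of the three, fails.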

columns

Column names where exactly one must contain a non-null value.

Type:

List[str]

check_class

alias of ExactlyOneNotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ForeignKeyCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, reference_dataset: str, reference_column: str)

Bases: BaseAggregateCheckConfig

Configuration model for ForeignKeyCheck.

Validates referential integrity by ensuring that all values in column exist in a reference dataset.
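
Conceptually, referential integrity is a set-membership test. The sketch below uses in-memory lists in place of the two datasets; `orphaned_values` is a hypothetical helper, and skipping nulls is an assumption, since this reference does not say how the check treats them.

```python
def orphaned_values(values, reference_values):
    """Return the values that have no match in the reference column."""
    reference = set(reference_values)
    # Assumption: nulls are ignored instead of being reported as violations.
    return {v for v in values if v is not None and v not in reference}


orders_customer_id = [1, 2, 3, None, 9]
customers_id = [1, 2, 3, 4]
violations = orphaned_values(orders_customer_id, customers_id)
```

The check would pass only when `violations` is empty; here it contains the single unmatched key 9.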

check_class

alias of ForeignKeyCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class FreshnessCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, interval: int, period: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second'])

Bases: BaseAggregateCheckConfig

Declarative configuration model for the FreshnessCheck.

Ensures that the newest value in the specified timestamp column is recent enough relative to the current time.
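
One way to picture the freshness rule is as a comparison between the age of the newest timestamp and a time window. The sketch below is an illustration only: `is_fresh` is hypothetical, it handles just the fixed-length units (calendar units such as month and year need calendar arithmetic that `timedelta` does not provide), and treating the boundary as fresh is an assumption.

```python
from datetime import datetime, timedelta

# Fixed-length units only; "month" and "year" are deliberately left out.
_FIXED_UNITS = {
    "week": "weeks",
    "day": "days",
    "hour": "hours",
    "minute": "minutes",
    "second": "seconds",
}


def is_fresh(newest: datetime, now: datetime, interval: int, period: str) -> bool:
    """True when the newest timestamp lies within the last interval periods."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if period not in _FIXED_UNITS:
        raise ValueError(f"period {period!r} is not handled by this sketch")
    window = timedelta(**{_FIXED_UNITS[period]: interval})
    return now - newest <= window


now = datetime(2024, 1, 10, 12, 0, 0)
newest = datetime(2024, 1, 9, 13, 0, 0)  # 23 hours old
```

With these values the data counts as fresh for a one-day window but stale for a twelve-hour window.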

column

Name of the timestamp column.

Type:

str

interval

Time window size (must be positive).

Type:

int

period

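Unit of time; one of “year”, “month”, “week”, “day”, “hour”, “minute”, or “second”.

Type:

Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second']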

check_class

alias of FreshnessCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class IsContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, allowed_values: dict[str, list[object]])

Bases: BaseRowCheckConfig

Configuration schema for value whitelist validation checks.

Defines the parameters and validation rules for configuring checks that enforce value containment constraints. The configuration includes logical validation to ensure allowed value specifications are complete and meaningful.

allowed_values

Mapping of column names to their corresponding lists of acceptable values.

Type:

dict[str, list[object]]

check_class

alias of IsContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_allowed_values() → IsContainedInCheckConfig

Validate the logical consistency and completeness of the allowed values configuration.

Ensures that the allowed values mapping is properly structured with non-empty value lists for each configured column. This validation prevents configuration errors that would result in impossible or meaningless validation conditions.

Returns:

The validated configuration instance.

Return type:

IsContainedInCheckConfig

Raises:

InvalidCheckConfigurationError – When the allowed values configuration is empty, malformed, or contains invalid value specifications.
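
The validation and the row-level rule it protects can be sketched together as follows. Both functions are hypothetical, the exception class is a local stand-in named after the library's, and the dict-based row is a simplification of a Spark row.

```python
class InvalidCheckConfigurationError(ValueError):
    """Local stand-in for the library's exception type (assumption)."""


def validate_allowed_values(allowed_values: dict) -> dict:
    """Reject an empty mapping or a column with an empty value list."""
    if not allowed_values:
        raise InvalidCheckConfigurationError("allowed_values must not be empty")
    for column, values in allowed_values.items():
        if not values:
            raise InvalidCheckConfigurationError(
                f"no allowed values given for column {column!r}"
            )
    return allowed_values


def row_is_allowed(row: dict, allowed_values: dict) -> bool:
    """True when every configured column holds one of its allowed values."""
    return all(row.get(col) in values for col, values in allowed_values.items())


allowed = validate_allowed_values({"status": ["ACTIVE", "INACTIVE"], "tier": [1, 2, 3]})
```

An empty value list is rejected up front because no row could ever satisfy it.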

class IsNotContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, forbidden_values: dict[str, list[object]])

Bases: BaseRowCheckConfig

Declarative configuration model for the IsNotContainedInCheck.

This config allows validation that specified columns do NOT contain forbidden values.

forbidden_values

Mapping of column names to forbidden values.

Type:

dict[str, list[object]]

check_class

alias of IsNotContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_forbidden_values() → IsNotContainedInCheckConfig

Validate that forbidden_values is not empty and properly formed.

Returns:

The validated configuration object.

Return type:

IsNotContainedInCheckConfig

Raises:

InvalidCheckConfigurationError – If forbidden_values is missing or invalid.

class NotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])

Bases: BaseRowCheckConfig

Configuration schema for not-null validation checks.

Defines the parameters required for configuring checks that validate columns expected to remain null. This configuration enables declarative check definition through external configuration sources.

columns

Column names that should consistently contain null values.

Type:

List[str]

check_class

alias of NotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])

Bases: BaseRowCheckConfig

Configuration schema for null value validation checks.

Defines the parameters required for configuring checks that enforce non-null value requirements. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.

columns

Column names that must contain non-null values for records to pass validation.

Type:

List[str]

check_class

alias of NullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NumericBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], inclusive: Tuple[bool, bool] = (False, False), min_value: float | int | Decimal, max_value: float | int | Decimal)

Bases: BaseRowCheckConfig

Configuration schema for numeric range validation checks.

Defines the parameters and validation rules for configuring checks that enforce numeric range constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.

This configuration enables declarative check definition through external configuration sources while ensuring parameter validity at configuration time.

columns

Numeric column names that must fall within the specified range.

Type:

List[str]

min_value

Minimum acceptable numeric value for the valid range.

Type:

float | int | Decimal

max_value

Maximum acceptable numeric value for the valid range.

Type:

float | int | Decimal

inclusive

Inclusivity settings for minimum and maximum boundaries respectively.

Type:

tuple[bool, bool]

check_class

alias of NumericBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_between_values() → NumericBetweenCheckConfig

Validate the logical consistency of the configured numeric range parameters.

Ensures that the minimum and maximum numeric parameters form a valid range and that both values are properly ordered. This validation prevents configuration errors that would result in impossible validation conditions.

Returns:

The validated configuration instance.

Return type:

NumericBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – When the numeric range parameters are logically inconsistent or contain invalid values.
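
The effect of the two-element inclusive setting is easiest to see on a single value. The function below is a hypothetical plain-Python rendering of the range rule and does not cover null handling, which this reference does not describe.

```python
from decimal import Decimal


def in_range(value, min_value, max_value, inclusive=(False, False)) -> bool:
    """Apply the lower and upper bound, each inclusive or exclusive."""
    include_min, include_max = inclusive
    lower_ok = value >= min_value if include_min else value > min_value
    upper_ok = value <= max_value if include_max else value < max_value
    return lower_ok and upper_ok
```

With the default (False, False) both boundary values are rejected; (True, False) accepts the minimum but still rejects the maximum.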

class NumericMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: float | int | Decimal, inclusive: bool = False)

Bases: BaseRowCheckConfig

Configuration schema for maximum numeric threshold validation checks.

Defines the parameters required for configuring checks that enforce maximum numeric boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.

columns

Numeric column names that must remain within the maximum threshold.

Type:

List[str]

max_value

Maximum acceptable numeric value for validation.

Type:

float | int | Decimal

inclusive

Whether the maximum threshold includes the boundary value itself.

Type:

bool

check_class

alias of NumericMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NumericMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: float | int | Decimal, inclusive: bool = False)

Bases: BaseRowCheckConfig

Configuration schema for minimum numeric threshold validation checks.

Defines the parameters required for configuring checks that enforce minimum numeric boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.

columns

Numeric column names that must meet minimum threshold requirements.

Type:

List[str]

min_value

Minimum acceptable numeric value for validation.

Type:

float | int | Decimal

inclusive

Whether the minimum threshold includes the boundary value itself.

Type:

bool

check_class

alias of NumericMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class RegexMatchCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, pattern: str, ignore_case: bool = False, treat_null_as_failure: bool = False)

Bases: BaseRowCheckConfig

Configuration schema for regular expression pattern validation checks.

Defines the parameters required for configuring checks that enforce regular expression pattern matching. This configuration enables declarative check definition through external configuration sources while providing flexible matching options.
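
How the two flags interact can be illustrated with Python's `re` module. This is an approximation under stated assumptions: the real check evaluates the pattern in Spark, whose regex dialect differs from Python's in places, and the use of search semantics (a match anywhere in the string unless the pattern is anchored) is assumed here. `matches` is a hypothetical helper.

```python
import re
from typing import Optional


def matches(
    value: Optional[str],
    pattern: str,
    ignore_case: bool = False,
    treat_null_as_failure: bool = False,
) -> bool:
    """Return True when the value passes the pattern check."""
    if value is None:
        # Nulls pass unless the configuration says otherwise.
        return not treat_null_as_failure
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None


sku_pattern = r"^[a-z]+-\d+$"
```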

column

String column name that must conform to the pattern.

Type:

str

pattern

Regular expression pattern for validation.

Type:

str

ignore_case

Whether pattern matching should be case-insensitive.

Type:

bool

treat_null_as_failure

Whether null values should be considered validation failures.

Type:

bool

check_class

alias of RegexMatchCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class RowCountBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int, max_count: int)

Bases: BaseAggregateCheckConfig

Configuration schema for row count boundary validation checks.

Defines the parameters and validation rules for configuring checks that enforce dataset size constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.

This configuration enables declarative check definition through external configuration sources while ensuring parameter validity at configuration time.

min_count

Minimum acceptable number of records in the dataset.

Type:

int

max_count

Maximum acceptable number of records in the dataset.

Type:

int

check_class

alias of RowCountBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_range() → RowCountBetweenCheckConfig

Validate the logical consistency of the configured boundary parameters.

Ensures that the minimum and maximum count parameters form a valid range and that both values are non-negative. This validation prevents configuration errors that would result in impossible validation conditions.

Returns:

The validated configuration instance.

Return type:

RowCountBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – When the boundary parameters are logically inconsistent or contain invalid values.

class RowCountExactCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_count: int)

Bases: BaseAggregateCheckConfig

Configuration schema for exact row count validation checks.

Defines the parameters and validation rules for configuring checks that enforce precise row count requirements. The configuration includes logical validation to ensure count parameters are non-negative and meaningful.

expected_count

Exact number of records required in the dataset.

Type:

int

check_class

alias of RowCountExactCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_expected() → RowCountExactCheckConfig

Validate the logical consistency of the configured expected count parameter.

Ensures that the expected count parameter is non-negative and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in impossible validation conditions.

Returns:

The validated configuration instance.

Return type:

RowCountExactCheckConfig

Raises:

InvalidCheckConfigurationError – When the expected count parameter is negative, indicating an invalid configuration.

class RowCountMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, max_count: int)

Bases: BaseAggregateCheckConfig

Configuration schema for maximum row count validation checks.

Defines the parameters and validation rules for configuring checks that enforce maximum row count limits. The configuration includes logical validation to ensure count parameters are positive and meaningful.

max_count

Maximum acceptable number of records in the dataset.

Type:

int

check_class

alias of RowCountMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_max() → RowCountMaxCheckConfig

Validate the logical consistency of the configured maximum count parameter.

Ensures that the maximum count parameter is positive and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.

Returns:

The validated configuration instance.

Return type:

RowCountMaxCheckConfig

Raises:

InvalidCheckConfigurationError – When the maximum count parameter is zero or negative, indicating an invalid configuration.

class RowCountMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int)

Bases: BaseAggregateCheckConfig

Configuration schema for minimum row count validation checks.

Defines the parameters and validation rules for configuring checks that enforce minimum row count thresholds. The configuration includes logical validation to ensure count parameters are positive and meaningful.

min_count

Minimum acceptable number of records in the dataset.

Type:

int

check_class

alias of RowCountMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_min() → RowCountMinCheckConfig

Validate the logical consistency of the configured minimum count parameter.

Ensures that the minimum count parameter is positive and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.

Returns:

The validated configuration instance.

Return type:

RowCountMinCheckConfig

Raises:

InvalidCheckConfigurationError – When the minimum count parameter is zero or negative, indicating an invalid configuration.

class SchemaCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_schema: dict[str, str], strict: bool = True)

Bases: BaseAggregateCheckConfig

Declarative configuration model for the SchemaCheck.

Ensures the DataFrame matches the expected schema, with optional strict mode. Validates all specified types, including support for decimal(p,s) types.
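
The difference between strict and non-strict mode can be sketched by comparing two name-to-type mappings. `schema_problems` is a hypothetical helper, and comparing type names as exact strings is an assumption; the real check works on a DataFrame schema and may normalize type names differently.

```python
def schema_problems(actual: dict, expected: dict, strict: bool = True) -> list:
    """List every deviation of the actual schema from the expected one."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(
                f"type mismatch for {name}: expected {dtype}, found {actual[name]}"
            )
    if strict:
        # Strict mode additionally rejects columns the schema does not list.
        for name in actual:
            if name not in expected:
                problems.append(f"unexpected column: {name}")
    return problems


expected = {"id": "int", "amount": "decimal(10,2)"}
actual = {"id": "int", "amount": "decimal(10,2)", "note": "string"}
```

The extra note column is reported only in strict mode; missing columns and type mismatches are reported in both modes.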

expected_schema

Required column names and Spark types.

Type:

dict[str, str]

strict

Whether to disallow unexpected columns.

Type:

bool

check_class

alias of SchemaCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_schema() → SchemaCheckConfig

Validates that expected_schema is not empty and all types are valid.

Raises:

InvalidCheckConfigurationError – If any type is invalid.

class StringLengthBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, max_length: int, inclusive: tuple[bool, bool] = (True, True))

Bases: BaseRowCheckConfig

Configuration for StringLengthBetweenCheck.

Validates that string values in the given column fall between a minimum and maximum length.

column

The string column to validate.

Type:

str

min_length

Minimum valid length (must be > 0).

Type:

int

max_length

Maximum valid length (must be >= min_length).

Type:

int

inclusive

Tuple indicating inclusiveness of min and max bounds.

Type:

tuple[bool, bool]

check_class

alias of StringLengthBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_range() → StringLengthBetweenCheckConfig

Validates that the min/max configuration is logically sound.

Returns:

Validated instance.

Return type:

StringLengthBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – If min_length > max_length or values are invalid.

class StringMaxLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, max_length: int, inclusive: bool = True)

Bases: BaseRowCheckConfig

Configuration schema for maximum string length validation checks.

Defines the parameters and validation rules for configuring checks that enforce maximum string length constraints. The configuration includes logical validation to ensure length parameters are positive and meaningful.

column

String column name that must remain within the maximum length limit.

Type:

str

max_length

Maximum acceptable string length threshold (must be positive).

Type:

int

inclusive

Whether the maximum length threshold includes the boundary value itself.

Type:

bool

check_class

alias of StringMaxLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_max_length() → StringMaxLengthCheckConfig

Validate the logical consistency of the configured maximum length parameter.

Ensures that the maximum length parameter is positive and meaningful for string validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.

Returns:

The validated configuration instance.

Return type:

StringMaxLengthCheckConfig

Raises:

InvalidCheckConfigurationError – When the maximum length parameter is zero or negative, indicating an invalid configuration.

class StringMinLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, inclusive: bool = True)

Bases: BaseRowCheckConfig

Configuration schema for minimum string length validation checks.

Defines the parameters and validation rules for configuring checks that enforce minimum string length requirements. The configuration includes logical validation to ensure length parameters are positive and meaningful.

column

String column name that must meet minimum length requirements.

Type:

str

min_length

Minimum acceptable string length threshold (must be positive).

Type:

int

inclusive

Whether a string whose length equals min_length still satisfies the check.

Type:

bool

check_class

alias of StringMinLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_min_length() StringMinLengthCheckConfig[source]

Validate the logical consistency of the configured minimum length parameter.

Ensures that min_length is positive, rejecting configurations whose limit would make the check meaningless.

Returns:

The validated configuration instance.

Return type:

StringMinLengthCheckConfig

Raises:

InvalidCheckConfigurationError – When the minimum length parameter is zero or negative, indicating an invalid configuration.

class TimestampBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]

Bases: BaseRowCheckConfig

Declarative configuration model for TimestampBetweenCheck.

columns

The list of timestamp columns to validate.

Type:

List[str]

min_value

Minimum allowed timestamp.

Type:

str

max_value

Maximum allowed timestamp.

Type:

str

inclusive

Pair of booleans controlling whether each boundary is inclusive. Defaults to (False, False), which excludes both boundaries.

Type:

tuple[bool, bool]

check_class

alias of TimestampBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_between_values() TimestampBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Raises:

InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
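The range validation and the per-boundary inclusion flags can be sketched in plain Python. The functions validate_bounds and is_between are hypothetical helpers, not sparkdq APIs. The sketch assumes that both bounds are ISO 8601 strings, that the first element of inclusive applies to the lower bound and the second to the upper bound, and it uses ValueError in place of InvalidCheckConfigurationError.

```python
from datetime import datetime
from typing import Tuple


def validate_bounds(min_value: str, max_value: str) -> Tuple[datetime, datetime]:
    """Parse both bounds and reject a range whose minimum exceeds its maximum."""
    lower = datetime.fromisoformat(min_value)
    upper = datetime.fromisoformat(max_value)
    if lower > upper:
        raise ValueError("min_value must not be greater than max_value")
    return lower, upper


def is_between(
    ts: datetime,
    lower: datetime,
    upper: datetime,
    inclusive: Tuple[bool, bool] = (False, False),
) -> bool:
    """Return True when ts lies inside the configured range."""
    # Assumed order: inclusive[0] is the lower bound, inclusive[1] the upper.
    lower_ok = ts >= lower if inclusive[0] else ts > lower
    upper_ok = ts <= upper if inclusive[1] else ts < upper
    return lower_ok and upper_ok


lower, upper = validate_bounds("2023-01-01T00:00:00", "2023-12-31T23:59:59")
```

With the default (False, False), a timestamp exactly equal to either bound is rejected in this sketch; passing (True, False) admits the lower bound only.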

class TimestampMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for TimestampMaxCheck.

columns

The timestamp columns to validate.

Type:

List[str]

max_value

The maximum allowed timestamp in ISO 8601 format.

Type:

str

inclusive

Whether a timestamp equal to max_value satisfies the check.

Type:

bool

check_class

alias of TimestampMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class TimestampMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the TimestampMinCheck.

columns

The list of timestamp columns to validate.

Type:

List[str]

min_value

The minimum allowed timestamp in ISO 8601 format (e.g. ‘2023-01-01T00:00:00’).

Type:

str

inclusive

Whether the minimum value is inclusive.

Type:

bool

check_class

alias of TimestampMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class UniqueRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the UniqueRatioCheck.

column

The column to check for uniqueness.

Type:

str

min_ratio

The minimum acceptable ratio of distinct values, between 0.0 and 1.0.

Type:

float

check_class

alias of UniqueRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
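To make the min_ratio threshold concrete, the following sketch computes a uniqueness ratio over an in-memory list. The helpers unique_ratio and meets_min_ratio are hypothetical and not part of sparkdq. The definition used here (distinct values divided by total values) is an assumption; the actual check may treat nulls or empty datasets differently.

```python
from typing import Hashable, Sequence


def unique_ratio(values: Sequence[Hashable]) -> float:
    """Return the number of distinct values divided by the number of values."""
    if not values:
        raise ValueError("cannot compute a ratio for an empty column")
    return len(set(values)) / len(values)


def meets_min_ratio(values: Sequence[Hashable], min_ratio: float) -> bool:
    """Return True when the uniqueness ratio reaches the threshold."""
    return unique_ratio(values) >= min_ratio


user_ids = ["a", "b", "b", "c"]
ratio = unique_ratio(user_ids)  # 3 distinct values out of 4 rows
```

In this example the column would satisfy a min_ratio of 0.7 but not one of 0.8.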

class UniqueRowsCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, subset_columns: List[str] | None = None)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration for the UniqueRowsCheck.

This check verifies that no duplicate row combinations exist in the dataset. Uniqueness can be enforced across all columns or a selected subset.

subset_columns

List of columns to define uniqueness. If not provided, all columns are used.

Type:

Optional[List[str]]

check_class

alias of UniqueRowsCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
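The effect of subset_columns on duplicate detection can be shown with a small sketch over a list of dictionaries. The function duplicate_keys is a hypothetical helper, not a sparkdq API, and it only approximates the idea of the check: rows are grouped by the selected columns, and any group with more than one row counts as a duplicate.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Row = Dict[str, object]


def duplicate_keys(
    rows: List[Row], subset_columns: Optional[List[str]] = None
) -> List[Tuple[object, ...]]:
    """Return the column-value combinations that occur more than once."""
    if not rows:
        return []
    # With no subset given, every column takes part in the comparison.
    columns = subset_columns if subset_columns is not None else sorted(rows[0])
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]


rows = [
    {"order_id": 1, "customer": "ann"},
    {"order_id": 2, "customer": "ann"},
    {"order_id": 2, "customer": "bob"},
]
```

Here the rows are unique when all columns are compared, yet restricting the comparison to order_id or to customer reveals a duplicate in each case.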