sparkdq.checks
This module serves as the dynamic entry point for the sparkdq.checks subpackage.
It recursively traverses all submodules within the sparkdq.checks package to identify and register all classes that inherit from either BaseRowCheckConfig or BaseAggregateCheckConfig.
Each valid check configuration class is added to the module’s global namespace and included in __all__, ensuring it becomes part of the public API.
This approach eliminates the need for manual imports and __all__ maintenance, so the package stays easy to extend as new check classes are added. It also lets developers and users import any check configuration directly from the sparkdq.checks namespace.
This mechanism incurs a small runtime overhead during the initial import of sparkdq.checks, but improves maintainability and scalability significantly in large modular frameworks.
- class ColumnGreaterThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the ColumnGreaterThanCheck.
This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly greater than (or greater than or equal to, if inclusive=True) the result of evaluating limit.
Null values in either column or the limit result are treated as invalid and will fail the check.
Examples
>>> # Simple column comparison
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="end-after-start",
...     column="end_time",
...     limit="start_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="price-with-margin",
...     column="selling_price",
...     limit="cost_price * 1.2",
...     inclusive=False
... )
>>> # Complex conditional expression
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END",
...     inclusive=True
... )
- column
Column expected to contain greater (or equal) values.
- Type:
str
- limit
The column or a Spark SQL expression expected to contain smaller values.
- Type:
str
- inclusive
If True, validates column >= limit. If False, requires strict inequality (>). Defaults to False.
- Type:
bool
- check_class
alias of
ColumnGreaterThanCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
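The row-level rule documented for ColumnGreaterThanCheckConfig, including its null handling, can be illustrated in plain Python. This is only a sketch of the documented semantics: the helper passes_greater_than is hypothetical, and the real check evaluates limit as a Spark SQL expression rather than receiving an already computed value.

```python
from typing import Optional


def passes_greater_than(
    value: Optional[int], limit: Optional[int], inclusive: bool = False
) -> bool:
    """Return True when the row passes; a null on either side fails the row."""
    if value is None or limit is None:
        return False
    return value >= limit if inclusive else value > limit


# (end_time, start_time) pairs standing in for rows of a DataFrame.
rows = [(10, 5), (5, 5), (None, 5), (7, None)]

strict = [passes_greater_than(end, start) for end, start in rows]
inclusive = [passes_greater_than(end, start, inclusive=True) for end, start in rows]
```

Only the second row changes between the two runs, because it sits exactly on the boundary. ColumnLessThanCheckConfig mirrors the same rule with the comparison reversed.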
- class ColumnLessThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the ColumnLessThanCheck.
This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly less than (or less than or equal to, if inclusive=True) the result of evaluating limit.
Null values in either column or the limit result are treated as invalid and will fail the check.
Examples
>>> # Simple column comparison
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="start-before-end",
...     column="start_time",
...     limit="end_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="price-with-margin",
...     column="cost_price",
...     limit="selling_price * 0.8",
...     inclusive=True
... )
>>> # Complex conditional expression
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END",
...     inclusive=False
... )
- column
Column expected to contain smaller (or equal) values.
- Type:
str
- limit
The column or a Spark SQL expression expected to contain greater values.
- Type:
str
- inclusive
If True, validates column <= limit. If False, requires strict inequality (<). Defaults to False.
- Type:
bool
- check_class
alias of
ColumnLessThanCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ColumnPresenceCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, required_columns: list[str])[source]
Bases:
BaseAggregateCheckConfig
Configuration schema for column presence validation checks.
Defines the parameters required for configuring checks that enforce required column presence in dataset schemas. This configuration enables declarative check definition through external configuration sources.
- required_columns
Column names that must be present in the dataset schema.
- Type:
list[str]
- check_class
alias of
ColumnPresenceCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ColumnsAreCompleteCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the ColumnsAreCompleteCheck.
This configuration defines a completeness requirement for multiple columns. The check fails if any of the specified columns contain null values.
- columns
List of required columns that must be fully populated.
- Type:
List[str]
- check_class
alias of
ColumnsAreCompleteCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class CompletenessRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration for CompletenessRatioCheck.
- column
Column name to assess.
- Type:
str
- min_ratio
Minimum allowed non-null ratio (between 0.0 and 1.0).
- Type:
float
- check_class
alias of
CompletenessRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_threshold() CompletenessRatioCheckConfig [source]
Ensures the min_ratio is between 0 and 1 (inclusive).
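As a rough illustration of what CompletenessRatioCheckConfig configures, the sketch below computes a non-null ratio over a plain Python list and compares it with min_ratio. The helper names are hypothetical, ValueError stands in for whatever the library raises, and the sketch assumes a non-empty column; how sparkdq treats an empty dataset is not covered here.

```python
def validate_min_ratio(min_ratio: float) -> float:
    """Reject thresholds outside the closed interval [0, 1]."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError(f"min_ratio must be between 0 and 1, got {min_ratio}")
    return min_ratio


def completeness_ratio(values: list) -> float:
    """Share of non-null entries (assumes at least one entry)."""
    non_null = sum(v is not None for v in values)
    return non_null / len(values)


column_values = [1, None, 3, 4]
threshold = validate_min_ratio(0.7)
ratio = completeness_ratio(column_values)  # 3 of 4 values are present
passes = ratio >= threshold
```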
- class DateBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]
Bases:
BaseRowCheckConfig
Configuration schema for date range validation checks.
Defines the parameters and validation rules for configuring checks that enforce date range constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.
- columns
Date column names that must fall within the specified range.
- Type:
List[str]
- min_value
Minimum acceptable date in ISO format (YYYY-MM-DD).
- Type:
str
- max_value
Maximum acceptable date in ISO format (YYYY-MM-DD).
- Type:
str
- inclusive
Inclusivity settings for minimum and maximum boundaries respectively.
- Type:
tuple[bool, bool]
- check_class
alias of
DateBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_between_values() DateBetweenCheckConfig [source]
Validate the logical consistency of the configured date range parameters.
Ensures that the minimum and maximum date parameters form a valid temporal range and that both values represent valid dates. This validation prevents configuration errors that would result in impossible validation conditions.
- Returns:
The validated configuration instance.
- Return type:
DateBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – When the date range parameters are logically inconsistent or contain invalid date values.
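The kind of consistency check that validate_between_values describes might look like the sketch below. It is an approximation under stated assumptions: InvalidCheckConfigurationError is redefined locally as a stand-in, the helper validate_date_range is hypothetical, and only min_value later than max_value is rejected, since the text above does not say whether equal bounds are allowed.

```python
from datetime import date


class InvalidCheckConfigurationError(ValueError):
    """Local stand-in for the library exception of the same name."""


def validate_date_range(min_value: str, max_value: str) -> tuple:
    """Parse both bounds as ISO dates (YYYY-MM-DD) and check their order."""
    try:
        lower = date.fromisoformat(min_value)
        upper = date.fromisoformat(max_value)
    except ValueError as exc:
        raise InvalidCheckConfigurationError(f"not a valid ISO date: {exc}") from exc
    if lower > upper:
        raise InvalidCheckConfigurationError(
            f"min_value {min_value} is after max_value {max_value}"
        )
    return lower, upper


bounds = validate_date_range("2024-01-01", "2024-12-31")
```

Failing at configuration time means a reversed or malformed range is reported before any data is scanned.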
- class DateMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration schema for maximum date validation checks.
Defines the parameters required for configuring checks that enforce maximum date boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.
- columns
Date column names that must remain within the maximum threshold.
- Type:
List[str]
- max_value
Maximum acceptable date in ISO format (YYYY-MM-DD).
- Type:
str
- inclusive
Whether the maximum date threshold includes the boundary date itself.
- Type:
bool
- check_class
alias of
DateMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DateMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration schema for minimum date validation checks.
Defines the parameters required for configuring checks that enforce minimum date boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.
- columns
Date column names that must meet minimum threshold requirements.
- Type:
List[str]
- min_value
Minimum acceptable date in ISO format (YYYY-MM-DD).
- Type:
str
- inclusive
Whether the minimum date threshold includes the boundary date itself.
- Type:
bool
- check_class
alias of
DateMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DistinctRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for DistinctRatioCheck.
- column
The column to evaluate for distinctness.
- Type:
str
- min_ratio
Minimum required ratio of distinct values (between 0 and 1).
- Type:
float
- check_class
alias of
DistinctRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ExactlyOneNotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Configuration schema for mutual exclusivity validation checks.
Defines the parameters required for configuring checks that enforce exactly-one constraints among related columns. This configuration enables declarative check definition through external configuration sources.
- columns
Column names where exactly one must contain a non-null value.
- Type:
List[str]
- check_class
alias of
ExactlyOneNotNullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
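The exactly-one constraint that ExactlyOneNotNullCheckConfig configures can be expressed per row as a count of non-null values. The following is a plain-Python sketch with a hypothetical helper name and sample column names, not the Spark expression the library builds.

```python
def exactly_one_not_null(row: dict, columns: list) -> bool:
    """True when exactly one of the listed columns holds a non-null value."""
    return sum(row[c] is not None for c in columns) == 1


contact_columns = ["email", "phone"]
rows = [
    {"email": "a@example.com", "phone": None},         # one value present
    {"email": None, "phone": None},                    # none present
    {"email": "a@example.com", "phone": "555-0100"},   # both present
]
results = [exactly_one_not_null(r, contact_columns) for r in rows]
```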
- class ForeignKeyCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, reference_dataset: str, reference_column: str)[source]
Bases:
BaseAggregateCheckConfig
Configuration model for ForeignKeyCheck.
Validates referential integrity by ensuring that all values in column exist in a reference dataset.
- check_class
alias of
ForeignKeyCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
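Referential integrity as described for ForeignKeyCheckConfig amounts to finding values of column that have no match in the reference column. The sketch below shows that idea on plain lists. The helper name is hypothetical, and skipping nulls (as SQL foreign keys do) is an assumption; sparkdq may treat nulls differently.

```python
def missing_references(values: list, reference_values: list) -> list:
    """Return the non-null values that do not appear in the reference column."""
    reference = set(reference_values)
    # Assumption: nulls are ignored rather than reported as violations.
    return [v for v in values if v is not None and v not in reference]


order_customer_ids = [1, 2, 5, None]
customer_ids = [1, 2, 3]

violations = missing_references(order_customer_ids, customer_ids)
check_passes = not violations
```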
- class FreshnessCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, interval: int, period: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second'])[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the FreshnessCheck.
Ensures that the newest value in the specified timestamp column is recent enough relative to the current time.
- column
Name of the timestamp column.
- Type:
str
- interval
Time window size (must be positive).
- Type:
int
- period
Unit of time; one of "year", "month", "week", "day", "hour", "minute", or "second".
- Type:
str
- check_class
alias of
FreshnessCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
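The freshness rule for FreshnessCheckConfig can be sketched by comparing the newest timestamp against a cutoff derived from interval and period. This is an illustration only: the helper is_fresh is hypothetical, the current time is passed in explicitly to keep the example deterministic, treating a timestamp exactly on the cutoff as fresh is an assumption, and "year" and "month" are left out because they need calendar arithmetic that timedelta does not provide.

```python
from datetime import datetime, timedelta

# Map the fixed-length period names to timedelta keyword arguments.
_FIXED_UNITS = {
    "week": "weeks",
    "day": "days",
    "hour": "hours",
    "minute": "minutes",
    "second": "seconds",
}


def is_fresh(newest: datetime, now: datetime, interval: int, period: str) -> bool:
    """True when `newest` is no older than `interval` units of `period`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if period not in _FIXED_UNITS:
        raise NotImplementedError(f"period {period!r} is not covered by this sketch")
    cutoff = now - timedelta(**{_FIXED_UNITS[period]: interval})
    return newest >= cutoff


now = datetime(2024, 5, 10, 12, 0, 0)
newest = datetime(2024, 5, 9, 13, 0, 0)

fresh_within_a_day = is_fresh(newest, now, interval=1, period="day")
fresh_within_two_hours = is_fresh(newest, now, interval=2, period="hour")
```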
- class IsContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, allowed_values: dict[str, list[object]])[source]
Bases:
BaseRowCheckConfig
Configuration schema for value whitelist validation checks.
Defines the parameters and validation rules for configuring checks that enforce value containment constraints. The configuration includes logical validation to ensure allowed value specifications are complete and meaningful.
- allowed_values
Mapping of column names to their corresponding lists of acceptable values.
- Type:
dict[str, list[object]]
- check_class
alias of
IsContainedInCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_allowed_values() IsContainedInCheckConfig [source]
Validate the logical consistency and completeness of the allowed values configuration.
Ensures that the allowed values mapping is properly structured with non-empty value lists for each configured column. This validation prevents configuration errors that would result in impossible or meaningless validation conditions.
- Returns:
The validated configuration instance.
- Return type:
IsContainedInCheckConfig
- Raises:
InvalidCheckConfigurationError – When the allowed values configuration is empty, malformed, or contains invalid value specifications.
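A plain-Python sketch of the two ideas behind IsContainedInCheckConfig follows: validating the allowed_values mapping, then testing a row against it. Helper names and sample columns are hypothetical, ValueError stands in for the library's exception, and a null is simply treated as a value that is not in the list, which is an assumption about the real check.

```python
def validate_allowed_values(allowed_values: dict) -> dict:
    """Require at least one column and a non-empty value list per column."""
    if not allowed_values:
        raise ValueError("allowed_values must not be empty")
    for column, values in allowed_values.items():
        if not values:
            raise ValueError(f"no allowed values given for column {column!r}")
    return allowed_values


def row_is_contained(row: dict, allowed_values: dict) -> bool:
    """True when every configured column holds one of its allowed values."""
    return all(row[column] in values for column, values in allowed_values.items())


allowed = validate_allowed_values(
    {"status": ["open", "closed"], "priority": [1, 2, 3]}
)
ok_row = row_is_contained({"status": "open", "priority": 2}, allowed)
bad_row = row_is_contained({"status": "archived", "priority": 2}, allowed)
```

IsNotContainedInCheckConfig expresses the inverse rule: a row fails when a configured column holds one of its forbidden values.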
- class IsNotContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, forbidden_values: dict[str, list[object]])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the IsNotContainedInCheck.
This config allows validation that specified columns do NOT contain forbidden values.
- forbidden_values
Mapping of column names to forbidden values.
- Type:
dict[str, list[object]]
- check_class
alias of
IsNotContainedInCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_forbidden_values() IsNotContainedInCheckConfig [source]
Validate that forbidden_values is not empty and properly formed.
- Returns:
The validated configuration object.
- Return type:
IsNotContainedInCheckConfig
- Raises:
InvalidCheckConfigurationError – If forbidden_values is missing or invalid.
- class NotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Configuration schema for not-null validation checks.
Defines the parameters required for configuring checks that validate columns expected to remain null. This configuration enables declarative check definition through external configuration sources.
- columns
Column names that should consistently contain null values.
- Type:
List[str]
- check_class
alias of
NotNullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Configuration schema for null value validation checks.
Defines the parameters required for configuring checks that enforce non-null value requirements. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.
- columns
Column names that must contain non-null values for records to pass validation.
- Type:
List[str]
- check_class
alias of
NullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NumericBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], inclusive: Tuple[bool, bool] = (False, False), min_value: float | int | Decimal, max_value: float | int | Decimal)[source]
Bases:
BaseRowCheckConfig
Configuration schema for numeric range validation checks.
Defines the parameters and validation rules for configuring checks that enforce numeric range constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.
This configuration enables declarative check definition through external configuration sources while ensuring parameter validity at configuration time.
- columns
Numeric column names that must fall within the specified range.
- Type:
List[str]
- min_value
Minimum acceptable numeric value for the valid range.
- Type:
float | int | Decimal
- max_value
Maximum acceptable numeric value for the valid range.
- Type:
float | int | Decimal
- inclusive
Inclusivity settings for minimum and maximum boundaries respectively.
- Type:
tuple[bool, bool]
- check_class
alias of
NumericBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_between_values() NumericBetweenCheckConfig [source]
Validate the logical consistency of the configured numeric range parameters.
Ensures that the minimum and maximum numeric parameters form a valid range and that both values are properly ordered. This validation prevents configuration errors that would result in impossible validation conditions.
- Returns:
The validated configuration instance.
- Return type:
NumericBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – When the numeric range parameters are logically inconsistent or contain invalid values.
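The effect of the inclusive tuple on NumericBetweenCheckConfig is easiest to see on boundary values. The helper below is a hypothetical plain-Python sketch of that rule; the first flag governs the lower bound and the second the upper bound, matching the "minimum and maximum boundaries respectively" wording above.

```python
from decimal import Decimal


def in_numeric_range(value, min_value, max_value, inclusive=(False, False)) -> bool:
    """Apply each bound as strict or inclusive according to its flag."""
    lower_ok = value >= min_value if inclusive[0] else value > min_value
    upper_ok = value <= max_value if inclusive[1] else value < max_value
    return lower_ok and upper_ok


# A value sitting exactly on the lower bound.
on_lower_default = in_numeric_range(Decimal("10"), 10, 20)
on_lower_inclusive = in_numeric_range(Decimal("10"), 10, 20, inclusive=(True, False))

# A value sitting exactly on the upper bound, with only the lower bound inclusive.
on_upper = in_numeric_range(20, 10, 20, inclusive=(True, False))
```

With the default (False, False), both boundary values are rejected.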
- class NumericMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: float | int | Decimal, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration schema for maximum numeric threshold validation checks.
Defines the parameters required for configuring checks that enforce maximum numeric boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.
- columns
Numeric column names that must remain within the maximum threshold.
- Type:
List[str]
- max_value
Maximum acceptable numeric value for validation.
- Type:
float | int | Decimal
- inclusive
Whether the maximum threshold includes the boundary value itself.
- Type:
bool
- check_class
alias of
NumericMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NumericMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: float | int | Decimal, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration schema for minimum numeric threshold validation checks.
Defines the parameters required for configuring checks that enforce minimum numeric boundaries. This configuration enables declarative check definition through external configuration sources while ensuring parameter validity.
- columns
Numeric column names that must meet minimum threshold requirements.
- Type:
List[str]
- min_value
Minimum acceptable numeric value for validation.
- Type:
float | int | Decimal
- inclusive
Whether the minimum threshold includes the boundary value itself.
- Type:
bool
- check_class
alias of
NumericMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RegexMatchCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, pattern: str, ignore_case: bool = False, treat_null_as_failure: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration schema for regular expression pattern validation checks.
Defines the parameters required for configuring checks that enforce regular expression pattern matching. This configuration enables declarative check definition through external configuration sources while providing flexible matching options.
- column
String column name that must conform to the pattern.
- Type:
str
- pattern
Regular expression pattern for validation.
- Type:
str
- ignore_case
Whether pattern matching should be case-insensitive.
- Type:
bool
- treat_null_as_failure
Whether null values should be considered validation failures.
- Type:
bool
- check_class
alias of
RegexMatchCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
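How ignore_case and treat_null_as_failure interact in RegexMatchCheckConfig can be sketched with Python's re module. This is an approximation: the helper is hypothetical, Spark uses Java regular expressions whose syntax differs from Python's in places, and using a search (rather than a full match) is an assumption about how the check applies the pattern.

```python
import re
from typing import Optional


def regex_passes(
    value: Optional[str],
    pattern: str,
    ignore_case: bool = False,
    treat_null_as_failure: bool = False,
) -> bool:
    """Return True when the value satisfies the pattern under the given options."""
    if value is None:
        # Nulls pass unless the configuration says otherwise.
        return not treat_null_as_failure
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None


pattern = r"^[a-z]{3}-\d{3}$"
case_sensitive = regex_passes("ABC-123", pattern)
case_insensitive = regex_passes("ABC-123", pattern, ignore_case=True)
null_default = regex_passes(None, pattern)
null_strict = regex_passes(None, pattern, treat_null_as_failure=True)
```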
- class RowCountBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int, max_count: int)[source]
Bases:
BaseAggregateCheckConfig
Configuration schema for row count boundary validation checks.
Defines the parameters and validation rules for configuring checks that enforce dataset size constraints. The configuration includes logical validation to ensure boundary parameters are consistent and meaningful.
This configuration enables declarative check definition through external configuration sources while ensuring parameter validity at configuration time.
- min_count
Minimum acceptable number of records in the dataset.
- Type:
int
- max_count
Maximum acceptable number of records in the dataset.
- Type:
int
- check_class
alias of
RowCountBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_range() RowCountBetweenCheckConfig [source]
Validate the logical consistency of the configured boundary parameters.
Ensures that the minimum and maximum count parameters form a valid range and that both values are non-negative. This validation prevents configuration errors that would result in impossible validation conditions.
- Returns:
The validated configuration instance.
- Return type:
RowCountBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – When the boundary parameters are logically inconsistent or contain invalid values.
- class RowCountExactCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_count: int)[source]
Bases:
BaseAggregateCheckConfig
Configuration schema for exact row count validation checks.
Defines the parameters and validation rules for configuring checks that enforce precise row count requirements. The configuration includes logical validation to ensure count parameters are non-negative and meaningful.
- expected_count
Exact number of records required in the dataset.
- Type:
int
- check_class
alias of
RowCountExactCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_expected() RowCountExactCheckConfig [source]
Validate the logical consistency of the configured expected count parameter.
Ensures that the expected count parameter is non-negative and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in impossible validation conditions.
- Returns:
The validated configuration instance.
- Return type:
RowCountExactCheckConfig
- Raises:
InvalidCheckConfigurationError – When the expected count parameter is negative, indicating an invalid configuration.
- class RowCountMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, max_count: int)[source]
Bases:
BaseAggregateCheckConfig
Configuration schema for maximum row count validation checks.
Defines the parameters and validation rules for configuring checks that enforce maximum row count limits. The configuration includes logical validation to ensure count parameters are positive and meaningful.
- max_count
Maximum acceptable number of records in the dataset.
- Type:
int
- check_class
alias of
RowCountMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_max() RowCountMaxCheckConfig [source]
Validate the logical consistency of the configured maximum count parameter.
Ensures that the maximum count parameter is positive and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.
- Returns:
The validated configuration instance.
- Return type:
RowCountMaxCheckConfig
- Raises:
InvalidCheckConfigurationError – When the maximum count parameter is zero or negative, indicating an invalid configuration.
- class RowCountMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int)[source]
Bases:
BaseAggregateCheckConfig
Configuration schema for minimum row count validation checks.
Defines the parameters and validation rules for configuring checks that enforce minimum row count thresholds. The configuration includes logical validation to ensure count parameters are positive and meaningful.
- min_count
Minimum acceptable number of records in the dataset.
- Type:
int
- check_class
alias of
RowCountMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_min() RowCountMinCheckConfig [source]
Validate the logical consistency of the configured minimum count parameter.
Ensures that the minimum count parameter is positive and meaningful for dataset validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.
- Returns:
The validated configuration instance.
- Return type:
RowCountMinCheckConfig
- Raises:
InvalidCheckConfigurationError – When the minimum count parameter is zero or negative, indicating an invalid configuration.
- class SchemaCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_schema: dict[str, str], strict: bool = True)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the SchemaCheck.
Ensures the DataFrame matches the expected schema, with optional strict mode. Validates all specified types, including support for decimal(p,s) types.
- expected_schema
Required column names and Spark types.
- Type:
dict[str, str]
- strict
Whether to disallow unexpected columns.
- Type:
bool
- check_class
alias of
SchemaCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_schema() SchemaCheckConfig [source]
Validates that expected_schema is not empty and all types are valid.
- Raises:
InvalidCheckConfigurationError – If any type is invalid.
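The difference between strict and non-strict mode in SchemaCheckConfig can be shown by comparing two name-to-type mappings. The sketch below is hypothetical and works on plain dictionaries of type strings; the real check inspects a Spark DataFrame schema and also validates the type names themselves.

```python
def schema_problems(expected: dict, actual: dict, strict: bool = True) -> list:
    """List the mismatches between an expected and an actual schema."""
    problems = []
    for name, dtype in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != dtype:
            problems.append(
                f"type mismatch for {name}: expected {dtype}, got {actual[name]}"
            )
    if strict:
        # Strict mode additionally rejects columns that were not declared.
        for name in actual:
            if name not in expected:
                problems.append(f"unexpected column: {name}")
    return problems


expected_schema = {"id": "int", "amount": "decimal(10,2)"}
actual_schema = {"id": "int", "amount": "decimal(10,2)", "note": "string"}

strict_problems = schema_problems(expected_schema, actual_schema, strict=True)
lenient_problems = schema_problems(expected_schema, actual_schema, strict=False)
```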
- class StringLengthBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, max_length: int, inclusive: tuple[bool, bool] = (True, True))[source]
Bases:
BaseRowCheckConfig
Configuration for StringLengthBetweenCheck.
Validates that string values in the given column fall between a minimum and maximum length.
- column
The string column to validate.
- Type:
str
- min_length
Minimum valid length (must be > 0).
- Type:
int
- max_length
Maximum valid length (must be >= min_length).
- Type:
int
- inclusive
Tuple indicating inclusiveness of min and max bounds.
- Type:
tuple[bool, bool]
- check_class
alias of
StringLengthBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_range() StringLengthBetweenCheckConfig [source]
Validates that the min/max configuration is logically sound.
- Returns:
Validated instance.
- Return type:
StringLengthBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_length > max_length or values are invalid.
- class StringMaxLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, max_length: int, inclusive: bool = True)[source]
Bases:
BaseRowCheckConfig
Configuration schema for maximum string length validation checks.
Defines the parameters and validation rules for configuring checks that enforce maximum string length constraints. The configuration includes logical validation to ensure length parameters are positive and meaningful.
- column
String column name that must remain within the maximum length limit.
- Type:
str
- max_length
Maximum acceptable string length threshold (must be positive).
- Type:
int
- inclusive
Whether the maximum length threshold includes the boundary value itself.
- Type:
bool
- check_class
alias of
StringMaxLengthCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_max_length() StringMaxLengthCheckConfig [source]
Validate the logical consistency of the configured maximum length parameter.
Ensures that the maximum length parameter is positive and meaningful for string validation purposes. This validation prevents configuration errors that would result in nonsensical validation conditions.
- Returns:
The validated configuration instance.
- Return type:
StringMaxLengthCheckConfig
- Raises:
InvalidCheckConfigurationError – When the maximum length parameter is zero or negative, indicating an invalid configuration.
- class StringMinLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, inclusive: bool = True)[source]
Bases:
BaseRowCheckConfig
Configuration schema for minimum string length validation checks.
Defines the parameters and validation rules for configuring checks that enforce minimum string length requirements. The configuration includes logical validation to ensure length parameters are positive and meaningful.
- column
String column name that must meet minimum length requirements.
- Type:
str
- min_length
Minimum acceptable string length threshold (must be positive).
- Type:
int
- inclusive
Whether a string whose length equals min_length passes the check. Defaults to True.
- Type:
bool
- check_class
alias of
StringMinLengthCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- validate_min_length() StringMinLengthCheckConfig [source]
Validate the logical consistency of the configured minimum length parameter.
Ensures that min_length is positive. A zero or negative limit is treated as a configuration error rather than as a valid constraint.
- Returns:
The validated configuration instance.
- Return type:
StringMinLengthCheckConfig
- Raises:
InvalidCheckConfigurationError – When the minimum length parameter is zero or negative, indicating an invalid configuration.
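The validation rule described above (reject a zero or negative length) can be sketched without any dependencies. Everything in this block is a hypothetical stand-in: MinLengthConfigSketch is a plain dataclass rather than the real pydantic model, and InvalidConfigError only imitates InvalidCheckConfigurationError.

```python
from dataclasses import dataclass


class InvalidConfigError(ValueError):
    """Hypothetical stand-in for InvalidCheckConfigurationError."""


@dataclass(frozen=True)
class MinLengthConfigSketch:
    """Hypothetical stand-in for StringMinLengthCheckConfig."""

    check_id: str
    column: str
    min_length: int
    inclusive: bool = True

    def __post_init__(self) -> None:
        # Mirror the documented rule: zero or negative lengths are rejected.
        if self.min_length <= 0:
            raise InvalidConfigError(
                f"min_length must be positive, got {self.min_length}"
            )


def try_build(min_length: int) -> str:
    """Build a config and report whether validation accepted it."""
    try:
        MinLengthConfigSketch(
            check_id="name-length", column="name", min_length=min_length
        )
        return "ok"
    except InvalidConfigError as exc:
        return f"rejected: {exc}"


results = {n: try_build(n) for n in (3, 0, -2)}
print(results)
```

Failing at configuration time, rather than when the check runs, surfaces a mistyped limit before any data is processed.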
- class TimestampBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for TimestampBetweenCheck.
- columns
The list of timestamp columns to validate.
- Type:
List[str]
- min_value
Minimum allowed timestamp.
- Type:
str
- max_value
Maximum allowed timestamp.
- Type:
str
- inclusive
Pair of booleans controlling boundary inclusion, one flag per boundary. Defaults to (False, False), so both boundaries are exclusive.
- Type:
tuple[bool, bool]
- check_class
alias of
TimestampBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- validate_between_values() TimestampBetweenCheckConfig [source]
Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.
- Raises:
InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
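As a rough illustration of how the two inclusive flags interact with the range, the following plain-Python sketch evaluates a single ISO 8601 timestamp against a window. The helper in_window is hypothetical, and the sketch assumes that the first flag applies to min_value and the second to max_value; that ordering is not stated in this reference.

```python
from datetime import datetime
from typing import Tuple


def in_window(
    ts: str,
    min_value: str,
    max_value: str,
    inclusive: Tuple[bool, bool] = (False, False),
) -> bool:
    """Illustrative range test; not the sparkdq implementation."""
    lower = datetime.fromisoformat(min_value)
    upper = datetime.fromisoformat(max_value)
    if lower > upper:
        # Mirrors the documented rule that min_value must not exceed max_value.
        raise ValueError("min_value must not be greater than max_value")
    value = datetime.fromisoformat(ts)
    # Assumption: inclusive[0] governs the lower bound, inclusive[1] the upper.
    lower_ok = value >= lower if inclusive[0] else value > lower
    upper_ok = value <= upper if inclusive[1] else value < upper
    return lower_ok and upper_ok


start, end = "2023-01-01T00:00:00", "2023-12-31T23:59:59"

on_lower_default = in_window(start, start, end)
on_lower_inclusive = in_window(start, start, end, inclusive=(True, False))
mid_year = in_window("2023-06-15T12:00:00", start, end)

print(on_lower_default)    # False
print(on_lower_inclusive)  # True
print(mid_year)            # True
```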
- class TimestampMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for TimestampMaxCheck.
- columns
The timestamp columns to validate.
- Type:
List[str]
- max_value
The maximum allowed timestamp in ISO 8601 format.
- Type:
str
- inclusive
Whether a timestamp equal to max_value passes the check. Defaults to False.
- Type:
bool
- check_class
alias of
TimestampMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
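Because columns accepts a list, one configuration can cover several timestamp columns. The sketch below shows one plausible reading in plain Python: a row passes only if every listed column respects the upper bound. The helper row_passes_max and the sample rows are hypothetical, and the "all columns must pass" rule is an assumption rather than documented behaviour.

```python
from datetime import datetime
from typing import Dict, List


def row_passes_max(
    row: Dict[str, str],
    columns: List[str],
    max_value: str,
    inclusive: bool = False,
) -> bool:
    """Illustrative multi-column upper-bound test; not the sparkdq implementation."""
    upper = datetime.fromisoformat(max_value)
    for name in columns:
        value = datetime.fromisoformat(row[name])
        ok = value <= upper if inclusive else value < upper
        if not ok:
            # Assumption: a single violating column fails the whole row.
            return False
    return True


rows = [
    {"created_at": "2023-05-01T08:00:00", "updated_at": "2023-05-02T09:30:00"},
    {"created_at": "2023-12-31T23:59:59", "updated_at": "2024-01-01T00:00:00"},
    {"created_at": "2023-01-01T00:00:00", "updated_at": "2024-01-01T00:00:01"},
]
columns = ["created_at", "updated_at"]
cutoff = "2024-01-01T00:00:00"

strict = [row_passes_max(row, columns, cutoff) for row in rows]
relaxed = [row_passes_max(row, columns, cutoff, inclusive=True) for row in rows]

print(strict)   # [True, False, False]
print(relaxed)  # [True, True, False]
```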
- class TimestampMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the TimestampMinCheck.
- columns
The list of timestamp columns to validate.
- Type:
List[str]
- min_value
The minimum allowed timestamp in ISO 8601 format (e.g. ‘2023-01-01T00:00:00’).
- Type:
str
- inclusive
Whether a timestamp equal to min_value passes the check. Defaults to False.
- Type:
bool
- check_class
alias of
TimestampMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class UniqueRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the UniqueRatioCheck.
- column
The column to check for uniqueness.
- Type:
str
- min_ratio
The minimum acceptable ratio of distinct values, in the range 0.0 to 1.0.
- Type:
float
- check_class
alias of
UniqueRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
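To make the meaning of min_ratio concrete, here is a small plain-Python sketch of a distinct-value ratio. The helper unique_ratio is hypothetical. The sketch assumes the ratio is "distinct values divided by total rows" and that the check passes when the ratio is at least min_ratio; how the actual check treats nulls is not covered here, so the sample data contains none.

```python
from typing import List


def unique_ratio(values: List[str]) -> float:
    """Illustrative distinct/total ratio; not the sparkdq implementation."""
    if not values:
        raise ValueError("cannot compute a ratio for an empty column")
    return len(set(values)) / len(values)


customer_ids = ["a1", "a2", "a2", "a3", "a3", "a3", "a4", "a5"]

ratio = unique_ratio(customer_ids)  # 5 distinct values over 8 rows
passes_lenient = ratio >= 0.6       # assumed comparison: ratio >= min_ratio
passes_strict = ratio >= 0.8

print(ratio)           # 0.625
print(passes_lenient)  # True
print(passes_strict)   # False
```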
- class UniqueRowsCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, subset_columns: List[str] | None = None)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration for the UniqueRowsCheck.
This check verifies that no duplicate row combinations exist in the dataset. Uniqueness can be enforced across all columns or a selected subset.
- subset_columns
List of columns to define uniqueness. If not provided, all columns are used.
- Type:
Optional[List[str]]
- check_class
alias of
UniqueRowsCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
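The difference between checking all columns and checking a subset can be sketched in plain Python. The helper duplicate_keys and the sample rows are hypothetical; the sketch only illustrates the idea that the same dataset may be unique across all columns yet contain duplicates for a chosen subset.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def duplicate_keys(
    rows: List[Dict[str, object]],
    subset_columns: Optional[List[str]] = None,
) -> List[Tuple[object, ...]]:
    """Return value combinations that occur more than once (illustrative only)."""
    if not rows:
        return []
    # With no subset given, every column takes part in the uniqueness key.
    columns = subset_columns if subset_columns is not None else sorted(rows[0])
    counts = Counter(tuple(row[name] for name in columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]


rows = [
    {"order_id": 1, "customer": "ann", "item": "pen"},
    {"order_id": 2, "customer": "ann", "item": "pen"},
    {"order_id": 3, "customer": "bob", "item": "ink"},
]

all_columns = duplicate_keys(rows)
by_subset = duplicate_keys(rows, ["customer", "item"])

print(all_columns)  # []
print(by_subset)    # [('ann', 'pen')]
```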