sparkdq.checks
This module serves as the dynamic entry point for the sparkdq.checks subpackage.
It recursively traverses all submodules within the sparkdq.checks package to identify and register all classes that inherit from either BaseRowCheckConfig or BaseAggregateCheckConfig.
Each valid check configuration class is added to the module’s global namespace and included in __all__, ensuring it becomes part of the public API.
This approach eliminates the need for manual imports and __all__ maintenance, making the system easy to extend as new check classes are added. It also allows developers and users to import any check configuration directly from the top-level sparkdq namespace.
This mechanism incurs a small runtime overhead during the initial import of sparkdq.checks, but improves maintainability and scalability significantly in large modular frameworks.
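The registration step described above can be sketched roughly as follows. This is a minimal illustration, not the package's actual source: the two base classes are empty stand-ins, `register_check_configs` is a hypothetical helper name, and an in-memory module plays the role of a discovered submodule. The real module presumably walks the package on disk (for example with `pkgutil.walk_packages` and `importlib.import_module`), which is omitted here so the sketch runs without sparkdq installed.

```python
import inspect
import types


# Stand-ins for the two base classes named above (hypothetical, defined here
# only so the sketch runs without sparkdq installed).
class BaseRowCheckConfig:
    pass


class BaseAggregateCheckConfig:
    pass


BASES = (BaseRowCheckConfig, BaseAggregateCheckConfig)


def register_check_configs(modules, namespace):
    """Copy every config subclass found in `modules` into `namespace`.

    Returns the sorted list of exported names, suitable for `__all__`.
    """
    exported = set()
    for module in modules:
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Keep subclasses only; the base classes themselves are skipped.
            if issubclass(obj, BASES) and obj not in BASES:
                namespace[name] = obj
                exported.add(name)
    return sorted(exported)


# Build an in-memory module that plays the role of one discovered submodule.
class NullCheckConfig(BaseRowCheckConfig):
    pass


demo_module = types.ModuleType("demo_checks")
demo_module.NullCheckConfig = NullCheckConfig
demo_module.BaseRowCheckConfig = BaseRowCheckConfig  # re-exported base, skipped
demo_module.helper = lambda: None                    # not a class, skipped

namespace = {}
exported_names = register_check_configs([demo_module], namespace)
```

In a real package `__init__`, `namespace` would be `globals()` and the returned list would be assigned to `__all__`.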
- class ColumnGreaterThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the ColumnGreaterThanCheck.
This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly greater than (or greater than or equal to, if inclusive=True) the result of evaluating limit.
Null values in either column or the limit result are treated as invalid and will fail the check.
Example
>>> # Simple column comparison
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="end-after-start",
...     column="end_time",
...     limit="start_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="price-with-margin",
...     column="selling_price",
...     limit="cost_price * 1.2",
...     inclusive=False
... )
>>> # Complex conditional expression
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END",
...     inclusive=True
... )
- column
Column expected to contain greater (or equal) values.
- Type:
str
- limit
The column or a Spark SQL expression expected to contain smaller values.
- Type:
str
- inclusive
If True, validates column >= limit. If False, requires strict inequality (>). Defaults to False.
- Type:
bool
- check_class
alias of
ColumnGreaterThanCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
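The per-row rule described for ColumnGreaterThanCheckConfig (strict or inclusive comparison, nulls on either side fail) can be modelled in plain Python. This is only a sketch of the documented semantics: `passes_greater_than` is a hypothetical helper, rows are plain dicts, and the limit is computed in Python rather than evaluated as a Spark SQL expression.

```python
def passes_greater_than(value, limit, inclusive=False):
    """Return True if `value` satisfies the comparison against `limit`."""
    if value is None or limit is None:
        return False  # a null on either side fails the check
    return value >= limit if inclusive else value > limit


rows = [
    {"selling_price": 130, "cost_price": 100},
    {"selling_price": 120, "cost_price": 100},
    {"selling_price": None, "cost_price": 100},
]

# The limit here is cost_price + 20, standing in for a Spark SQL expression.
strict = [passes_greater_than(r["selling_price"], r["cost_price"] + 20) for r in rows]
loose = [
    passes_greater_than(r["selling_price"], r["cost_price"] + 20, inclusive=True)
    for r in rows
]
```

The second row sits exactly on the limit, so it passes only when inclusive=True. ColumnLessThanCheckConfig mirrors this with the comparison reversed.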
- class ColumnLessThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the ColumnLessThanCheck.
This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly less than (or less than or equal to, if inclusive=True) the result of evaluating limit.
Null values in either column or the limit result are treated as invalid and will fail the check.
Examples
>>> # Simple column comparison
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="start-before-end",
...     column="start_time",
...     limit="end_time",
...     inclusive=True
... )
>>> # Expression with mathematical operation
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="price-with-margin",
...     column="cost_price",
...     limit="selling_price * 0.8",
...     inclusive=True
... )
>>> # Complex conditional expression
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END",
...     inclusive=False
... )
- column
Column expected to contain smaller (or equal) values.
- Type:
str
- limit
The column or a Spark SQL expression expected to contain greater values.
- Type:
str
- inclusive
If True, validates column <= limit. If False, requires strict inequality (<). Defaults to False.
- Type:
bool
- check_class
alias of
ColumnLessThanCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ColumnPresenceCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, required_columns: list[str])[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the ColumnPresenceCheck.
This config defines a set of required column names that must exist in the DataFrame.
- required_columns
The list of required column names.
- Type:
list[str]
- check_class
alias of
ColumnPresenceCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ColumnsAreCompleteCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the ColumnsAreCompleteCheck.
This configuration defines a completeness requirement for multiple columns. The check fails if any of the specified columns contain null values.
- columns
List of required columns that must be fully populated.
- Type:
List[str]
- check_class
alias of
ColumnsAreCompleteCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class CompletenessRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration for CompletenessRatioCheck.
- column
Column name to assess.
- Type:
str
- min_ratio
Minimum allowed non-null ratio (between 0.0 and 1.0).
- Type:
float
- check_class
alias of
CompletenessRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_threshold() CompletenessRatioCheckConfig [source]
Ensures the min_ratio is between 0 and 1 (inclusive).
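As a rough illustration of what this configuration expresses, the sketch below validates a threshold and computes a non-null ratio over a plain Python list. The helper names are hypothetical, `ValueError` stands in for the library's own error type, and rejecting an empty input is a choice made for this sketch only; the source does not say how empty datasets are treated.

```python
def validate_min_ratio(min_ratio):
    """Reject thresholds outside the closed interval [0, 1]."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError(f"min_ratio must be between 0 and 1, got {min_ratio}")
    return min_ratio


def completeness_ratio(values):
    """Fraction of entries that are not None."""
    if not values:
        raise ValueError("cannot compute a ratio for an empty column")
    non_null = sum(v is not None for v in values)
    return non_null / len(values)


column_values = ["a", None, "c", "d"]
ratio = completeness_ratio(column_values)  # 3 of 4 values are non-null
check_passes = ratio >= validate_min_ratio(0.7)
```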
- class DateBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for DateBetweenCheck.
- columns
Date columns to validate.
- Type:
List[str]
- min_value
Minimum allowed date in ‘YYYY-MM-DD’ format.
- Type:
str
- max_value
Maximum allowed date in ‘YYYY-MM-DD’ format.
- Type:
str
- inclusive
Inclusion flags for min and max boundaries.
- Type:
tuple[bool, bool]
- check_class
alias of
DateBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_between_values() DateBetweenCheckConfig [source]
Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.
- Returns:
The validated configuration object.
- Return type:
DateBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
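The bound validation and the two inclusion flags can be pictured with the standard library alone. This is a sketch under assumptions: the function names are hypothetical, `ValueError` stands in for InvalidCheckConfigurationError, and null handling is left out because the source does not describe it for this check.

```python
from datetime import date


def validate_between_values(min_value, max_value):
    """Parse both 'YYYY-MM-DD' bounds and reject an inverted range."""
    lower = date.fromisoformat(min_value)
    upper = date.fromisoformat(max_value)
    if lower > upper:
        raise ValueError("min_value must not be greater than max_value")
    return lower, upper


def date_in_range(value, lower, upper, inclusive=(False, False)):
    """Apply the (min, max) inclusion flags to a single date."""
    above = value >= lower if inclusive[0] else value > lower
    below = value <= upper if inclusive[1] else value < upper
    return above and below


lower, upper = validate_between_values("2023-01-01", "2023-12-31")
on_lower_strict = date_in_range(date(2023, 1, 1), lower, upper)
on_lower_inclusive = date_in_range(date(2023, 1, 1), lower, upper, (True, False))
on_upper_exclusive = date_in_range(date(2023, 12, 31), lower, upper, (True, False))
```

With the default flags both boundaries are excluded, so a value equal to min_value fails unless the first flag is True.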
- class DateMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for DateMaxCheck.
- columns
Date columns to validate.
- Type:
List[str]
- max_value
The maximum allowed date in ISO format.
- Type:
str
- inclusive
Whether to include the maximum date.
- Type:
bool
- check_class
alias of
DateMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DateMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the DateMinCheck.
- columns
The list of date columns to validate.
- Type:
List[str]
- min_value
The minimum allowed date, in ‘YYYY-MM-DD’ format. Whether the boundary date itself is accepted is controlled by inclusive.
- Type:
str
- inclusive
Whether to include the minimum date.
- Type:
bool
- check_class
alias of
DateMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class DistinctRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for DistinctRatioCheck.
- column
The column to evaluate for distinctness.
- Type:
str
- min_ratio
Minimum required ratio of distinct values (between 0 and 1).
- Type:
float
- check_class
alias of
DistinctRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class ExactlyOneNotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the ExactlyOneNotNullCheck.
- columns
The names of the columns where exactly one must be non-null per row.
- Type:
List[str]
- check_class
alias of
ExactlyOneNotNullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
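The rule behind ExactlyOneNotNullCheckConfig is easy to state per row: count the non-null values among the configured columns and require the count to be one. A plain-Python sketch, with a hypothetical helper name and dicts standing in for DataFrame rows:

```python
def exactly_one_not_null(row, columns):
    """True when exactly one of `columns` holds a non-null value in `row`."""
    return sum(row.get(column) is not None for column in columns) == 1


contact_columns = ["email", "phone"]
rows = [
    {"email": "a@example.com", "phone": None},   # one value set
    {"email": "b@example.com", "phone": "555"},  # both set
    {"email": None, "phone": None},              # none set
]
results = [exactly_one_not_null(row, contact_columns) for row in rows]
```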
- class ForeignKeyCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, reference_dataset: str, reference_column: str)[source]
Bases:
BaseAggregateCheckConfig
Configuration model for ForeignKeyCheck.
Validates referential integrity by ensuring that all values in column exist in a reference dataset.
- check_class
alias of
ForeignKeyCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
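Referential integrity amounts to a set-membership test. The sketch below uses plain lists in place of the two datasets. The helper name is hypothetical, and skipping null keys is an assumption borrowed from common SQL foreign-key behaviour; the source does not state how this check treats nulls.

```python
def missing_foreign_keys(values, reference_values):
    """Return the distinct non-null values that have no match in the reference."""
    reference = set(reference_values)
    return sorted({v for v in values if v is not None and v not in reference})


order_customer_ids = [1, 2, 2, 7, None]
customer_ids = [1, 2, 3]
orphans = missing_foreign_keys(order_customer_ids, customer_ids)
```

An empty result corresponds to a passing check; here the value 7 has no counterpart in the reference column.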
- class FreshnessCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, interval: int, period: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second'])[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the FreshnessCheck.
Ensures that the newest value in the specified timestamp column is recent enough relative to the current time.
- column
Name of the timestamp column.
- Type:
str
- interval
Time window size (must be positive).
- Type:
int
- period
Unit of time; one of “year”, “month”, “week”, “day”, “hour”, “minute”, or “second”.
- Type:
str
- check_class
alias of
FreshnessCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
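One way to read the freshness rule is "the newest timestamp must not be older than interval × period before now". The sketch below follows that reading with the standard library. It is an illustration only: `is_fresh` is a hypothetical helper, the current time is passed in explicitly to keep the example deterministic, treating the cutoff itself as fresh is an assumption, and the calendar units "year" and "month" are left out because `timedelta` has no fixed length for them.

```python
from datetime import datetime, timedelta

# Fixed-length units only; "year" and "month" would need calendar arithmetic.
FIXED_PERIODS = {
    "week": timedelta(weeks=1),
    "day": timedelta(days=1),
    "hour": timedelta(hours=1),
    "minute": timedelta(minutes=1),
    "second": timedelta(seconds=1),
}


def is_fresh(timestamps, interval, period, now):
    """True when the newest timestamp lies within `interval` periods of `now`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if period not in FIXED_PERIODS:
        raise NotImplementedError(f"period {period!r} is not covered by this sketch")
    cutoff = now - interval * FIXED_PERIODS[period]
    return max(timestamps) >= cutoff


now = datetime(2024, 1, 10, 12, 0)
loaded_at = [datetime(2024, 1, 8, 9, 0), datetime(2024, 1, 9, 18, 0)]

fresh_within_day = is_fresh(loaded_at, 1, "day", now)      # cutoff 2024-01-09 12:00
fresh_within_6_hours = is_fresh(loaded_at, 6, "hour", now)  # cutoff 2024-01-10 06:00
```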
- class IsContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, allowed_values: dict[str, list[object]])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the IsContainedInCheck.
This config allows validation that specified columns contain only predefined values.
- allowed_values
Mapping of column names to allowed values.
- Type:
dict[str, list[object]]
- check_class
alias of
IsContainedInCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_allowed_values() IsContainedInCheckConfig [source]
Validate that allowed_values is not empty and properly formed.
- Returns:
The validated configuration object.
- Return type:
IsContainedInCheckConfig
- Raises:
InvalidCheckConfigurationError – If allowed_values is invalid.
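A plain-Python sketch of the validation and the per-row membership test follows. The source only says the mapping must be "not empty and properly formed"; reading that as "a non-empty dict whose values are non-empty lists" is an assumption of this sketch, as are the helper names and the use of `ValueError` in place of InvalidCheckConfigurationError.

```python
def validate_allowed_values(allowed_values):
    """Require a non-empty dict mapping each column to a non-empty list."""
    if not isinstance(allowed_values, dict) or not allowed_values:
        raise ValueError("allowed_values must be a non-empty dict")
    for column, values in allowed_values.items():
        if not isinstance(values, list) or not values:
            raise ValueError(f"allowed values for {column!r} must be a non-empty list")
    return allowed_values


def row_is_contained(row, allowed_values):
    """True when every configured column holds one of its allowed values."""
    return all(row.get(column) in values for column, values in allowed_values.items())


allowed = validate_allowed_values({"status": ["active", "inactive"], "tier": [1, 2, 3]})
valid_row = row_is_contained({"status": "active", "tier": 2}, allowed)
invalid_row = row_is_contained({"status": "deleted", "tier": 2}, allowed)
```

IsNotContainedInCheckConfig expresses the inverse rule: a row fails when a configured column holds one of its forbidden values.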
- class IsNotContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, forbidden_values: dict[str, list[object]])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the IsNotContainedInCheck.
This config allows validation that specified columns do NOT contain forbidden values.
- forbidden_values
Mapping of column names to forbidden values.
- Type:
dict[str, list[object]]
- check_class
alias of
IsNotContainedInCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_forbidden_values() IsNotContainedInCheckConfig [source]
Validate that forbidden_values is not empty and properly formed.
- Returns:
The validated configuration object.
- Return type:
IsNotContainedInCheckConfig
- Raises:
InvalidCheckConfigurationError – If forbidden_values is missing or invalid.
- class NotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the NotNullCheck.
- columns
The names of the columns that should remain null.
- Type:
List[str]
- check_class
alias of
NotNullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the NullCheck.
- columns
The names of the columns to check for null values. This is a required field and must match existing columns in the DataFrame.
- Type:
List[str]
- check_class
alias of
NullCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NumericBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], inclusive: Tuple[bool, bool] = (False, False), min_value: float | int | Decimal, max_value: float | int | Decimal)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the NumericBetweenCheck.
This configuration defines both a lower and upper bound constraint on one or more numeric columns. It ensures that all specified columns contain only values between the configured min_value and max_value. Violations are flagged per row.
- columns
The list of numeric columns to validate.
- Type:
List[str]
- min_value
The lower bound. Whether the bound itself is accepted is controlled by inclusive.
- Type:
float | int | Decimal
- max_value
The upper bound. Whether the bound itself is accepted is controlled by inclusive.
- Type:
float | int | Decimal
- inclusive
Inclusion flags for min and max boundaries.
- Type:
tuple[bool, bool]
- check_class
alias of
NumericBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_between_values() NumericBetweenCheckConfig [source]
Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.
- Returns:
The validated configuration object.
- Return type:
NumericBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
- class NumericMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: float | int | Decimal, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the NumericMaxCheck.
- columns
The list of numeric columns to validate.
- Type:
List[str]
- max_value
The maximum allowed value. Whether the bound itself is accepted is controlled by inclusive.
- Type:
float | int | Decimal
- inclusive
Whether to include the maximum value.
- Type:
bool
- check_class
alias of
NumericMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class NumericMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: float | int | Decimal, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the NumericMinCheck.
- columns
The list of numeric columns to validate.
- Type:
List[str]
- min_value
The minimum allowed value. Whether the bound itself is accepted is controlled by inclusive.
- Type:
float | int | Decimal
- inclusive
Whether to include the minimum value.
- Type:
bool
- check_class
alias of
NumericMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class RegexMatchCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, pattern: str, ignore_case: bool = False, treat_null_as_failure: bool = False)[source]
Bases:
BaseRowCheckConfig
Configuration for RegexMatchCheck.
Validates that a string column matches a given regex pattern.
- column
Column to validate.
- Type:
str
- pattern
Regex pattern to use for matching.
- Type:
str
- ignore_case
If True, regex is case-insensitive (default: False).
- Type:
bool
- treat_null_as_failure
If True, null values are marked as failed (default: False).
- Type:
bool
- check_class
alias of
RegexMatchCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
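The interaction of ignore_case and treat_null_as_failure can be sketched with Python's `re` module. This models the documented flags only: the helper name is hypothetical, Python's regex dialect differs in places from the Java dialect Spark uses, and the unanchored `re.search` is an assumption, which is why the example pattern is explicitly anchored with `^` and `$`.

```python
import re


def regex_match(value, pattern, ignore_case=False, treat_null_as_failure=False):
    """Return True when `value` passes the pattern check."""
    if value is None:
        # Nulls pass unless the configuration asks for them to fail.
        return not treat_null_as_failure
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None


pattern = r"^[A-Z]{3}-\d{4}$"

upper_ok = regex_match("ABC-1234", pattern)
lower_strict = regex_match("abc-1234", pattern)
lower_relaxed = regex_match("abc-1234", pattern, ignore_case=True)
null_default = regex_match(None, pattern)
null_strict = regex_match(None, pattern, treat_null_as_failure=True)
```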
- class RowCountBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int, max_count: int)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the RowCountBetweenCheck.
This config is used to define acceptable row count bounds in a data validation pipeline. It ensures that both min_count and max_count are provided and that min_count <= max_count.
It is typically used when defining checks via JSON, YAML, or dict-based configs.
- min_count
Minimum number of rows expected in the dataset.
- Type:
int
- max_count
Maximum number of rows allowed in the dataset.
- Type:
int
- check_class
alias of
RowCountBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_range() RowCountBetweenCheckConfig [source]
Validate the logical consistency of the configured bounds.
This method ensures that min_count is not greater than max_count. If violated, a configuration-level exception is raised immediately to prevent runtime failures.
- Returns:
The validated configuration object.
- Return type:
RowCountBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_count > max_count or min_count < 0.
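The two conditions listed under Raises can be sketched as a small validator. The exception class below is a local stand-in that only borrows the library's name, the function names are hypothetical, and treating both bounds as inclusive when testing a row count is an assumption.

```python
class InvalidCheckConfigurationError(ValueError):
    """Local stand-in for the library's exception of the same name."""


def validate_range(min_count, max_count):
    """Reject negative minimums and inverted ranges before any data is read."""
    if min_count < 0:
        raise InvalidCheckConfigurationError("min_count must not be negative")
    if min_count > max_count:
        raise InvalidCheckConfigurationError("min_count must not exceed max_count")
    return min_count, max_count


def row_count_in_range(row_count, min_count, max_count):
    """True when the dataset size lies within the (assumed inclusive) bounds."""
    return min_count <= row_count <= max_count


bounds = validate_range(100, 5000)
small_dataset_ok = row_count_in_range(42, *bounds)
normal_dataset_ok = row_count_in_range(1200, *bounds)
```

Failing at configuration time means a typo such as swapped bounds surfaces before a Spark job is launched.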
- class RowCountExactCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_count: int)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the RowCountExactCheck.
This configuration defines an exact row count requirement for a dataset. It ensures that the expected_count parameter is provided and is non-negative.
- expected_count
The exact number of rows expected in the dataset.
- Type:
int
- check_class
alias of
RowCountExactCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_expected() RowCountExactCheckConfig [source]
Validate that the configured expected_count is non-negative.
- Returns:
The validated configuration object.
- Return type:
RowCountExactCheckConfig
- Raises:
InvalidCheckConfigurationError – If expected_count is negative.
- class RowCountMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, max_count: int)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the RowCountMaxCheck.
This configuration defines a maximum row count requirement for a dataset. It ensures that the max_count parameter is provided and has a positive value.
- max_count
Maximum number of rows allowed in the dataset.
- Type:
int
- check_class
alias of
RowCountMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_max() RowCountMaxCheckConfig [source]
Validate that the configured max_count is greater than 0.
- Returns:
The validated configuration object.
- Return type:
RowCountMaxCheckConfig
- Raises:
InvalidCheckConfigurationError – If max_count is not greater than 0.
- class RowCountMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the RowCountMinCheck.
This configuration defines a minimum row count requirement for a dataset. It ensures that the min_count parameter is provided and is non-negative.
- min_count
Minimum number of rows expected in the dataset.
- Type:
int
- check_class
alias of
RowCountMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_min() RowCountMinCheckConfig [source]
Validate that the configured min_count is non-negative.
- Returns:
The validated configuration object.
- Return type:
RowCountMinCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_count is negative.
- class SchemaCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_schema: dict[str, str], strict: bool = True)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the SchemaCheck.
Ensures the DataFrame matches the expected schema, with optional strict mode. Validates all specified types, including support for decimal(p,s) types.
- expected_schema
Required column names and Spark types.
- Type:
dict[str, str]
- strict
Whether to disallow unexpected columns.
- Type:
bool
- check_class
alias of
SchemaCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_schema() SchemaCheckConfig [source]
Validates that expected_schema is not empty and all types are valid.
- Raises:
InvalidCheckConfigurationError – If any type is invalid.
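The effect of the strict flag can be shown by comparing two name-to-type mappings. This is a sketch of the described behaviour rather than the library's code: `schema_mismatches` is a hypothetical helper, the actual schema is given as a plain dict instead of being read from a DataFrame, and types are compared as exact strings.

```python
def schema_mismatches(actual, expected, strict=True):
    """List the differences between an actual and an expected schema."""
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch for {column}: "
                f"expected {expected_type}, found {actual[column]}"
            )
    if strict:
        # Strict mode additionally rejects columns that were not declared.
        for column in actual:
            if column not in expected:
                problems.append(f"unexpected column: {column}")
    return problems


expected_schema = {"id": "int", "amount": "decimal(10,2)"}
actual_schema = {"id": "int", "amount": "double", "note": "string"}

strict_problems = schema_mismatches(actual_schema, expected_schema, strict=True)
lenient_problems = schema_mismatches(actual_schema, expected_schema, strict=False)
```

With strict=False the extra `note` column is tolerated, while the type mismatch on `amount` is reported in both modes.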
- class StringLengthBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, max_length: int, inclusive: tuple[bool, bool] = (True, True))[source]
Bases:
BaseRowCheckConfig
Configuration for StringLengthBetweenCheck.
Validates that string values in the given column fall between a minimum and maximum length.
- column
The string column to validate.
- Type:
str
- min_length
Minimum valid length (must be > 0).
- Type:
int
- max_length
Maximum valid length (must be >= min_length).
- Type:
int
- inclusive
Tuple indicating inclusiveness of min and max bounds.
- Type:
tuple[bool, bool]
- check_class
alias of
StringLengthBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_range() StringLengthBetweenCheckConfig [source]
Validates that the min/max configuration is logically sound.
- Returns:
Validated instance.
- Return type:
StringLengthBetweenCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_length > max_length or values are invalid.
- class StringMaxLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, max_length: int, inclusive: bool = True)[source]
Bases:
BaseRowCheckConfig
Configuration for StringMaxLengthCheck.
Ensures that string values do not exceed the specified maximum length.
- column
Column to validate.
- Type:
str
- max_length
Maximum allowed length (must be > 0).
- Type:
int
- inclusive
If True, length must be <= max_length. If False, strictly < max_length.
- Type:
bool
- check_class
alias of
StringMaxLengthCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_max_length() StringMaxLengthCheckConfig [source]
Validate that max_length is greater than 0.
- Returns:
The validated object.
- Return type:
StringMaxLengthCheckConfig
- Raises:
InvalidCheckConfigurationError – If max_length <= 0.
- class StringMinLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, inclusive: bool = True)[source]
Bases:
BaseRowCheckConfig
Configuration for StringMinLengthCheck.
Ensures that all non-null values in the specified column have a minimum length.
- column
Name of the string column to validate.
- Type:
str
- min_length
Minimum allowed string length (must be > 0).
- Type:
int
- inclusive
If True, allows equality (>=). If False, requires strictly greater length (>).
- Type:
bool
- check_class
alias of
StringMinLengthCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_min_length() StringMinLengthCheckConfig [source]
Validate that the configured min_length is greater than 0.
- Returns:
The validated configuration object.
- Return type:
StringMinLengthCheckConfig
- Raises:
InvalidCheckConfigurationError – If min_length is not greater than 0.
- class TimestampBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for TimestampBetweenCheck.
- columns
The list of timestamp columns to validate.
- Type:
List[str]
- min_value
Minimum allowed timestamp.
- Type:
str
- max_value
Maximum allowed timestamp.
- Type:
str
- inclusive
Optional tuple of booleans for boundary inclusion.
- Type:
tuple[bool, bool]
- check_class
alias of
TimestampBetweenCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- validate_between_values() TimestampBetweenCheckConfig [source]
Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.
- Raises:
InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
- class TimestampMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for TimestampMaxCheck.
- columns
The timestamp columns to validate.
- Type:
List[str]
- max_value
The maximum allowed timestamp in ISO 8601 format.
- Type:
str
- inclusive
Whether to include the upper bound timestamp.
- Type:
bool
- check_class
alias of
TimestampMaxCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class TimestampMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]
Bases:
BaseRowCheckConfig
Declarative configuration model for the TimestampMinCheck.
- columns
The list of timestamp columns to validate.
- Type:
List[str]
- min_value
The minimum allowed timestamp in ISO 8601 format (e.g. ‘2023-01-01T00:00:00’).
- Type:
str
- inclusive
Whether the minimum value is inclusive.
- Type:
bool
- check_class
alias of
TimestampMinCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class UniqueRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration model for the UniqueRatioCheck.
- column
The column to check for uniqueness.
- Type:
str
- min_ratio
The minimum acceptable ratio of distinct values (0.0 - 1.0).
- Type:
float
- check_class
alias of
UniqueRatioCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class UniqueRowsCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, subset_columns: List[str] | None = None)[source]
Bases:
BaseAggregateCheckConfig
Declarative configuration for the UniqueRowsCheck.
This check verifies that no duplicate row combinations exist in the dataset. Uniqueness can be enforced across all columns or a selected subset.
- subset_columns
List of columns to define uniqueness. If not provided, all columns are used.
- Type:
Optional[List[str]]
- check_class
alias of
UniqueRowsCheck
- model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
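The difference between checking all columns and checking a subset can be seen on a handful of rows. The sketch below uses dicts in place of DataFrame rows and a hypothetical helper, `duplicate_keys`; it illustrates the idea only and says nothing about how the check is executed in Spark.

```python
from collections import Counter


def duplicate_keys(rows, subset_columns=None):
    """Return the column-value combinations that occur more than once."""
    counts = Counter(
        # Without a subset, every column of the row is part of the key.
        tuple(row[column] for column in (subset_columns or sorted(row)))
        for row in rows
    )
    return [key for key, occurrences in counts.items() if occurrences > 1]


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "a@example.com"},
    {"id": 1, "email": "a@example.com"},
]

full_row_duplicates = duplicate_keys(rows)           # keys ordered as (email, id)
email_duplicates = duplicate_keys(rows, ["email"])
```

A passing check corresponds to an empty list. Restricting uniqueness to `email` flags all three rows as sharing one key, whereas the full-row comparison only flags the repeated first row.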