sparkdq.checks

This module serves as the dynamic entry point for the sparkdq.checks subpackage.

It recursively traverses all submodules within the sparkdq.checks package to identify and register all classes that inherit from either BaseRowCheckConfig or BaseAggregateCheckConfig.

Each valid check configuration class is added to the module’s global namespace and included in __all__, ensuring it becomes part of the public API.

This approach eliminates the need for manual imports and __all__ maintenance, making the system easy to extend as new check classes are added. It also allows developers and users to import any check configuration directly from the top-level sparkdq namespace.

This mechanism incurs a small runtime overhead during the initial import of sparkdq.checks, but improves maintainability and scalability significantly in large modular frameworks.
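The discovery step described above can be sketched roughly as follows. This is a minimal illustration, not sparkdq's actual source: the helper name discover_subclasses is hypothetical, and the standard-library logging package stands in for sparkdq.checks so that the snippet runs without Spark installed.

```python
import importlib
import inspect
import logging
import pkgutil
from types import ModuleType


def discover_subclasses(package: ModuleType, bases: tuple[type, ...]) -> dict[str, type]:
    """Import every submodule of `package` and collect subclasses of `bases`."""
    found: dict[str, type] = {}
    prefix = package.__name__ + "."
    # walk_packages yields submodules (recursively), not the package itself.
    for module_info in pkgutil.walk_packages(package.__path__, prefix):
        module = importlib.import_module(module_info.name)
        for name, obj in inspect.getmembers(module, inspect.isclass):
            # Skip classes that were merely imported into this module.
            defined_here = obj.__module__ == module.__name__
            if defined_here and issubclass(obj, bases) and obj not in bases:
                found[name] = obj
    return found


# Stand-in usage: collect Handler subclasses defined in logging's submodules.
handlers = discover_subclasses(logging, (logging.Handler,))

# A package __init__ could then publish the result, for example:
#   globals().update(handlers)
#   __all__ = sorted(handlers)
```

Importing every submodule up front is what causes the one-time import overhead mentioned above.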

class ColumnGreaterThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the ColumnGreaterThanCheck.

This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly greater than (or greater than or equal to, if inclusive=True) the result of evaluating limit.

Null values in either column or the limit result are treated as invalid and will fail the check.

Examples

>>> # Simple column comparison
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="end-after-start",
...     column="end_time",
...     limit="start_time",
...     inclusive=True
... )

>>> # Expression with mathematical operation
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="price-with-margin",
...     column="selling_price",
...     limit="cost_price * 1.2",
...     inclusive=False
... )

>>> # Complex conditional expression
>>> cfg = ColumnGreaterThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END",
...     inclusive=True
... )

column

Column expected to contain greater (or equal) values.

Type:: str

limit

The column or a Spark SQL expression expected to contain smaller values.

Type:: str

inclusive

If True, validates column >= limit. If False, requires strict inequality (>). Defaults to False.

Type:: bool
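The pass/fail rule described for this config (strict or inclusive comparison, nulls always failing) can be written out in plain Python. This is only a sketch of the documented semantics for a single row; the function name passes_greater_than is hypothetical, and the real check evaluates limit as a Spark SQL expression rather than receiving a precomputed value.

```python
from typing import Optional


def passes_greater_than(
    value: Optional[float],
    limit: Optional[float],
    inclusive: bool = False,
) -> bool:
    """Return True if one row satisfies the documented comparison."""
    # A null on either side is treated as invalid and fails the check.
    if value is None or limit is None:
        return False
    return value >= limit if inclusive else value > limit
```

With inclusive=False an equal value fails; with inclusive=True it passes. ColumnLessThanCheckConfig mirrors this rule with the comparison reversed.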

check_class: alias of ColumnGreaterThanCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ColumnLessThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, limit: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the ColumnLessThanCheck.

This config defines a row-level comparison between a column and a Spark SQL expression, ensuring that values in column are strictly less than (or less than or equal to, if inclusive=True) the result of evaluating limit.

Null values in either column or the limit result are treated as invalid and will fail the check.

Examples

>>> # Simple column comparison
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="start-before-end",
...     column="start_time",
...     limit="end_time",
...     inclusive=True
... )

>>> # Expression with mathematical operation
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="price-with-margin",
...     column="cost_price",
...     limit="selling_price * 0.8",
...     inclusive=True
... )

>>> # Complex conditional expression
>>> cfg = ColumnLessThanCheckConfig(
...     check_id="score-validation",
...     column="user_score",
...     limit="CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END",
...     inclusive=False
... )

column

Column expected to contain smaller (or equal) values.

Type:: str

limit

The column or a Spark SQL expression expected to contain greater values.

Type:: str

inclusive

If True, validates column <= limit. If False, requires strict inequality (<). Defaults to False.

Type:: bool

check_class: alias of ColumnLessThanCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ColumnPresenceCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, required_columns: list[str])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the ColumnPresenceCheck.

This config defines a set of required column names that must exist in the DataFrame.

required_columns

The list of required column names.

Type:: list[str]

check_class: alias of ColumnPresenceCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ColumnsAreCompleteCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the ColumnsAreCompleteCheck.

This configuration defines a completeness requirement for multiple columns. The check fails if any of the specified columns contain null values.

columns

List of required columns that must be fully populated.

Type:: List[str]

check_class: alias of ColumnsAreCompleteCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class CompletenessRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration for CompletenessRatioCheck.

column

Column name to assess.

Type:: str

min_ratio

Minimum allowed non-null ratio (between 0.0 and 1.0).

Type:: float

check_class: alias of CompletenessRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_threshold() → CompletenessRatioCheckConfig[source]: Ensures the min_ratio is between 0 and 1 (inclusive).
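As a rough illustration of what a completeness ratio means, the following pure-Python sketch computes the share of non-null values and compares it with min_ratio. The function names are hypothetical, a ratio equal to min_ratio is assumed to pass, and an empty column is assumed to count as complete; none of these details are confirmed by the reference above.

```python
from typing import Optional, Sequence


def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Share of non-null entries in a column, between 0.0 and 1.0."""
    if not values:
        return 1.0  # assumption: an empty column is treated as complete
    non_null = sum(v is not None for v in values)
    return non_null / len(values)


def passes_completeness(values: Sequence[Optional[object]], min_ratio: float) -> bool:
    """Validate min_ratio, then compare it with the observed ratio."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError("min_ratio must be between 0 and 1 (inclusive)")
    return completeness_ratio(values) >= min_ratio
```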

class DateBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]

Bases: BaseRowCheckConfig

Declarative configuration model for DateBetweenCheck.

columns

Date columns to validate.

Type:: List[str]

min_value

Minimum allowed date in ‘YYYY-MM-DD’ format.

Type:: str

max_value

Maximum allowed date in ‘YYYY-MM-DD’ format.

Type:: str

inclusive

Inclusion flags for min and max boundaries.

Type:: tuple[bool, bool]

check_class: alias of DateBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_between_values() → DateBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Returns:: The validated configuration object.
Return type:: DateBetweenCheckConfig
Raises:: InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
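The inclusive tuple can be read as one flag per boundary: the first for min_value, the second for max_value. A minimal sketch of that reading for a single date, using a hypothetical helper date_in_range and a plain ValueError in place of InvalidCheckConfigurationError:

```python
from datetime import date


def date_in_range(
    value: date,
    min_value: str,
    max_value: str,
    inclusive: tuple[bool, bool] = (False, False),
) -> bool:
    """Check one date against bounds given in 'YYYY-MM-DD' format."""
    lower = date.fromisoformat(min_value)
    upper = date.fromisoformat(max_value)
    if lower > upper:
        raise ValueError("min_value must not be greater than max_value")
    # inclusive[0] governs the lower bound, inclusive[1] the upper bound.
    above = value >= lower if inclusive[0] else value > lower
    below = value <= upper if inclusive[1] else value < upper
    return above and below
```

With the default (False, False), a value equal to either boundary is rejected.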

class DateMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for DateMaxCheck.

columns

Date columns to validate.

Type:: List[str]

max_value

The maximum allowed date in ISO format.

Type:: str

inclusive

Whether to include the maximum date.

Type:: bool

check_class: alias of DateMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DateMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the DateMinCheck.

columns

The list of date columns to validate.

Type:: List[str]

min_value

The minimum allowed date in ‘YYYY-MM-DD’ format. Whether the boundary itself is allowed is controlled by inclusive.

Type:: str

inclusive

Whether to include the minimum date.

Type:: bool

check_class: alias of DateMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class DistinctRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for DistinctRatioCheck.

column

The column to evaluate for distinctness.

Type:: str

min_ratio

Minimum required ratio of distinct values (between 0 and 1).

Type:: float

check_class: alias of DistinctRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class ExactlyOneNotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the ExactlyOneNotNullCheck.

columns

The names of the columns where exactly one must be non-null per row.

Type:: List[str]

check_class: alias of ExactlyOneNotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
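The "exactly one non-null" rule can be illustrated for a single row represented as a dictionary. This is a sketch only: the helper name is hypothetical, and treating a missing key as null is an assumption made for the example.

```python
from typing import Mapping, Optional, Sequence


def exactly_one_not_null(row: Mapping[str, Optional[object]], columns: Sequence[str]) -> bool:
    """Return True if exactly one of the given columns is non-null in this row."""
    # Assumption: a column absent from the row counts as null.
    populated = sum(row.get(column) is not None for column in columns)
    return populated == 1
```

A typical use is a set of mutually exclusive identifier columns, where each row must carry one identifier and no more.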

class ForeignKeyCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, reference_dataset: str, reference_column: str)[source]

Bases: BaseAggregateCheckConfig

Configuration model for ForeignKeyCheck.

Validates referential integrity by ensuring that all values in column exist in a reference dataset.

check_class: alias of ForeignKeyCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
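Referential integrity in this sense amounts to a set-membership test. The sketch below uses plain Python collections in place of the two datasets; the helper name is hypothetical, and skipping null values in column is an assumption, since the reference above does not say how nulls are handled.

```python
from typing import Hashable, Iterable, Optional


def missing_references(
    values: Iterable[Optional[Hashable]],
    reference_values: Iterable[Hashable],
) -> set:
    """Return the values that have no match in the reference column."""
    # Assumption: nulls are ignored rather than reported as violations.
    present = {v for v in values if v is not None}
    return present - set(reference_values)
```

An empty result means every non-null value was found in the reference column.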

class FreshnessCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, interval: int, period: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second'])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the FreshnessCheck.

Ensures that the newest value in the specified timestamp column is recent enough relative to the current time.

column

Name of the timestamp column.

Type:: str

interval

Time window size (must be positive).

Type:: int

period

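Unit of time. One of “year”, “month”, “week”, “day”, “hour”, “minute”, or “second”.

Type:: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second']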

check_class: alias of FreshnessCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
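The freshness rule compares the newest timestamp with the current time minus the configured window. A minimal sketch with the standard library, under stated assumptions: the helper is_fresh is hypothetical, now is passed in explicitly so the result is reproducible, and only the fixed-length periods are supported because timedelta has no calendar-aware "month" or "year" unit.

```python
from datetime import datetime, timedelta

# Maps the documented period names to timedelta keyword arguments.
# "year" and "month" are left out of this sketch on purpose.
_PERIOD_TO_KWARG = {
    "week": "weeks",
    "day": "days",
    "hour": "hours",
    "minute": "minutes",
    "second": "seconds",
}


def is_fresh(newest: datetime, interval: int, period: str, now: datetime) -> bool:
    """Return True if `newest` lies within `interval` periods before `now`."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    window = timedelta(**{_PERIOD_TO_KWARG[period]: interval})
    return newest >= now - window
```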

class IsContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, allowed_values: dict[str, list[object]])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the IsContainedInCheck.

This config allows validation that specified columns contain only predefined values.

allowed_values

Mapping of column names to allowed values.

Type:: dict[str, list[object]]

check_class: alias of IsContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_allowed_values() → IsContainedInCheckConfig[source]

Validate that allowed_values is not empty and properly formed.

Returns:: The validated configuration object.
Return type:: IsContainedInCheckConfig
Raises:: InvalidCheckConfigurationError – If allowed_values is invalid.
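The shape of allowed_values and its effect on a row can be sketched in plain Python. Both helpers are hypothetical; reading "properly formed" as "every column maps to a non-empty list" and failing rows whose value is null are assumptions made for this example.

```python
from typing import Mapping, Optional


def validate_allowed_values(allowed_values: dict[str, list[object]]) -> None:
    """Reject an empty mapping or a column without any allowed values."""
    if not allowed_values:
        raise ValueError("allowed_values must not be empty")
    for column, values in allowed_values.items():
        if not isinstance(values, list) or not values:
            raise ValueError(f"allowed values for {column!r} must be a non-empty list")


def row_is_contained(
    row: Mapping[str, Optional[object]],
    allowed_values: dict[str, list[object]],
) -> bool:
    """Return True if every configured column holds one of its allowed values."""
    return all(row.get(column) in values for column, values in allowed_values.items())
```

IsNotContainedInCheckConfig follows the same mapping shape with the membership test inverted.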

class IsNotContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, forbidden_values: dict[str, list[object]])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the IsNotContainedInCheck.

This config allows validation that specified columns do NOT contain forbidden values.

forbidden_values

Mapping of column names to forbidden values.

Type:: dict[str, list[object]]

check_class: alias of IsNotContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_forbidden_values() → IsNotContainedInCheckConfig[source]

Validate that forbidden_values is not empty and properly formed.

Returns:: The validated configuration object.
Return type:: IsNotContainedInCheckConfig
Raises:: InvalidCheckConfigurationError – If forbidden_values is missing or invalid.

class NotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NotNullCheck.

columns

The names of the columns that should remain null.

Type:: List[str]

check_class: alias of NotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class NullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NullCheck.

columns

The names of the columns to check for null values. This is a required field and must match existing columns in the DataFrame.

Type:: List[str]

check_class: alias of NullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class NumericBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], inclusive: Tuple[bool, bool] = (False, False), min_value: float | int | Decimal, max_value: float | int | Decimal)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericBetweenCheck.

This configuration defines both a lower and upper bound constraint on one or more numeric columns. It ensures that all specified columns contain only values between the configured min_value and max_value. Violations are flagged per row.

columns

The list of numeric columns to validate.

Type:: List[str]

min_value

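The minimum allowed value. Whether the boundary itself is allowed is controlled by inclusive.

Type:: float | int | Decimal

max_value

The maximum allowed value. Whether the boundary itself is allowed is controlled by inclusive.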

Type:: float | int | Decimal

inclusive

Inclusion flags for min and max boundaries.

Type:: tuple[bool, bool]

check_class: alias of NumericBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_between_values() → NumericBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Returns:: The validated configuration object.
Return type:: NumericBetweenCheckConfig
Raises:: InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.

class NumericMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: float | int | Decimal, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericMaxCheck.

columns

The list of numeric columns to validate.

Type:: List[str]

max_value

The maximum allowed value. Whether the boundary itself is allowed is controlled by inclusive.

Type:: float | int | Decimal

inclusive

Whether to include the maximum value.

Type:: bool

check_class: alias of NumericMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class NumericMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: float | int | Decimal, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericMinCheck.

columns

The list of numeric columns to validate.

Type:: List[str]

min_value

The minimum allowed value. Whether the boundary itself is allowed is controlled by inclusive.

Type:: float | int | Decimal

inclusive

Whether to include the minimum value.

Type:: bool

check_class: alias of NumericMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class RegexMatchCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, pattern: str, ignore_case: bool = False, treat_null_as_failure: bool = False)[source]

Bases: BaseRowCheckConfig

Configuration for RegexMatchCheck.

Validates that a string column matches a given regex pattern.

column

Column to validate.

Type:: str

pattern

Regex pattern to use for matching.

Type:: str

ignore_case

If True, regex is case-insensitive (default: False).

Type:: bool

treat_null_as_failure

If True, null values are marked as failed (default: False).

Type:: bool

check_class: alias of RegexMatchCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
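How ignore_case and treat_null_as_failure interact can be shown with Python's re module. This is a sketch of the documented options, not the library's implementation: the helper name is hypothetical, and using re.search (a match anywhere in the string, similar to Spark SQL's rlike) is an assumption. Anchor the pattern with ^ and $ if a full match is required.

```python
import re
from typing import Optional


def regex_row_passes(
    value: Optional[str],
    pattern: str,
    ignore_case: bool = False,
    treat_null_as_failure: bool = False,
) -> bool:
    """Return True if one value satisfies the regex check."""
    if value is None:
        # Nulls pass unless the config explicitly marks them as failures.
        return not treat_null_as_failure
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, value, flags) is not None
```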

class RowCountBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int, max_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountBetweenCheck.

This config is used to define acceptable row count bounds in a data validation pipeline. It ensures that both min_count and max_count are provided and that min_count <= max_count.

It is typically used when defining checks via JSON, YAML, or dict-based configs.

min_count

Minimum number of rows expected in the dataset.

Type:: int

max_count

Maximum number of rows allowed in the dataset.

Type:: int

check_class: alias of RowCountBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_range() → RowCountBetweenCheckConfig[source]

Validate the logical consistency of the configured bounds.

This method ensures that min_count is not negative and not greater than max_count. If either condition is violated, a configuration-level exception is raised immediately to prevent runtime failures.

Returns:: The validated configuration object.
Return type:: RowCountBetweenCheckConfig
Raises:: InvalidCheckConfigurationError – If min_count > max_count or min_count < 0.
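Validating bounds at construction time, rather than when the check runs, can be sketched with a dataclass. The names RowCountBounds and InvalidConfigError are hypothetical stand-ins for the config class and InvalidCheckConfigurationError, and treating both bounds as inclusive in accepts is an assumption.

```python
from dataclasses import dataclass


class InvalidConfigError(ValueError):
    """Hypothetical stand-in for InvalidCheckConfigurationError."""


@dataclass(frozen=True)
class RowCountBounds:
    min_count: int
    max_count: int

    def __post_init__(self) -> None:
        # Fail fast: a bad configuration never reaches the data.
        if self.min_count < 0:
            raise InvalidConfigError("min_count must not be negative")
        if self.min_count > self.max_count:
            raise InvalidConfigError("min_count must not exceed max_count")

    def accepts(self, row_count: int) -> bool:
        """Assumption: both bounds are inclusive."""
        return self.min_count <= row_count <= self.max_count
```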

class RowCountExactCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountExactCheck.

This configuration defines an exact row count requirement for a dataset. It ensures that the expected_count parameter is provided and is non-negative.

expected_count

The exact number of rows expected in the dataset.

Type:: int

check_class: alias of RowCountExactCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_expected() → RowCountExactCheckConfig[source]

Validate that the configured expected_count is non-negative.

Returns:: The validated configuration object.
Return type:: RowCountExactCheckConfig
Raises:: InvalidCheckConfigurationError – If expected_count is negative.

class RowCountMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, max_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountMaxCheck.

This configuration defines a maximum row count requirement for a dataset. It ensures that the max_count parameter is provided and has a positive value.

max_count

Maximum number of rows allowed in the dataset.

Type:: int

check_class: alias of RowCountMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_max() → RowCountMaxCheckConfig[source]

Validate that the configured max_count is greater than 0.

Returns:: The validated configuration object.
Return type:: RowCountMaxCheckConfig
Raises:: InvalidCheckConfigurationError – If max_count is not greater than 0.

class RowCountMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountMinCheck.

This configuration defines a minimum row count requirement for a dataset. It ensures that the min_count parameter is provided and is non-negative.

min_count

Minimum number of rows expected in the dataset.

Type:: int

check_class: alias of RowCountMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_min() → RowCountMinCheckConfig[source]

Validate that the configured min_count is non-negative.

Returns:: The validated configuration object.
Return type:: RowCountMinCheckConfig
Raises:: InvalidCheckConfigurationError – If min_count is negative.

class SchemaCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_schema: dict[str, str], strict: bool = True)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the SchemaCheck.

Ensures the DataFrame matches the expected schema, with optional strict mode. Validates all specified types, including support for decimal(p,s) types.

expected_schema

Required column names and Spark types.

Type:: dict[str, str]

strict

Whether to disallow unexpected columns.

Type:: bool

check_class: alias of SchemaCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_schema() → SchemaCheckConfig[source]

Validates that expected_schema is not empty and all types are valid.

Raises:: InvalidCheckConfigurationError – If any type is invalid.
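The effect of strict can be illustrated by comparing two name-to-type mappings. In this sketch the helper schema_problems is hypothetical, schemas are plain dictionaries of Spark type strings, and types are compared as exact strings, which is a simplification.

```python
def schema_problems(
    actual: dict[str, str],
    expected: dict[str, str],
    strict: bool = True,
) -> list[str]:
    """List the differences between an actual and an expected schema."""
    problems: list[str] = []
    for name, expected_type in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != expected_type:
            problems.append(
                f"type mismatch for {name}: expected {expected_type}, got {actual[name]}"
            )
    if strict:
        # Strict mode additionally rejects columns that were not declared.
        for name in actual:
            if name not in expected:
                problems.append(f"unexpected column: {name}")
    return problems
```

With strict=False, extra columns in the DataFrame are tolerated and only missing or mismatched columns are reported.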

class StringLengthBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, max_length: int, inclusive: tuple[bool, bool] = (True, True))[source]

Bases: BaseRowCheckConfig

Configuration for StringLengthBetweenCheck.

Validates that string values in the given column fall between a minimum and maximum length.

column

The string column to validate.

Type:: str

min_length

Minimum valid length (must be > 0).

Type:: int

max_length

Maximum valid length (must be >= min_length).

Type:: int

inclusive

Tuple indicating inclusiveness of min and max bounds.

Type:: tuple[bool, bool]

check_class: alias of StringLengthBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_range() → StringLengthBetweenCheckConfig[source]

Validates that the min/max configuration is logically sound.

Returns:: Validated instance.
Return type:: StringLengthBetweenCheckConfig
Raises:: InvalidCheckConfigurationError – If min_length > max_length or values are invalid.

class StringMaxLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, max_length: int, inclusive: bool = True)[source]

Bases: BaseRowCheckConfig

Configuration for StringMaxLengthCheck.

Ensures that string values do not exceed the specified maximum length.

column

Column to validate.

Type:: str

max_length

Maximum allowed length (must be > 0).

Type:: int

inclusive

If True, length must be <= max_length. If False, strictly < max_length.

Type:: bool

check_class: alias of StringMaxLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_max_length() → StringMaxLengthCheckConfig[source]

Validate that max_length is greater than 0.

Returns:: The validated object.
Return type:: StringMaxLengthCheckConfig
Raises:: InvalidCheckConfigurationError – If max_length <= 0.

class StringMinLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, inclusive: bool = True)[source]

Bases: BaseRowCheckConfig

Configuration for StringMinLengthCheck.

Ensures that all non-null values in the specified column have a minimum length.

column

Name of the string column to validate.

Type:: str

min_length

Minimum allowed string length (must be > 0).

Type:: int

inclusive

If True, allows equality (>=). If False, requires strictly greater length (>).

Type:: bool

check_class: alias of StringMinLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_min_length() → StringMinLengthCheckConfig[source]

Validate that the configured min_length is greater than 0.

Returns:: The validated configuration object.
Return type:: StringMinLengthCheckConfig
Raises:: InvalidCheckConfigurationError – If min_length is not greater than 0.

class TimestampBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]

Bases: BaseRowCheckConfig

Declarative configuration model for TimestampBetweenCheck.

columns

The list of timestamp columns to validate.

Type:: List[str]

min_value

Minimum allowed timestamp.

Type:: str

max_value

Maximum allowed timestamp.

Type:: str

inclusive

Optional tuple of booleans for boundary inclusion.

Type:: tuple[bool, bool]

check_class: alias of TimestampBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

validate_between_values() → TimestampBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Raises:: InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.

class TimestampMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for TimestampMaxCheck.

columns

The timestamp columns to validate.

Type:: List[str]

max_value

The maximum allowed timestamp in ISO 8601 format.

Type:: str

inclusive

Whether to include the upper bound timestamp.

Type:: bool

check_class: alias of TimestampMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class TimestampMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the TimestampMinCheck.

columns

The list of timestamp columns to validate.

Type:: List[str]

min_value

The minimum allowed timestamp in ISO 8601 format (e.g. ‘2023-01-01T00:00:00’).

Type:: str

inclusive

Whether the minimum value is inclusive.

Type:: bool

check_class: alias of TimestampMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class UniqueRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the UniqueRatioCheck.

column

The column to check for uniqueness.

Type:: str

min_ratio

The minimum acceptable ratio of distinct values (0.0 - 1.0).

Type:: float

check_class: alias of UniqueRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].

class UniqueRowsCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, subset_columns: List[str] | None = None)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration for the UniqueRowsCheck.

This check verifies that no duplicate row combinations exist in the dataset. Uniqueness can be enforced across all columns or a selected subset.

subset_columns

List of columns to define uniqueness. If not provided, all columns are used.

Type:: Optional[List[str]]

check_class: alias of UniqueRowsCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}: Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
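The role of subset_columns can be sketched by counting row keys. The helper duplicate_keys is hypothetical and works on a list of dictionaries rather than a DataFrame; it assumes all rows share the same keys and that values are hashable.

```python
from collections import Counter
from typing import Optional


def duplicate_keys(rows: list[dict], subset_columns: Optional[list[str]] = None) -> list[tuple]:
    """Return the column-value combinations that occur more than once."""
    if not rows:
        return []
    # Without a subset, every column takes part in the uniqueness key.
    columns = subset_columns or sorted(rows[0])
    counts = Counter(tuple(row[column] for column in columns) for row in rows)
    return [key for key, count in counts.items() if count > 1]
```

An empty result corresponds to a passing check; narrowing subset_columns makes the check stricter because fewer columns need to coincide for two rows to count as duplicates.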