sparkdq.checks


class ColumnLessThanCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, smaller_column: str, greater_column: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the ColumnLessThanCheck.

This config defines a row-level comparison between two columns, ensuring that values in smaller_column are strictly less than (or less than or equal to, if inclusive=True) the values in greater_column.

Null values in either column are treated as invalid and will fail the check.

Example

ColumnLessThanCheckConfig(
    check_id="start-before-end",
    smaller_column="start_time",
    greater_column="end_time",
    inclusive=True,
)
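The comparison rule described for this config can be sketched in plain Python. This only illustrates the documented semantics (strict vs. inclusive comparison, nulls failing); it is not sparkdq's Spark implementation, and the helper name row_passes is hypothetical.

```python
from typing import Optional


def row_passes(smaller: Optional[float], greater: Optional[float],
               inclusive: bool = False) -> bool:
    """Return True if a single row satisfies the documented less-than rule."""
    # Null values in either column are treated as invalid.
    if smaller is None or greater is None:
        return False
    # inclusive=True allows equality; otherwise the comparison is strict.
    return smaller <= greater if inclusive else smaller < greater
```

With the default inclusive=False, a row where both columns hold the same value fails; with inclusive=True it passes.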

smaller_column

Column expected to contain smaller (or equal) values.

Type:

str

greater_column

Column expected to contain greater values.

Type:

str

inclusive

If True, validates smaller_column <= greater_column. If False, requires strict inequality (<). Defaults to False.

Type:

bool

check_class

alias of ColumnLessThanCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ColumnPresenceCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, required_columns: list[str])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the ColumnPresenceCheck.

This config defines a set of required column names that must exist in the DataFrame.

required_columns

The list of required column names.

Type:

list[str]

check_class

alias of ColumnPresenceCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ColumnsAreCompleteCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the ColumnsAreCompleteCheck.

This configuration defines a completeness requirement for multiple columns. The check fails if any of the specified columns contain null values.

columns

List of required columns that must be fully populated.

Type:

List[str]

check_class

alias of ColumnsAreCompleteCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class CompletenessRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration for CompletenessRatioCheck.

column

Column name to assess.

Type:

str

min_ratio

Minimum allowed non-null ratio (between 0.0 and 1.0).

Type:

float

check_class

alias of CompletenessRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_threshold() → CompletenessRatioCheckConfig[source]

Ensures the min_ratio is between 0 and 1 (inclusive).
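As a rough illustration of what a completeness ratio threshold means, the sketch below computes the non-null fraction of a list of values and compares it with min_ratio. The function names are hypothetical, ValueError stands in for the library's configuration error, and treating an empty column as fully complete is an assumption, not documented behavior.

```python
from typing import Optional, Sequence


def completeness_ratio(values: Sequence[Optional[object]]) -> float:
    """Fraction of entries that are not None."""
    if not values:
        # Assumption: an empty column counts as fully complete.
        return 1.0
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def meets_min_ratio(values: Sequence[Optional[object]], min_ratio: float) -> bool:
    """Check the non-null ratio against a threshold in [0, 1]."""
    if not 0.0 <= min_ratio <= 1.0:
        raise ValueError("min_ratio must be between 0 and 1 (inclusive)")
    return completeness_ratio(values) >= min_ratio
```

For a column with three non-null values out of four, the ratio is 0.75, so a min_ratio of 0.7 would pass and 0.8 would fail.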

class DateBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool] = (False, False))[source]

Bases: BaseRowCheckConfig

Declarative configuration model for DateBetweenCheck.

columns

Date columns to validate.

Type:

List[str]

min_value

Minimum allowed date in ‘YYYY-MM-DD’ format.

Type:

str

max_value

Maximum allowed date in ‘YYYY-MM-DD’ format.

Type:

str

inclusive

Inclusion flags for min and max boundaries.

Type:

tuple[bool, bool]

check_class

alias of DateBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_between_values() → DateBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Returns:

The validated configuration object.

Return type:

DateBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
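The bound validation and the per-boundary inclusive flags can be sketched with the standard library. The helper names are hypothetical and ValueError stands in for InvalidCheckConfigurationError; the actual check runs on Spark columns rather than Python dates.

```python
from datetime import date
from typing import Tuple


def validate_bounds(min_value: str, max_value: str) -> Tuple[date, date]:
    """Parse 'YYYY-MM-DD' bounds and reject min_value > max_value."""
    lower, upper = date.fromisoformat(min_value), date.fromisoformat(max_value)
    if lower > upper:
        # Stand-in for InvalidCheckConfigurationError.
        raise ValueError("min_value must not be greater than max_value")
    return lower, upper


def date_in_range(value: date, lower: date, upper: date,
                  inclusive: Tuple[bool, bool] = (False, False)) -> bool:
    """Apply the (min, max) inclusion flags to a single date."""
    include_min, include_max = inclusive
    lower_ok = value >= lower if include_min else value > lower
    upper_ok = value <= upper if include_max else value < upper
    return lower_ok and upper_ok
```

With the default (False, False), a value equal to either boundary is rejected; setting the first flag to True admits min_value itself.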

class DateMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for DateMaxCheck.

columns

Date columns to validate.

Type:

List[str]

max_value

The maximum allowed date in ISO format.

Type:

str

inclusive

Whether to include the maximum date.

Type:

bool

check_class

alias of DateMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class DateMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the DateMinCheck.

columns

The list of date columns to validate.

Type:

List[str]

min_value

The minimum allowed date, in ‘YYYY-MM-DD’ format. Whether the bound itself is accepted is controlled by inclusive.

Type:

str

inclusive

Whether to include the minimum date.

Type:

bool

check_class

alias of DateMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class DistinctRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for DistinctRatioCheck.

column

The column to evaluate for distinctness.

Type:

str

min_ratio

Minimum required ratio of distinct values (between 0 and 1).

Type:

float

check_class

alias of DistinctRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class ExactlyOneNotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the ExactlyOneNotNullCheck.

columns

The names of the columns where exactly one must be non-null per row.

Type:

List[str]

check_class

alias of ExactlyOneNotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
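The "exactly one non-null" rule for a row can be sketched as a count over the configured columns. This is a plain-Python illustration with a hypothetical helper name, where a row is modeled as a dict and None stands for null.

```python
from typing import Dict, List, Optional


def exactly_one_not_null(row: Dict[str, Optional[object]], columns: List[str]) -> bool:
    """True if precisely one of the listed columns holds a non-null value."""
    non_null_count = sum(1 for column in columns if row.get(column) is not None)
    return non_null_count == 1
```

A row with both an email and a phone number, or with neither, would fail under this rule when columns is ["email", "phone"].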

class FreshnessCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, interval: int, period: Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second'])[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the FreshnessCheck.

Ensures that the newest value in the specified timestamp column is recent enough relative to the current time.

column

Name of the timestamp column.

Type:

str

interval

Time window size (must be positive).

Type:

int

period

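Unit of time for interval. One of 'year', 'month', 'week', 'day', 'hour', 'minute', or 'second'.

Type:

Literal['year', 'month', 'week', 'day', 'hour', 'minute', 'second']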

check_class

alias of FreshnessCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
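The freshness rule (newest timestamp no older than interval × period before the current time) can be sketched with datetime. The helper is hypothetical and takes the current time as an argument so the result is reproducible. It covers only the fixed-length units, because timedelta has no calendar-aware 'month' or 'year'; how the library handles those units is not shown here.

```python
from datetime import datetime, timedelta

# Mapping from the singular period names to timedelta keyword arguments.
_TIMEDELTA_KWARGS = {
    "week": "weeks",
    "day": "days",
    "hour": "hours",
    "minute": "minutes",
    "second": "seconds",
}


def is_fresh(newest: datetime, interval: int, period: str, now: datetime) -> bool:
    """True if the newest timestamp lies within the allowed window before now."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    if period not in _TIMEDELTA_KWARGS:
        raise ValueError(f"period {period!r} is not supported by this sketch")
    cutoff = now - timedelta(**{_TIMEDELTA_KWARGS[period]: interval})
    return newest >= cutoff
```

For example, with interval=1 and period='day', data whose newest timestamp is 23 hours old counts as fresh, while data 25 hours old does not.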

class IsContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, allowed_values: dict[str, list[object]])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the IsContainedInCheck.

This config allows validation that specified columns contain only predefined values.

allowed_values

Mapping of column names to allowed values.

Type:

dict[str, list[object]]

check_class

alias of IsContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_allowed_values() → IsContainedInCheckConfig[source]

Validate that allowed_values is not empty and properly formed.

Returns:

The validated configuration object.

Return type:

IsContainedInCheckConfig

Raises:

InvalidCheckConfigurationError – If allowed_values is invalid.
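The shape of allowed_values and the membership rule can be sketched as follows. The helper names are hypothetical, ValueError stands in for InvalidCheckConfigurationError, and two points are assumptions rather than documented behavior: that "properly formed" means a non-empty list per column, and that a null value counts as not contained.

```python
from typing import Dict, List, Optional


def validate_allowed_values(allowed_values: Dict[str, List[object]]) -> Dict[str, List[object]]:
    """Reject an empty mapping or a column without a non-empty list of values."""
    if not allowed_values:
        raise ValueError("allowed_values must not be empty")
    for column, values in allowed_values.items():
        # Assumption: each column needs a non-empty list of permitted values.
        if not isinstance(values, list) or not values:
            raise ValueError(f"allowed values for {column!r} must be a non-empty list")
    return allowed_values


def row_is_allowed(row: Dict[str, Optional[object]],
                   allowed_values: Dict[str, List[object]]) -> bool:
    """True if every configured column holds one of its allowed values."""
    return all(row.get(column) in values for column, values in allowed_values.items())
```

The same structure applies in mirror image to forbidden_values, where a row would fail if any configured column holds a listed value.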

class IsNotContainedInCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, forbidden_values: dict[str, list[object]])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the IsNotContainedInCheck.

This config allows validation that specified columns do NOT contain forbidden values.

forbidden_values

Mapping of column names to forbidden values.

Type:

dict[str, list[object]]

check_class

alias of IsNotContainedInCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_forbidden_values() → IsNotContainedInCheckConfig[source]

Validate that forbidden_values is not empty and properly formed.

Returns:

The validated configuration object.

Return type:

IsNotContainedInCheckConfig

Raises:

InvalidCheckConfigurationError – If forbidden_values is missing or invalid.

class NotNullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NotNullCheck.

columns

The names of the columns that should remain null.

Type:

List[str]

check_class

alias of NotNullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NullCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str])[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NullCheck.

columns

The names of the columns to check for null values. This is a required field and must match existing columns in the DataFrame.

Type:

List[str]

check_class

alias of NullCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NumericBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], inclusive: Tuple[bool, bool] = (False, False), min_value: float | int | Decimal, max_value: float | int | Decimal)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericBetweenCheck.

This configuration defines both a lower and upper bound constraint on one or more numeric columns. It ensures that all specified columns contain only values between the configured min_value and max_value. Violations are flagged per row.

columns

The list of numeric columns to validate.

Type:

List[str]

min_value

The minimum allowed value. Whether the bound itself is accepted is controlled by the first flag of inclusive.

Type:

float | int | Decimal

max_value

The maximum allowed value. Whether the bound itself is accepted is controlled by the second flag of inclusive.

Type:

float | int | Decimal

inclusive

Inclusion flags for min and max boundaries.

Type:

tuple[bool, bool]

check_class

alias of NumericBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_between_values() → NumericBetweenCheckConfig[source]

Validates that min_value and max_value are properly configured and that min_value is not greater than max_value.

Returns:

The validated configuration object.

Return type:

NumericBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – If min_value or max_value are not set or if min_value > max_value.
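How the inclusive tuple interacts with min_value and max_value can be sketched for a single numeric value. The helper name is hypothetical and ValueError stands in for InvalidCheckConfigurationError; mixed int, float, and Decimal operands are compared directly, which Python supports.

```python
from decimal import Decimal
from typing import Tuple, Union

Number = Union[int, float, Decimal]


def value_in_range(value: Number, min_value: Number, max_value: Number,
                   inclusive: Tuple[bool, bool] = (False, False)) -> bool:
    """Apply the (min, max) inclusion flags to one value."""
    if min_value > max_value:
        # Stand-in for InvalidCheckConfigurationError.
        raise ValueError("min_value must not be greater than max_value")
    include_min, include_max = inclusive
    lower_ok = value >= min_value if include_min else value > min_value
    upper_ok = value <= max_value if include_max else value < max_value
    return lower_ok and upper_ok
```

Because the default is (False, False), a value equal to min_value or max_value is a violation unless the corresponding flag is set to True.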

class NumericMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: float | int | Decimal, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericMaxCheck.

columns

The list of numeric columns to validate.

Type:

List[str]

max_value

The maximum allowed value. Whether the bound itself is accepted is controlled by inclusive.

Type:

float | int | Decimal

inclusive

Whether to include the maximum value.

Type:

bool

check_class

alias of NumericMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class NumericMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: float | int | Decimal, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the NumericMinCheck.

columns

The list of numeric columns to validate.

Type:

List[str]

min_value

The minimum allowed value. Whether the bound itself is accepted is controlled by inclusive.

Type:

float | int | Decimal

inclusive

Whether to include the minimum value.

Type:

bool

check_class

alias of NumericMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class RegexMatchCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, pattern: str, ignore_case: bool = False, treat_null_as_failure: bool = False)[source]

Bases: BaseRowCheckConfig

Configuration for RegexMatchCheck.

Validates that a string column matches a given regex pattern.

column

Column to validate.

Type:

str

pattern

Regex pattern to use for matching.

Type:

str

ignore_case

If True, regex is case-insensitive (default: False).

Type:

bool

treat_null_as_failure

If True, null values are marked as failed (default: False).

Type:

bool

check_class

alias of RegexMatchCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
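The interplay of ignore_case and treat_null_as_failure can be sketched with Python's re module. This is an illustration only: the helper name is hypothetical, and using re.search (a match anywhere in the string) is an assumption, since the underlying Spark regex engine and its matching mode are not described here.

```python
import re
from typing import Optional


def value_matches(value: Optional[str], pattern: str,
                  ignore_case: bool = False,
                  treat_null_as_failure: bool = False) -> bool:
    """Return True if the value passes the regex rule for one row."""
    if value is None:
        # Nulls pass by default and fail only when explicitly requested.
        return not treat_null_as_failure
    flags = re.IGNORECASE if ignore_case else 0
    # Assumption: a match anywhere in the string counts; anchor the
    # pattern with ^ and $ to require a full match.
    return re.search(pattern, value, flags) is not None
```

With the defaults, a null value is not flagged, so a column that must be both present and well-formed would need treat_null_as_failure=True.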

class RowCountBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int, max_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountBetweenCheck.

This config is used to define acceptable row count bounds in a data validation pipeline. It ensures that:

- both min_count and max_count are provided, and
- min_count <= max_count.

It is typically used when defining checks via JSON, YAML, or dict-based configs.

min_count

Minimum number of rows expected in the dataset.

Type:

int

max_count

Maximum number of rows allowed in the dataset.

Type:

int

check_class

alias of RowCountBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_range() → RowCountBetweenCheckConfig[source]

Validate the logical consistency of the configured bounds.

This method ensures that min_count is non-negative and not greater than max_count. If either condition is violated, a configuration-level exception is raised immediately to prevent runtime failures.

Returns:

The validated configuration object.

Return type:

RowCountBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – If min_count > max_count or min_count < 0.
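The fail-fast validation described for the row count bounds can be sketched with a small dataclass. RowCountBounds is a hypothetical stand-in, not the library class; ValueError replaces InvalidCheckConfigurationError, and treating both bounds as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowCountBounds:
    """Hypothetical stand-in that validates its bounds at construction time."""

    min_count: int
    max_count: int

    def __post_init__(self) -> None:
        # Reject inconsistent bounds immediately, before any data is read.
        if self.min_count < 0:
            raise ValueError("min_count must not be negative")
        if self.min_count > self.max_count:
            raise ValueError("min_count must not be greater than max_count")

    def accepts(self, row_count: int) -> bool:
        # Assumption: both bounds are inclusive.
        return self.min_count <= row_count <= self.max_count
```

Validating in the constructor means a misconfigured check is reported when the configuration is loaded rather than partway through a pipeline run.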

class RowCountExactCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountExactCheck.

This configuration defines an exact row count requirement for a dataset. It ensures that the expected_count parameter is provided and is non-negative.

expected_count

The exact number of rows expected in the dataset.

Type:

int

check_class

alias of RowCountExactCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_expected() → RowCountExactCheckConfig[source]

Validate that the configured expected_count is non-negative.

Returns:

The validated configuration object.

Return type:

RowCountExactCheckConfig

Raises:

InvalidCheckConfigurationError – If expected_count is negative.

class RowCountMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, max_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountMaxCheck.

This configuration defines a maximum row count requirement for a dataset. It ensures that the max_count parameter is provided and has a positive value.

max_count

Maximum number of rows allowed in the dataset.

Type:

int

check_class

alias of RowCountMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_max() → RowCountMaxCheckConfig[source]

Validate that the configured max_count is greater than 0.

Returns:

The validated configuration object.

Return type:

RowCountMaxCheckConfig

Raises:

InvalidCheckConfigurationError – If max_count is not greater than 0.

class RowCountMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, min_count: int)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the RowCountMinCheck.

This configuration defines a minimum row count requirement for a dataset. It ensures that the min_count parameter is provided and is non-negative.

min_count

Minimum number of rows expected in the dataset.

Type:

int

check_class

alias of RowCountMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_min() → RowCountMinCheckConfig[source]

Validate that the configured min_count is non-negative.

Returns:

The validated configuration object.

Return type:

RowCountMinCheckConfig

Raises:

InvalidCheckConfigurationError – If min_count is negative.

class SchemaCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, expected_schema: dict[str, str], strict: bool = True)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the SchemaCheck.

Ensures the DataFrame matches the expected schema, with optional strict mode. Validates all specified types, including support for decimal(p,s) types.

expected_schema

Required column names and Spark types.

Type:

dict[str, str]

strict

Whether to disallow unexpected columns.

Type:

bool

check_class

alias of SchemaCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_schema() → SchemaCheckConfig[source]

Validates that expected_schema is not empty and all types are valid.

Raises:

InvalidCheckConfigurationError – If any type is invalid.
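The effect of strict on a schema comparison can be sketched by comparing two name-to-type mappings. The helper name and the message wording are hypothetical, and type names are compared as plain strings here, whereas the real check works with Spark types.

```python
from typing import Dict, List


def schema_problems(actual: Dict[str, str], expected: Dict[str, str],
                    strict: bool = True) -> List[str]:
    """List the differences between an actual and an expected schema."""
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column: {column}")
        elif actual[column] != expected_type:
            problems.append(
                f"type mismatch for {column}: expected {expected_type}, found {actual[column]}"
            )
    if strict:
        # Only strict mode reports columns that the expected schema does not list.
        for column in actual:
            if column not in expected:
                problems.append(f"unexpected column: {column}")
    return problems
```

With strict=False, extra columns in the DataFrame are tolerated, while missing columns and type mismatches are still reported.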

class StringLengthBetweenCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, max_length: int, inclusive: tuple[bool, bool] = (True, True))[source]

Bases: BaseRowCheckConfig

Configuration for StringLengthBetweenCheck.

Validates that string values in the given column fall between a minimum and maximum length.

column

The string column to validate.

Type:

str

min_length

Minimum valid length (must be > 0).

Type:

int

max_length

Maximum valid length (must be >= min_length).

Type:

int

inclusive

Tuple indicating inclusiveness of min and max bounds.

Type:

tuple[bool, bool]

check_class

alias of StringLengthBetweenCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_range() → StringLengthBetweenCheckConfig[source]

Validates that the min/max configuration is logically sound.

Returns:

Validated instance.

Return type:

StringLengthBetweenCheckConfig

Raises:

InvalidCheckConfigurationError – If min_length > max_length or values are invalid.

class StringMaxLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, max_length: int, inclusive: bool = True)[source]

Bases: BaseRowCheckConfig

Configuration for StringMaxLengthCheck.

Ensures that string values do not exceed the specified maximum length.

column

Column to validate.

Type:

str

max_length

Maximum allowed length (must be > 0).

Type:

int

inclusive

If True, length must be <= max_length. If False, strictly < max_length.

Type:

bool

check_class

alias of StringMaxLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_max_length() → StringMaxLengthCheckConfig[source]

Validate that max_length is greater than 0.

Returns:

The validated object.

Return type:

StringMaxLengthCheckConfig

Raises:

InvalidCheckConfigurationError – If max_length <= 0.

class StringMinLengthCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_length: int, inclusive: bool = True)[source]

Bases: BaseRowCheckConfig

Configuration for StringMinLengthCheck.

Ensures that all non-null values in the specified column have a minimum length.

column

Name of the string column to validate.

Type:

str

min_length

Minimum allowed string length (must be > 0).

Type:

int

inclusive

If True, allows equality (>=). If False, requires strictly greater length (>).

Type:

bool

check_class

alias of StringMinLengthCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

validate_min_length() → StringMinLengthCheckConfig[source]

Validate that the configured min_length is greater than 0.

Returns:

The validated configuration object.

Return type:

StringMinLengthCheckConfig

Raises:

InvalidCheckConfigurationError – If min_length is not greater than 0.
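The minimum-length rule and its inclusive flag can be sketched for a single value. The helper name is hypothetical and ValueError stands in for InvalidCheckConfigurationError; letting null values pass follows from the description that only non-null values are evaluated, but is still an interpretation rather than confirmed behavior.

```python
from typing import Optional


def min_length_ok(value: Optional[str], min_length: int, inclusive: bool = True) -> bool:
    """Return True if the value satisfies the minimum-length rule."""
    if min_length <= 0:
        # Stand-in for InvalidCheckConfigurationError.
        raise ValueError("min_length must be greater than 0")
    if value is None:
        # Interpretation: nulls are outside the scope of this check.
        return True
    return len(value) >= min_length if inclusive else len(value) > min_length
```

Note that inclusive defaults to True for the string length configs, unlike the numeric and date bounds, which default to exclusive comparison.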

class TimestampBetweenCheck(check_id: str, columns: List[str], min_value: str, max_value: str, inclusive: tuple[bool, bool], severity: Severity = Severity.CRITICAL)[source]

Bases: BaseBetweenCheck

Row-level data quality check that verifies timestamp values are within a defined range.

A row fails the check if any of the specified columns contain a timestamp value that is less than min_value or greater than max_value. Boundary inclusiveness is configurable.

class TimestampMaxCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], max_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for TimestampMaxCheck.

columns

The timestamp columns to validate.

Type:

List[str]

max_value

The maximum allowed timestamp in ISO 8601 format.

Type:

str

inclusive

Whether to include the upper bound timestamp.

Type:

bool

check_class

alias of TimestampMaxCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class TimestampMinCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, columns: List[str], min_value: str, inclusive: bool = False)[source]

Bases: BaseRowCheckConfig

Declarative configuration model for the TimestampMinCheck.

columns

The list of timestamp columns to validate.

Type:

List[str]

min_value

The minimum allowed timestamp in ISO 8601 format (e.g. ‘2023-01-01T00:00:00’).

Type:

str

inclusive

Whether the minimum value is inclusive.

Type:

bool

check_class

alias of TimestampMinCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class UniqueRatioCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, column: str, min_ratio: float)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration model for the UniqueRatioCheck.

column

The column to check for uniqueness.

Type:

str

min_ratio

The minimum acceptable ratio of distinct values (0.0 - 1.0).

Type:

float

check_class

alias of UniqueRatioCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class UniqueRowsCheckConfig(*, check_id: str, severity: Severity = Severity.CRITICAL, subset_columns: List[str] | None = None)[source]

Bases: BaseAggregateCheckConfig

Declarative configuration for the UniqueRowsCheck.

This check verifies that no duplicate row combinations exist in the dataset. Uniqueness can be enforced across all columns or a selected subset.

subset_columns

List of columns to define uniqueness. If not provided, all columns are used.

Type:

Optional[List[str]]

check_class

alias of UniqueRowsCheck

model_config: ClassVar[ConfigDict] = {'populate_by_name': True, 'validate_by_alias': True, 'validate_by_name': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
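The difference between checking uniqueness over all columns and over a subset can be sketched with a Counter over row keys. The helper name is hypothetical and rows are modeled as dicts; the real check operates on a Spark DataFrame.

```python
from collections import Counter
from typing import Dict, List, Optional


def has_unique_rows(rows: List[Dict[str, object]],
                    subset_columns: Optional[List[str]] = None) -> bool:
    """True if no combination of the chosen columns occurs more than once."""
    if not rows:
        return True
    # Fall back to all columns when no subset is given.
    columns = subset_columns if subset_columns is not None else sorted(rows[0])
    key_counts = Counter(tuple(row.get(column) for column in columns) for row in rows)
    return all(count == 1 for count in key_counts.values())
```

A dataset can be unique across all columns and still fail once subset_columns narrows the key, for example when two otherwise different rows share the same id.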