sparkdq.management
- class CheckSet
Bases: object
Manages a collection of data quality checks.
The CheckSet handles the lifecycle of data quality checks within the framework. It converts configurations into concrete check instances, keeps track of all registered checks, and provides filtered access to row-level or aggregate-level checks.
This component decouples check definition from check execution.
- add_check(config: BaseCheckConfig) → CheckSet
Adds a single check from a validated configuration object and returns self for fluent chaining.
- Parameters:
config (BaseCheckConfig) – The configuration object defining the check.
- Returns:
The current instance with the added check.
- Return type:
CheckSet
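The fluent chaining described for add_check can be illustrated with a minimal sketch. It does not import sparkdq: `StubConfig` and `MiniCheckSet` are hypothetical stand-ins that mimic only the documented contract (the method registers a check and returns the same instance), so the snippet runs on its own.

```python
from typing import List


class StubConfig:
    """Hypothetical stand-in for a validated BaseCheckConfig subclass."""

    def __init__(self, check_id: str) -> None:
        self.check_id = check_id


class MiniCheckSet:
    """Hypothetical stand-in; mimics only the documented add_check contract."""

    def __init__(self) -> None:
        self._checks: List[str] = []

    def add_check(self, config: StubConfig) -> "MiniCheckSet":
        # A real CheckSet would turn the config into a check instance;
        # storing a label is enough to show the chaining behaviour.
        self._checks.append(f"check:{config.check_id}")
        return self  # returning self is what makes chaining possible

    def get_all(self) -> List[str]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._checks)


check_set = (
    MiniCheckSet()
    .add_check(StubConfig("not-null"))
    .add_check(StubConfig("row-count"))
)
print(check_set.get_all())
```

Because each call returns the instance it was called on, the chained form and a sequence of separate `add_check` statements should leave the set in the same state.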
- add_checks_from_dicts(configs: List[Dict[str, Any]]) → None
Adds multiple checks from raw configuration dictionaries using the CheckFactory.
Note
The sparkdq.checks module is imported inside this method so that all available checks are registered in the CheckFactory before any check is instantiated.
- Parameters:
configs (List[Dict[str, Any]]) – A list of configuration dictionaries defining the checks.
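The note above implies a registry pattern: check classes register themselves when their module is imported, and the factory looks them up by name when it receives a raw dictionary. The sketch below shows that general pattern only. The registry, the `register` decorator, the `"check"` key, and `build_from_dicts` are all assumptions made for illustration and are not taken from sparkdq's CheckFactory.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry standing in for whatever lookup a factory keeps.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a check class under a lookup name."""

    def decorator(cls: type) -> type:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("null-check")  # registration happens when this module is loaded
class NullCheck:
    def __init__(self, columns: List[str]) -> None:
        self.columns = columns


def build_from_dicts(configs: List[Dict[str, Any]]) -> List[Any]:
    """Instantiate one check per dictionary via the registry."""
    checks = []
    for raw in configs:
        params = dict(raw)  # copy so the caller's dictionary is not mutated
        name = params.pop("check")  # assumed key naming the check type
        if name not in _REGISTRY:
            raise KeyError(f"no check registered under {name!r}")
        checks.append(_REGISTRY[name](**params))
    return checks


raw_configs = [{"check": "null-check", "columns": ["email"]}]
built = build_from_dicts(raw_configs)
print(type(built[0]).__name__, built[0].columns)
```

In a design like this, a lookup can only succeed for classes whose defining module has already been imported, which is one plausible reason for importing the checks module before instantiation.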
- clear() → None
Removes all currently registered checks.
Useful for resetting the CheckSet between validation runs.
- get_aggregate_checks() → List[BaseAggregateCheck]
Returns only the aggregate-level checks.
- Returns:
Checks that operate on DataFrame aggregates.
- Return type:
List[BaseAggregateCheck]
- get_all() → List[BaseCheck]
Returns all registered checks.
- Returns:
All checks currently managed by this CheckSet.
- Return type:
List[BaseCheck]
- get_row_checks() → List[BaseRowCheck]
Returns only the row-level checks.
- Returns:
Checks that operate on individual rows.
- Return type:
List[BaseRowCheck]
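One way to picture the three getters and clear() is filtering a single list by base class. The following sketch assumes exactly that. The `Stub*` classes and `MiniCheckSet` are hypothetical stand-ins defined locally, and filtering with `isinstance` is an assumption about how such a collection could work, not a description of sparkdq's internals.

```python
from typing import List


class StubCheck:
    """Hypothetical stand-in for BaseCheck."""


class StubRowCheck(StubCheck):
    """Hypothetical stand-in for BaseRowCheck."""


class StubAggregateCheck(StubCheck):
    """Hypothetical stand-in for BaseAggregateCheck."""


class MiniCheckSet:
    """Hypothetical collection mirroring the documented getter names."""

    def __init__(self) -> None:
        self._checks: List[StubCheck] = []

    def add(self, check: StubCheck) -> "MiniCheckSet":
        self._checks.append(check)
        return self

    def get_all(self) -> List[StubCheck]:
        return list(self._checks)

    def get_row_checks(self) -> List[StubRowCheck]:
        # Keep only checks that operate on individual rows.
        return [c for c in self._checks if isinstance(c, StubRowCheck)]

    def get_aggregate_checks(self) -> List[StubAggregateCheck]:
        # Keep only checks that operate on DataFrame-level aggregates.
        return [c for c in self._checks if isinstance(c, StubAggregateCheck)]

    def clear(self) -> None:
        self._checks.clear()


checks = MiniCheckSet().add(StubRowCheck()).add(StubAggregateCheck()).add(StubRowCheck())
counts = (
    len(checks.get_all()),
    len(checks.get_row_checks()),
    len(checks.get_aggregate_checks()),
)
print(counts)

checks.clear()  # reset before the next validation run
print(len(checks.get_all()))
```

Keeping one list and filtering on demand means a check is stored once regardless of its level, and clear() resets every view at the same time.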