sparkdq.management
- class CheckSet
Bases: object
Centralized registry and lifecycle manager for data quality checks.
Manages data quality checks from configuration through to execution readiness, providing a single interface for registering, organizing, and retrieving them. The CheckSet hides check instantiation and type handling, which keeps check definitions separate from execution logic.
Checks can be registered programmatically from configuration objects or declaratively from dictionaries. Registered checks stay typed and can be retrieved by category (row-level or aggregate) for different execution contexts.
- add_check(config: BaseCheckConfig) → CheckSet
Register a single check from a validated configuration object.
Instantiates the check from the provided configuration and adds it to the internal registry. The method returns the CheckSet itself, so calls can be chained to register several checks in one expression.
- Parameters:
config (BaseCheckConfig) – Validated configuration object containing all parameters required for check instantiation.
- Returns:
This instance to enable fluent method chaining.
- Return type:
CheckSet
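The chaining behaviour described for add_check can be pictured with a small, self-contained model. This is a minimal sketch using stand-in classes, not the sparkdq implementation: `StubConfig`, its `to_check` method, and `MiniCheckSet` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StubConfig:
    """Hypothetical stand-in for a validated BaseCheckConfig subclass."""

    check_id: str

    def to_check(self) -> str:
        # A real configuration would build a check object; a string is
        # enough to show what ends up in the registry.
        return f"check:{self.check_id}"


class MiniCheckSet:
    """Simplified model of the registry; assumed structure, not sparkdq code."""

    def __init__(self) -> None:
        self._checks: List[str] = []

    def add_check(self, config: StubConfig) -> "MiniCheckSet":
        self._checks.append(config.to_check())
        return self  # returning self is what allows calls to be chained

    def get_all(self) -> List[str]:
        return list(self._checks)


check_set = (
    MiniCheckSet()
    .add_check(StubConfig("no-null-ids"))
    .add_check(StubConfig("positive-amounts"))
)
registered = check_set.get_all()
```

With the real class, the same shape applies: each add_check call returns the CheckSet, so the next call can follow directly.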
- add_checks_from_dicts(configs: List[Dict[str, Any]]) → None
Register multiple checks from raw configuration dictionaries.
Processes a collection of configuration dictionaries through the CheckFactory, performing validation and instantiation for each check definition. This method enables bulk registration from external configuration sources such as JSON, YAML, or database records.
- Parameters:
configs (List[Dict[str, Any]]) – Collection of configuration dictionaries, each containing the parameters required for a specific check type.
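The dictionary-based path can be sketched as a loop that hands each dictionary to a factory for validation and instantiation. The code below is an illustrative model only: the `"check"` key, the `BUILDERS` registry, `build_null_check`, and `MiniCheckSet` are assumptions made for this sketch and do not describe how CheckFactory is actually implemented.

```python
from typing import Any, Callable, Dict, List


def build_null_check(params: Dict[str, Any]) -> str:
    """Hypothetical builder: validates parameters, then 'instantiates' a check."""
    columns = params.get("columns")
    if not columns:
        raise ValueError("null-check requires a non-empty 'columns' list")
    return f"null-check on {', '.join(columns)}"


# Hypothetical factory registry mapping a type name to its builder.
BUILDERS: Dict[str, Callable[[Dict[str, Any]], str]] = {
    "null-check": build_null_check,
}


class MiniCheckSet:
    """Simplified model of the registry; assumed structure, not sparkdq code."""

    def __init__(self) -> None:
        self._checks: List[str] = []

    def add_checks_from_dicts(self, configs: List[Dict[str, Any]]) -> None:
        for raw in configs:
            params = dict(raw)  # copy so the caller's dictionary is not mutated
            check_type = params.pop("check", None)
            if check_type not in BUILDERS:
                raise ValueError(f"unknown check type: {check_type!r}")
            self._checks.append(BUILDERS[check_type](params))

    def get_all(self) -> List[str]:
        return list(self._checks)


# Dictionaries like these could come from parsed JSON or YAML.
raw_configs = [
    {"check": "null-check", "columns": ["id"]},
    {"check": "null-check", "columns": ["email", "name"]},
]
check_set = MiniCheckSet()
result = check_set.add_checks_from_dicts(raw_configs)
```

Unlike add_check, this method is documented as returning None, so it cannot be used in the middle of a chained expression.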
- clear() → None
Remove all registered checks from this CheckSet.
Clears the internal check registry, returning the CheckSet to its initial empty state. This operation is useful for resetting between validation runs or when reconfiguring the check collection.
- get_aggregate_checks() → List[BaseAggregateCheck]
Retrieve only the dataset-level validation checks.
Filters the registered checks to return only those that evaluate global dataset properties, enabling targeted execution for aggregate-level validation scenarios.
- Returns:
Collection of checks that validate dataset-wide properties and constraints.
- Return type:
List[BaseAggregateCheck]
- get_all() → List[BaseCheck]
Retrieve the complete collection of registered checks.
- Returns:
All checks currently managed by this CheckSet, regardless of their specific type or implementation.
- Return type:
List[BaseCheck]
- get_row_checks() → List[BaseRowCheck]
Retrieve only the record-level validation checks.
Filters the registered checks to return only those that operate on individual records, enabling targeted execution for row-level validation scenarios.
- Returns:
Collection of checks that validate individual records within the dataset.
- Return type:
List[BaseRowCheck]
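The retrieval methods can be thought of as type filters over a single registry. The sketch below assumes `isinstance` filtering, which the documentation does not confirm. The base classes here are empty stand-ins that reuse the documented names, and `NotNullCheck`, `RowCountCheck`, and `MiniCheckSet` are hypothetical.

```python
from typing import List


class BaseCheck:
    """Empty stand-in for the documented base type."""


class BaseRowCheck(BaseCheck):
    """Stand-in for checks that evaluate individual records."""


class BaseAggregateCheck(BaseCheck):
    """Stand-in for checks that evaluate dataset-wide properties."""


class NotNullCheck(BaseRowCheck):
    """Hypothetical row-level check."""


class RowCountCheck(BaseAggregateCheck):
    """Hypothetical aggregate check."""


class MiniCheckSet:
    """Simplified model of the registry; assumed structure, not sparkdq code."""

    def __init__(self, checks: List[BaseCheck]) -> None:
        self._checks: List[BaseCheck] = list(checks)

    def get_all(self) -> List[BaseCheck]:
        return list(self._checks)

    def get_row_checks(self) -> List[BaseRowCheck]:
        return [c for c in self._checks if isinstance(c, BaseRowCheck)]

    def get_aggregate_checks(self) -> List[BaseAggregateCheck]:
        return [c for c in self._checks if isinstance(c, BaseAggregateCheck)]

    def clear(self) -> None:
        self._checks.clear()


check_set = MiniCheckSet([NotNullCheck(), RowCountCheck(), NotNullCheck()])
row_count = len(check_set.get_row_checks())
aggregate_count = len(check_set.get_aggregate_checks())
total_count = len(check_set.get_all())

check_set.clear()
remaining = check_set.get_all()
```

In this model, every check appears in get_all and in exactly one of the two filtered views, and clear leaves all three views empty.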