Is Not Contained In#
Check: is-not-contained-in-check
Purpose: Verifies that values in the specified columns do not appear in the set of forbidden values configured for each column. Useful for blocking unwanted states, codes, or categories early in the pipeline.
Python Configuration#
from sparkdq.checks import IsNotContainedInCheckConfig
from sparkdq.core import Severity

IsNotContainedInCheckConfig(
    check_id="block_test_status_and_invalid_countries",
    forbidden_values={
        "status": ["TEST", "DUMMY"],
        "country": ["XX"]
    },
    severity=Severity.CRITICAL
)
Declarative Configuration#
- check: is-not-contained-in-check
  check-id: block_test_status_and_invalid_countries
  forbidden-values:
    status:
      - TEST
      - DUMMY
    country:
      - XX
  severity: critical
Typical Use Cases#
✅ Block processing of test or dummy records in production datasets.
✅ Enforce that no invalid country codes (like XX or ZZ) are present.
✅ Detect entries with forbidden status values (e.g., DELETED, CANCELLED).
✅ Clean datasets by preventing known bad categories or invalid mappings.
Supported Data Types#
This check supports any data type that can be compared using equality (e.g., strings, integers, dates).
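
The equality-based comparison can be pictured with a small plain-Python sketch. It is an illustration of the idea only, not sparkdq's implementation: the helper `forbidden_columns` and the columns `error_code` and `booking_date` are hypothetical, and the sketch assumes a row is flagged as soon as any configured column holds a forbidden value. In a real Spark job, Spark's own comparison and casting rules apply.

```python
from datetime import date
from typing import Any, Dict, List


# Hypothetical helper for illustration only; not part of sparkdq.
def forbidden_columns(row: Dict[str, Any], forbidden_values: Dict[str, List[Any]]) -> List[str]:
    """Return the configured columns whose value in `row` equals a forbidden value."""
    return [
        column
        for column, banned in forbidden_values.items()
        if row.get(column) in banned  # membership is decided by plain equality
    ]


# Assumed example configuration covering strings, integers, and dates.
forbidden_values = {
    "status": ["TEST", "DUMMY"],
    "error_code": [-1, 999],
    "booking_date": [date(1970, 1, 1)],
}

rows = [
    {"status": "ACTIVE", "error_code": 0, "booking_date": date(2024, 5, 1)},
    {"status": "TEST", "error_code": 0, "booking_date": date(2024, 5, 1)},
    {"status": "ACTIVE", "error_code": 999, "booking_date": date(1970, 1, 1)},
]

# One list of offending columns per row; an empty list means the row passes.
results = [forbidden_columns(row, forbidden_values) for row in rows]

for hits in results:
    print("FAIL" if hits else "PASS", hits)
```

In this sketch the first row passes, the second fails on `status`, and the third fails on both `error_code` and `booking_date`.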