Column Less Than Check#
Check: column-less-than-check
Purpose: Ensures that values in one column are strictly less than (or less than or equal to) the values in another column or the result of evaluating a Spark SQL expression.
Note
Rows with
nullvalues in either column or the limit result are treated as invalid and will fail the check.Use the
inclusiveflag to control whether equality (<=) is permitted.inclusive: false: requirescolumn < limitinclusive: true: allowscolumn <= limit
Python Configuration#
from sparkdq.checks import ColumnLessThanCheckConfig
from sparkdq.core import Severity
# Simple column comparison
ColumnLessThanCheckConfig(
check_id="pickup-before-dropoff",
column="pickup_datetime",
limit="dropoff_datetime",
inclusive=False, # use True to allow equality
severity=Severity.CRITICAL
)
# Expression with mathematical operation
ColumnLessThanCheckConfig(
check_id="price-with-margin",
column="cost_price",
limit="selling_price * 0.8",
inclusive=True, # use False to require strict less than
severity=Severity.CRITICAL
)
# Complex conditional expression
ColumnLessThanCheckConfig(
check_id="score-validation",
column="user_score",
limit="CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END",
inclusive=False,
severity=Severity.CRITICAL
)
Declarative Configuration#
- check: column-less-than-check
check-id: pickup-before-dropoff
column: pickup_datetime
limit: dropoff_datetime
inclusive: false
severity: critical
- check: column-less-than-check
check-id: price-with-margin
column: cost_price
limit: selling_price * 0.8
inclusive: true
severity: critical
- check: column-less-than-check
check-id: score-validation
column: user_score
limit: CASE WHEN level='expert' THEN max_score ELSE max_score * 0.9 END
inclusive: false
severity: critical
Typical Use Cases#
✅ Enforce correct ordering of timestamps, such as ensuring pickup time is before dropoff time.
✅ Detect incorrect or corrupted ranges in financial or numeric fields.
✅ Validate business rules with dynamic limits, such as ensuring selling price is within acceptable margins of cost price (e.g., selling_price < cost_price * 1.2).
✅ Apply conditional validation logic based on categories or statuses (e.g., score < CASE WHEN level=’expert’ THEN max_score ELSE max_score * 0.9 END).
✅ Prevent invalid or logically inconsistent data entries in business-critical systems.