Column Greater Than Check#
Check: column-greater-than-check
Purpose: Ensures that values in one column are strictly greater than (or greater than or equal to) the values in another column or the result of evaluating a Spark SQL expression.
Note
Rows with
nullvalues in either column or the limit result are treated as invalid and will fail the check.Use the
inclusiveflag to control whether equality (>=) is permitted.inclusive: false: requirescolumn > limitinclusive: true: allowscolumn >= limit
Python Configuration#
from sparkdq.checks import ColumnGreaterThanCheckConfig
from sparkdq.core import Severity
# Simple column comparison
ColumnGreaterThanCheckConfig(
check_id="dropoff-after-pickup",
column="dropoff_datetime",
limit="pickup_datetime",
inclusive=False, # use True to allow equality
severity=Severity.CRITICAL
)
# Expression with mathematical operation
ColumnGreaterThanCheckConfig(
check_id="price-with-margin",
column="selling_price",
limit="cost_price * 1.2",
inclusive=True, # use False to require strict greater than
severity=Severity.CRITICAL
)
# Complex conditional expression
ColumnGreaterThanCheckConfig(
check_id="score-validation",
column="user_score",
limit="CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END",
inclusive=False,
severity=Severity.CRITICAL
)
Declarative Configuration#
- check: column-greater-than-check
check-id: dropoff-after-pickup
column: dropoff_datetime
limit: pickup_datetime
inclusive: false
severity: critical
- check: column-greater-than-check
check-id: price-with-margin
column: selling_price
limit: cost_price * 1.2
inclusive: true
severity: critical
- check: column-greater-than-check
check-id: score-validation
column: user_score
limit: CASE WHEN level='beginner' THEN min_score ELSE min_score * 1.1 END
inclusive: false
severity: critical
Typical Use Cases#
✅ Enforce correct ordering of timestamps, such as ensuring dropoff time is after pickup time.
✅ Detect incorrect or corrupted ranges in financial or numeric fields.
✅ Validate business rules with dynamic limits, such as ensuring selling price exceeds minimum margins above cost price (e.g., selling_price > cost_price * 1.2).
✅ Apply conditional validation logic based on categories or statuses (e.g., score > CASE WHEN level=’beginner’ THEN min_score ELSE min_score * 1.1 END).
✅ Prevent invalid or logically inconsistent data entries in business-critical systems.