Count Between#
Check: row-count-between-check
Purpose: Verifies that the number of rows in the dataset falls within a specified minimum and maximum range. Ensures data completeness while preventing unexpected data volume.
Python Configuration#
from sparkdq.checks import RowCountBetweenCheckConfig
from sparkdq.core import Severity
RowCountBetweenCheckConfig(
check_id="expected_daily_batch_size",
min_count=1000,
max_count=5000,
severity=Severity.ERROR
)
Declarative Configuration#
- check: row-count-between-check
check-id: expected_daily_batch_size
min-count: 1000
max-count: 5000
severity: error
Typical Use Cases#
✅ Detect partial loads (too few rows) or runaway joins / duplications (too many rows).
✅ Validate dataset size before triggering downstream jobs like training, reporting, or exports.
✅ Catch filter changes that unintentionally affect row count.