Count Max

Check: row-count-max-check

Purpose: Validates that the dataset does not exceed a defined maximum number of rows. Use this to detect unexpected data growth, runaway joins, or accidental full loads when only incremental data is expected.
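The comparison behind this check can be sketched in plain Python. This is a minimal illustration, not sparkdq's implementation: the names `RowCountMaxResult` and `evaluate_row_count_max` are hypothetical. It assumes that a count exactly equal to `max_count` passes, since "does not exceed" suggests an inclusive upper bound. In a Spark job, the observed count would typically come from PySpark's `DataFrame.count()`.

```python
# Hypothetical sketch of the row-count-max semantics; not sparkdq's own code.
# Assumption: a row count equal to max_count passes (inclusive upper bound).
from dataclasses import dataclass


@dataclass(frozen=True)
class RowCountMaxResult:
    """Outcome of comparing an observed row count with an upper bound."""

    row_count: int
    max_count: int
    passed: bool


def evaluate_row_count_max(row_count: int, max_count: int) -> RowCountMaxResult:
    """Pass when row_count does not exceed max_count."""
    if max_count < 0:
        raise ValueError("max_count must be non-negative")
    return RowCountMaxResult(row_count, max_count, row_count <= max_count)


# With PySpark, the observed count would usually come from DataFrame.count(),
# e.g. evaluate_row_count_max(df.count(), 100000).
print(evaluate_row_count_max(42_000, 100000).passed)   # True
print(evaluate_row_count_max(250_000, 100000).passed)  # False
```

Note that counting rows is a full Spark action. Keep this in mind when the same DataFrame is evaluated by several checks.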

Python:

from sparkdq.checks import RowCountMaxCheckConfig
from sparkdq.core import Severity

RowCountMaxCheckConfig(
    check_id="batch-size-upper-bound",
    max_count=100000,
    severity=Severity.ERROR
)

YAML:

- check: row-count-max-check
  check-id: batch-size-upper-bound
  max-count: 100000
  severity: error

Typical Use Cases

Detect abnormal data growth that may indicate duplicates or incorrect joins.
Prevent downstream systems from processing unexpectedly large datasets.
Catch accidental full loads when only an incremental extract was intended.
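The use cases above depend on choosing a sensible `max_count`. The configuration example uses a fixed value. One hedged option for incremental pipelines is to derive the bound from recent batch sizes plus some headroom. The helper `suggest_max_count` and the 1.5 headroom factor are hypothetical and are not part of sparkdq. The sketch shows that a normal incremental batch stays under such a bound while an accidental full load exceeds it.

```python
import math
from typing import Sequence


def suggest_max_count(recent_batch_sizes: Sequence[int], headroom: float = 1.5) -> int:
    """Hypothetical heuristic: largest recent incremental batch times a headroom factor."""
    if not recent_batch_sizes:
        raise ValueError("need at least one historical batch size")
    if headroom < 1.0:
        raise ValueError("headroom must be >= 1.0")
    return math.ceil(max(recent_batch_sizes) * headroom)


recent = [8_000, 9_500, 11_000, 10_200]  # illustrative batch sizes, not real data
limit = suggest_max_count(recent)         # 16500

incremental_batch = 12_000
accidental_full_load = 2_400_000

print(incremental_batch <= limit)     # True: normal incremental batch passes
print(accidental_full_load <= limit)  # False: full load exceeds the bound
```

The resulting value could then be supplied as `max_count`. How often to recompute it, and whether to use the maximum or a percentile of past batches, is a design choice left to the pipeline.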
