Count Min

Check: row-count-min-check

Purpose: Verifies that the input DataFrame contains at least a specified minimum number of rows (min_count). This keeps downstream processing from running on incomplete or unexpectedly small datasets.

Python Configuration

from sparkdq.checks import RowCountMinCheckConfig
from sparkdq.core import Severity

RowCountMinCheckConfig(
    check_id="minimum_required_records",
    min_count=10000,
    severity=Severity.WARNING
)

Declarative Configuration

- check: row-count-min-check
  check-id: minimum_required_records
  min-count: 10000
  severity: warning
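The declarative entry carries the same information as the Python configuration above, with kebab-case keys in place of snake_case keyword arguments. The sketch below makes that correspondence explicit for one entry that has already been parsed into a dict (for example by a YAML loader). It is an illustration only: to_config_kwargs is a hypothetical helper, not part of sparkdq. Upper-casing the severity string to match the Severity member name is also an assumption.

```python
from typing import Any, Dict


def to_config_kwargs(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map a declarative check entry to Python-style kwargs.

    The 'check' key names the check type rather than a constructor argument,
    so it is dropped; the remaining kebab-case keys become snake_case.
    """
    kwargs = {
        key.replace("-", "_"): value
        for key, value in entry.items()
        if key != "check"
    }
    # Assumption: the lowercase YAML severity maps to a Severity member name.
    if "severity" in kwargs:
        kwargs["severity"] = str(kwargs["severity"]).upper()
    return kwargs


entry = {
    "check": "row-count-min-check",
    "check-id": "minimum_required_records",
    "min-count": 10000,
    "severity": "warning",
}
print(to_config_kwargs(entry))
# {'check_id': 'minimum_required_records', 'min_count': 10000, 'severity': 'WARNING'}
```

If Severity is a standard Python Enum (an assumption), Severity[kwargs["severity"]] would then yield Severity.WARNING, matching the Python example.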

Typical Use Cases

  • ✅ Validate that a data extraction or ingestion process has produced a sufficient number of records.

  • ✅ Detect partial loads or failed data transfers that result in missing data.

  • ✅ Enforce minimum data volume requirements for reliable analytics, training datasets, or reporting.

  • ✅ Prevent downstream processes from running on incomplete data.