Completeness Ratio

Check: completeness-ratio-check

Purpose: Validates that the ratio of non-null values to total rows in a specified column meets or exceeds a defined threshold (min_ratio). This enables soft validation of column completeness: a limited share of nulls is tolerated, whereas a strict non-null constraint would fail on any null.
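The pass/fail rule can be made concrete with a minimal pure-Python sketch of the same arithmetic. This is an illustration only, not sparkdq's implementation, which evaluates the ratio on a Spark DataFrame. The helper names completeness_ratio and meets_min_ratio are hypothetical, None stands in for a Spark null, and rejecting an empty column is an assumption of this sketch.

```python
from typing import Any, Sequence


def completeness_ratio(values: Sequence[Any]) -> float:
    """Share of entries in `values` that are not None (None models a Spark null)."""
    if not values:
        # Assumption of this sketch: an empty column has no defined ratio.
        raise ValueError("cannot compute a completeness ratio for an empty column")
    non_null = sum(1 for v in values if v is not None)
    return non_null / len(values)


def meets_min_ratio(values: Sequence[Any], min_ratio: float) -> bool:
    """Pass when the non-null ratio meets or exceeds min_ratio (inclusive)."""
    return completeness_ratio(values) >= min_ratio


# 20 rows with 1 null: 19 / 20 = 0.95.
column = ["2024-01-01 08:00:00"] * 19 + [None]
print(completeness_ratio(column))     # 0.95
print(meets_min_ratio(column, 0.95))  # True
print(meets_min_ratio(column, 0.96))  # False
```

With 19 non-null values out of 20 rows the ratio is exactly 0.95. The inclusive comparison therefore passes at min_ratio=0.95 and fails at 0.96.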

Python Configuration

from sparkdq.checks import CompletenessRatioCheckConfig
from sparkdq.core import Severity

CompletenessRatioCheckConfig(
    check_id="pickup-time-mostly-complete",
    column="tpep_pickup_datetime",
    min_ratio=0.95,
    severity=Severity.WARNING
)

Declarative Configuration

- check: completeness-ratio-check
  check-id: pickup-time-mostly-complete
  column: tpep_pickup_datetime
  min-ratio: 0.95
  severity: warning
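The declarative keys mirror the Python keyword arguments: check-id, min-ratio and severity: warning correspond to check_id, min_ratio and Severity.WARNING. The sketch below spells out that correspondence for this example. It is hypothetical and does not reproduce sparkdq's own configuration loader. declarative_to_kwargs is an invented helper, and severity is kept as the member-name string rather than a Severity enum so the sketch runs without sparkdq installed.

```python
from typing import Any, Dict


def declarative_to_kwargs(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: map a declarative check entry to Python keyword arguments."""
    kwargs: Dict[str, Any] = {}
    for key, value in entry.items():
        if key == "check":
            continue  # selects the check type, not a constructor argument
        kwargs[key.replace("-", "_")] = value  # e.g. "min-ratio" -> "min_ratio"
    if "severity" in kwargs:
        # "warning" corresponds to Severity.WARNING; kept as the member name here.
        kwargs["severity"] = str(kwargs["severity"]).upper()
    ratio = kwargs.get("min_ratio")
    if ratio is not None and not 0.0 <= ratio <= 1.0:
        # A completeness ratio can only lie between 0 and 1.
        raise ValueError(f"min_ratio must be within [0, 1], got {ratio}")
    return kwargs


entry = {
    "check": "completeness-ratio-check",
    "check-id": "pickup-time-mostly-complete",
    "column": "tpep_pickup_datetime",
    "min-ratio": 0.95,
    "severity": "warning",
}
print(declarative_to_kwargs(entry))
# {'check_id': 'pickup-time-mostly-complete', 'column': 'tpep_pickup_datetime', 'min_ratio': 0.95, 'severity': 'WARNING'}
```

The range check on min_ratio reflects only that a ratio of non-null rows cannot fall outside [0, 1]. It is not a claim about how sparkdq validates its configuration.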

Typical Use Cases

  • ✅ Detect columns with unexpectedly high proportions of missing values.

  • ✅ Enforce soft completeness thresholds on optional or partially populated fields.

  • ✅ Ensure minimum data quality for downstream analytics or feature generation.

  • ✅ Provide early signals for upstream data loss or extraction failures.