Freshness Check#
Check: freshness-check
Purpose: Validates that the most recent timestamp in a given column is within a defined freshness threshold relative to the current system time. The check fails if the newest value is older than the specified interval (e.g., more than 24 hours ago).
Note
The following time units are supported for the period parameter:
year
month
week
day
hour
minute
second
Python Configuration#
from sparkdq.checks import FreshnessCheckConfig
from sparkdq.core import Severity
FreshnessCheckConfig(
check_id="data-update-recent",
column="last_updated",
interval=24,
period="hour",
severity=Severity.CRITICAL
)
Declarative Configuration#
- check: freshness-check
check-id: data-update-recent
column: last_updated
interval: 24
period: hour
severity: critical
Typical Use Cases#
✅ Ensure that ingested data was recently updated (e.g., within the last hour).
✅ Detect delays or failures in upstream data ingestion pipelines.
✅ Validate that partitioned tables or snapshots are being refreshed regularly.
✅ Use to monitor SLAs and enforce data freshness for reporting or analytics.