Regex Match Check#

Check: regex-match-check

Purpose: Validates that string values in a column match a given regular expression pattern. This is useful for checking format compliance, such as email addresses, identifiers, or codes.

Note

  • By default, null values are skipped. Set treat_null_as_failure: true to treat them as invalid.

  • Regex is case-sensitive by default. Use ignore_case: true for case-insensitive matching.

Python Configuration#

from sparkdq.checks import RegexMatchCheckConfig
from sparkdq.core import Severity

RegexMatchCheckConfig(
    check_id="valid-email",
    column="email",
    pattern=r"^[\w\.-]+@[\w\.-]+\.\w+$",
    ignore_case=True,
    treat_null_as_failure=False,
    severity=Severity.CRITICAL
)

Declarative Configuration#

- check: regex-match-check
  check-id: valid-email
  column: email
  pattern: "^[\\w\\.-]+@[\\w\\.-]+\\.\\w+$"
  ignore-case: true
  treat-null-as-failure: false
  severity: critical

Typical Use Cases#

  • ✅ Validate email address formatting

  • ✅ Check code or ID patterns (e.g. AB-12345)

  • ✅ Detect unexpected value structures