Schema Check
Check: schema-check
Purpose: Validates that the DataFrame matches an expected schema in terms of column names and Spark data types. Optionally enforces strict matching to reject unexpected additional columns.
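The comparison described above can be illustrated with a minimal sketch in plain Python. It is not the check's actual implementation: the function name `check_schema`, its `strict` parameter, and the message wording are hypothetical and chosen only for illustration. Schemas are represented as dictionaries mapping column names to lowercase type strings; with PySpark, a mapping of this shape could be built from `dict(df.dtypes)`, although complex types appear there in their full form (for example `array<string>`).

```python
def check_schema(actual, expected, strict=False):
    """Compare actual column types against an expected schema.

    Both arguments are dicts mapping column name -> lowercase type string.
    Returns a list of problems; an empty list means the check passed.
    (Hypothetical helper, for illustration only.)
    """
    problems = []
    # Every expected column must exist and carry the expected type.
    for name, expected_type in expected.items():
        if name not in actual:
            problems.append(f"missing column: {name}")
        elif actual[name] != expected_type:
            problems.append(
                f"type mismatch for {name}: "
                f"expected {expected_type}, got {actual[name]}"
            )
    # In strict mode, columns not named in the expected schema are rejected.
    if strict:
        for name in actual:
            if name not in expected:
                problems.append(f"unexpected column: {name}")
    return problems


expected = {"id": "bigint", "amount": "decimal(10,2)", "created_at": "timestamp"}
actual = {"id": "int", "amount": "decimal(10,2)", "note": "string"}

lenient = check_schema(actual, expected)                    # extra column tolerated
strict_result = check_schema(actual, expected, strict=True)  # extra column reported
```

In this example the lenient run reports the type change on `id` and the missing `created_at` column, while the strict run additionally reports the unexpected `note` column.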
Supported Data Types
The following Spark types are supported and must be specified as lowercase strings:
string, boolean, int, bigint, float, double, date, timestamp, binary, array, map, struct, and decimal(precision, scale), for example decimal(10,2)
Important
For decimal types, both precision and scale must be specified inside parentheses, separated by a comma. Unsupported spellings such as integer (use int) and malformed forms such as decimal(10.2) are not accepted.
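The type-string rules above can be sketched as a small validator. This is an illustrative approximation, not the check's own parser: the names `SIMPLE_TYPES`, `DECIMAL_PATTERN`, and `is_supported_type` are hypothetical, and accepting an optional space after the comma in `decimal(...)` is an assumption.

```python
import re

# Type names accepted as-is; matching is case-sensitive, so "String" fails.
SIMPLE_TYPES = {
    "string", "boolean", "int", "bigint", "float", "double",
    "date", "timestamp", "binary", "array", "map", "struct",
}

# decimal(precision,scale): two integers separated by a comma.
# Assumption: whitespace after the comma is tolerated.
DECIMAL_PATTERN = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def is_supported_type(type_name):
    """Return True if type_name follows the documented type-string rules."""
    if type_name in SIMPLE_TYPES:
        return True
    # fullmatch rejects trailing or leading junk such as "decimal(10,2)x".
    return DECIMAL_PATTERN.fullmatch(type_name) is not None
```

Under these assumptions, `is_supported_type("decimal(10,2)")` is true, while `integer`, `decimal(10.2)`, and a bare `decimal` are rejected.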
Typical Use Cases
- Enforce schema contracts between ingestion, transformation, and consumption stages.
- Detect missing, renamed, or type-changed columns introduced by upstream schema evolution.
- Prevent silent data corruption caused by implicit type casts when a column arrives with an unexpected type.