Aggregate Checks
Validates that every column in a specified set is fully populated. If any null is detected in those columns, the entire DataFrame is marked as invalid.

Verifies the existence of required columns in the DataFrame, independent of their data types.

Validates that the ratio of non-null values in a column meets a minimum threshold, enabling soft completeness validation and early detection of partially missing data.

Ensures that the DataFrame contains at least a defined minimum number of rows.

Ensures that the DataFrame does not exceed a defined maximum number of rows.

Ensures that the number of rows in the DataFrame falls within a defined inclusive range.

Ensures that the DataFrame contains exactly the specified number of rows.

Validates that the ratio of distinct non-null values in a column exceeds a defined threshold, helping to detect overly uniform or low-cardinality fields.

Validates that the most recent timestamp in a given column is within a defined freshness window relative to the current system time, helping detect outdated or stale data.

Ensures that a DataFrame matches an expected schema by verifying column names and data types, with optional strict enforcement against unexpected columns.

Validates that a specified column maintains a minimum ratio of unique (non-null) values, helping to detect excessive duplication and assess data entropy or feature distinctiveness.

Validates that all rows in a DataFrame are unique, either across all columns or a defined subset, helping to detect unintended duplication and enforce row-level uniqueness.
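
Each check described above reduces the whole dataset to a single pass/fail verdict. The following is a minimal plain-Python sketch of five of them (full population, completeness ratio, inclusive row-count range, freshness, and row uniqueness). It is an illustration only, not the library's API: every function name and parameter is hypothetical, a list of dicts stands in for the DataFrame, and failing on an empty dataset or on a column without timestamps is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical stand-in for a DataFrame: one dict per row.
rows = [
    {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 3, tzinfo=timezone.utc)},
    {"id": 2, "email": None, "updated_at": datetime(2024, 1, 2, tzinfo=timezone.utc)},
    {"id": 3, "email": "c@example.com", "updated_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
]
now = datetime(2024, 1, 4, tzinfo=timezone.utc)  # fixed "current time" for a reproducible example


def columns_complete_ok(rows, columns):
    """Pass only if no listed column contains a null in any row."""
    return all(r.get(c) is not None for r in rows for c in columns)


def completeness_ratio_ok(rows, column, min_ratio):
    """Pass if the share of non-null values in `column` is at least `min_ratio`."""
    if not rows:
        return False  # assumption: an empty dataset fails the check
    non_null = sum(1 for r in rows if r.get(column) is not None)
    return non_null / len(rows) >= min_ratio


def row_count_between_ok(rows, min_count, max_count):
    """Pass if the row count lies in the inclusive range [min_count, max_count]."""
    return min_count <= len(rows) <= max_count


def freshness_ok(rows, column, max_age, now=None):
    """Pass if the most recent timestamp in `column` is at most `max_age` old."""
    now = now or datetime.now(timezone.utc)
    timestamps = [r[column] for r in rows if r.get(column) is not None]
    if not timestamps:
        return False  # assumption: no timestamps means the data is stale
    return now - max(timestamps) <= max_age


def unique_rows_ok(rows, subset=None):
    """Pass if no two rows agree on every column (or on every column in `subset`)."""
    seen = set()
    for r in rows:
        cols = subset if subset is not None else sorted(r)
        key = tuple((c, r.get(c)) for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True
```

With the sample data, `columns_complete_ok(rows, ["id", "email"])` fails because one `email` is null, while `completeness_ratio_ok(rows, "email", 0.6)` passes because two of the three values are present. This shows the difference between the strict and the soft completeness checks.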