ETL Pipeline Monitoring
Monitor data pipeline health: row counts, freshness, schema changes, volume anomalies, and ingestion delays.
Checks included (18)
Table Not Empty
Asserts that a table contains at least one row. This is the most basic volume check: it confirms that a table has not been accidentally truncated or dropped, and that its load did not fail to write any data.
Row Count Minimum
Asserts that a table contains at least the specified minimum number of rows. Useful for tables with known baseline volumes where dropping below a threshold indicates a data load issue.
Row Count Range
Asserts that the row count of a table falls within an expected minimum and maximum range. Catches both data loss (too few rows) and data duplication or explosion (too many rows).
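A minimal sketch of the three count-based checks above (not empty, minimum, range), assuming the table lives in a SQLite database reached through Python's built-in sqlite3 module. The function names and the orders table are hypothetical and chosen only for illustration.

```python
import sqlite3

def row_count(conn: sqlite3.Connection, table: str) -> int:
    # Identifiers cannot be bound as SQL parameters, so the table name is
    # interpolated. It must come from trusted configuration, never user input.
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]

def row_count_in_range(conn: sqlite3.Connection, table: str,
                       min_rows: int, max_rows: int) -> bool:
    # min_rows=1 with a very large max_rows behaves like "Table Not Empty".
    return min_rows <= row_count(conn, table) <= max_rows

# Hypothetical demo table with five rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(5)])

print(row_count_in_range(conn, "orders", 1, 10))
```

Sharing one counting helper keeps the three checks consistent: they differ only in the bounds they pass.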
Row Count Growth
Asserts that the current row count has not decreased by more than the specified percentage compared to the previous run's row count. Detects accidental data loss, failed incremental loads, or unintended deletions between pipeline runs.
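The comparison can be sketched as a percentage drop relative to the previous run. This assumes the previous count is stored somewhere by the pipeline; the function name and the treatment of an empty previous run are illustrative choices, not documented behavior.

```python
def row_count_drop_ok(previous: int, current: int, max_drop_pct: float) -> bool:
    # With no previous rows there is nothing to lose, so the check passes.
    if previous <= 0:
        return True
    # A positive value means the table shrank; growth yields a negative value.
    drop_pct = (previous - current) / previous * 100
    return drop_pct <= max_drop_pct

print(row_count_drop_ok(previous=1000, current=950, max_drop_pct=10))
print(row_count_drop_ok(previous=1000, current=800, max_drop_pct=10))
```

Here a 5% drop passes a 10% limit, while a 20% drop fails it. Growth always passes, because only decreases are measured.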
Row Count Anomaly Detection
Asserts that the current row count is within the specified number of standard deviations from the historical average. Uses statistical anomaly detection to catch unexpected volume spikes or drops without requiring hard-coded thresholds.
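One common way to express this is a z-score against past counts. The sketch below assumes a list of historical row counts is available and uses the sample standard deviation; the function name and the default of three standard deviations are assumptions.

```python
from statistics import mean, stdev

def row_count_is_normal(history: list[int], current: int,
                        max_sigma: float = 3.0) -> bool:
    if len(history) < 2:
        raise ValueError("need at least two historical counts")
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation
    if sigma == 0:
        # A perfectly flat history: any deviation at all is anomalous.
        return current == mu
    return abs(current - mu) / sigma <= max_sigma

history = [1000, 1020, 980, 1010, 990]  # hypothetical past counts
print(row_count_is_normal(history, 1005))
print(row_count_is_normal(history, 1200))
```

With this history the mean is 1000 and the standard deviation is about 15.8, so 1005 is well inside three standard deviations and 1200 is far outside.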
Daily Volume Consistency
Asserts that daily row counts fall within an expected range. Identifies days with abnormally low or high data volumes that may indicate partial loads, duplicate ingestion, or upstream source issues.
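As an illustration, daily counts can be tallied from one date value per row and compared against the expected range. The function name and input shape are hypothetical.

```python
from collections import Counter
from datetime import date

def out_of_range_days(row_dates: list[date], min_rows: int,
                      max_rows: int) -> dict[date, int]:
    # Count rows per day, then keep only the days outside [min_rows, max_rows].
    # Days with zero rows never appear in row_dates, so they are not reported
    # here; a calendar of expected days would be needed to catch them.
    counts = Counter(row_dates)
    return {d: n for d, n in counts.items() if not (min_rows <= n <= max_rows)}

dates = [date(2024, 1, 1)] * 3 + [date(2024, 1, 2)] * 1 + [date(2024, 1, 3)] * 3
print(out_of_range_days(dates, min_rows=2, max_rows=5))
```

Only 2 January is reported, since its single row falls below the minimum of two.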
Schema Column Count
Asserts that a table has the expected number of columns. Detects unintended schema changes such as dropped columns, added columns from upstream migrations, or schema drift between environments.
Table Freshness
Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.
Column Max Age
Asserts that the most recent value in a date/timestamp column is within the specified number of hours from the current time. Useful for verifying that new data is arriving as expected in date-partitioned or event-driven tables.
Partition Freshness
Asserts that the latest partition or date value in a partitioned table is within the expected range of the current date. Ensures that daily, hourly, or other periodic data loads are completing on schedule.
Ingestion Delay
Asserts that the time difference between the source event timestamp and the load/ingestion timestamp is within the defined SLA. Detects pipeline lag, backpressure, or ingestion failures that cause data to arrive late.
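To make the SLA comparison concrete, the sketch below takes pairs of (event timestamp, load timestamp) and returns the rows that breach the SLA. The function name and the pair layout are assumptions for illustration.

```python
from datetime import datetime, timedelta

def late_rows(rows: list[tuple[datetime, datetime]],
              sla: timedelta) -> list[tuple[datetime, datetime]]:
    # Each row is (event_time, loaded_time); a row is late when the gap
    # between the two exceeds the SLA.
    return [(event, loaded) for event, loaded in rows if loaded - event > sla]

rows = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 5)),   # 5 min delay
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 45)),  # 45 min delay
]
breaches = late_rows(rows, sla=timedelta(minutes=15))
print(len(breaches))
```

Returning the offending rows rather than a bare pass or fail makes it easier to see which events arrived late.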
Null Rate Stable
Asserts that the null rate of a column has not changed by more than the specified number of percentage points from a known baseline. Detects regressions in data completeness that may indicate broken upstream transformations, schema changes, or ETL failures.
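The sketch below shows the percentage-point comparison, assuming column values are available as a Python list in which None represents a null. The function names are hypothetical.

```python
def null_rate(values: list) -> float:
    if not values:
        raise ValueError("cannot compute a null rate for an empty column")
    return sum(v is None for v in values) / len(values)

def null_rate_stable(values: list, baseline_rate: float,
                     max_change_pp: float) -> bool:
    # Percentage points are an absolute difference between two rates:
    # moving from 15% to 20% nulls is a change of 5 points, not 33%.
    change_pp = abs(null_rate(values) - baseline_rate) * 100
    return change_pp <= max_change_pp

values = ["a", None, "b", "c", None, "d", "e", "f", "g", "h"]  # 20% null
print(null_rate_stable(values, baseline_rate=0.15, max_change_pp=10))
```

Because the rates are floats, a threshold that sits exactly on the measured change can be affected by rounding; leaving a small margin avoids flaky results.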
Cardinality Check
Asserts that the number of distinct values in a column falls within an expected range. Detects issues such as collapsed categories (too few distinct values), data explosion (too many), or enum drift from upstream changes.
Mean In Range
Asserts that the arithmetic mean of a numeric column falls within an expected range. Detects data drift, calculation errors, or upstream changes that shift the central tendency of key metrics.
Standard Deviation Stable
Asserts that the standard deviation of a numeric column has not changed by more than the specified percentage from a known baseline. Detects changes in data variability that may indicate corrupted data, changed source systems, or process failures.
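Unlike the null rate check, this one is described as a relative percentage change. A minimal sketch, assuming the population standard deviation is the intended statistic (the description does not say which) and using a hypothetical function name:

```python
from statistics import pstdev

def stddev_stable(values: list[float], baseline_stddev: float,
                  max_change_pct: float) -> bool:
    if baseline_stddev <= 0:
        raise ValueError("baseline standard deviation must be positive")
    # Relative change: the difference is divided by the baseline.
    change_pct = abs(pstdev(values) - baseline_stddev) / baseline_stddev * 100
    return change_pct <= max_change_pct

values = [2, 4, 4, 4, 5, 5, 7, 9]  # population standard deviation is 2.0
print(stddev_stable(values, baseline_stddev=2.0, max_change_pct=10))
print(stddev_stable(values, baseline_stddev=1.0, max_change_pct=10))
```

Against a baseline of 1.0 the measured value of 2.0 is a 100% change, so the second call fails a 10% limit.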
Required Columns Present
Asserts that a table contains all expected columns by name. Catches schema drift, missing columns after migrations, or upstream schema changes before downstream logic breaks.
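As a sketch, the column names can be read from the database catalog and compared with the expected set. This example assumes SQLite (version 3.16 or later, for the pragma_table_info table-valued function); the function name and the orders table are hypothetical.

```python
import sqlite3

def missing_columns(conn: sqlite3.Connection, table: str,
                    required: list[str]) -> set[str]:
    # pragma_table_info accepts the table name as a bound parameter.
    # A table that does not exist yields no rows, so every required
    # column is then reported as missing.
    rows = conn.execute(
        "SELECT name FROM pragma_table_info(?)", (table,)
    ).fetchall()
    actual = {name for (name,) in rows}
    return set(required) - actual

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, created_at TEXT)")
print(missing_columns(conn, "orders", ["id", "amount", "customer_id"]))
```

The size of the same name set gives the column count, so a Schema Column Count check could reuse this query.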
Column Not Null
Asserts that a specified column contains no null values. This is the most fundamental completeness check: every row must have a value present in the target column.
Column Completeness Threshold
Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
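One way to compute the percentage relies on the fact that SQL's COUNT(column) skips nulls while COUNT(*) counts every row. The sketch assumes SQLite via Python's sqlite3 module; the function name, the users table, and the choice to report an empty table as 0% complete are illustrative assumptions.

```python
import sqlite3

def completeness_pct(conn: sqlite3.Connection, table: str, column: str) -> float:
    # Table and column names are interpolated and must come from trusted
    # configuration. SQLite may treat an unknown double-quoted name as a
    # string literal, so validate that the column exists beforehand.
    non_null, total = conn.execute(
        f'SELECT COUNT("{column}"), COUNT(*) FROM "{table}"'
    ).fetchone()
    if total == 0:
        return 0.0
    return non_null / total * 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "a@example.com"), (2, None), (3, "c@example.com"), (4, "d@example.com")],
)
print(completeness_pct(conn, "users", "email") >= 95)
```

With one null in four rows the column is 75% complete and fails a 95% threshold. A threshold of 100 is equivalent to the Column Not Null check.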