ICD-10 & Clinical Data Quality
HIPAAfreeValidate clinical coding, patient records, and healthcare data against ICD-10 standards and HIPAA requirements.
Checks included (29)
ICD-10-CM Diagnosis Code Format(diagnosis_code)
Validates that values conform to the ICD-10-CM (Clinical Modification) diagnosis code format. The code begins with an alphabetic character (A-T, V-Z, excluding U), followed by two digits, then an optional decimal point with 1-4 additional alphanumeric characters for specificity.
ICD-10-PCS Procedure Code Format(procedure_code)
Validates that values conform to the ICD-10-PCS (Procedure Coding System) format. PCS codes are exactly 7 alphanumeric characters. The letters O and I are excluded to avoid confusion with digits 0 and 1.
Valid Email Format(email)
Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain with at least one dot-separated label.
National Provider Identifier (NPI) Format(npi)
Validates that values conform to the NPI (National Provider Identifier) basic format. An NPI is exactly 10 digits and must begin with 1 (individual providers) or 2 (organizational providers). This check validates structure only; use the npi-luhn-check rule to verify the check digit.
Valid International Phone Number (E.164)(phone)
Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.
NPI Luhn Check Digit Validation(npi)
Validates the NPI check digit using the Luhn algorithm as specified in 45 CFR 162.406. The algorithm prefixes '80840' to the first 9 digits of the NPI, then applies the Luhn mod-10 formula. The 10th digit of the NPI must equal the computed check digit. This ensures the NPI is not just well-formed but mathematically valid.
Valid US Phone Number Format(phone)
Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX, and optional +1 or 1 country code prefix.
FHIR Date Format (ISO 8601 Partial Dates)(effective_date)
Validates that values conform to the HL7 FHIR R4 date data type format. FHIR dates follow ISO 8601 and allow partial precision: YYYY (year only), YYYY-MM (year and month), or YYYY-MM-DD (full date). This flexibility supports clinical scenarios where only partial date information is known.
FHIR Administrative Gender Values(gender)
Validates that values are one of the four allowed codes in the HL7 FHIR AdministrativeGender value set: male, female, other, or unknown. This is a required binding in FHIR Patient and other resources. Values are case-sensitive lowercase per the FHIR code data type.
Valid Date String Format(event_date)
Validates that date string values match the expected format. Supports configurable formats including YYYY-MM-DD (ISO 8601), MM/DD/YYYY, DD/MM/YYYY, YYYY/MM/DD, and DD-Mon-YYYY. Validates month (01-12), day (01-31), and reasonable year ranges.
HIPAA PHI Field Masking Validation(ssn)
Validates that Protected Health Information (PHI) fields such as SSN, MRN, and DOB are properly masked or redacted in non-clinical systems. Values must match recognized masking patterns (e.g., XXX-XX-1234, ***-**-1234, or REDACTED). Unmasked PHI in downstream or analytical systems constitutes a HIPAA violation.
HIPAA Safe Harbor De-identification Validation(zip_code)
Validates that data has been properly de-identified per the HIPAA Safe Harbor method. Checks that the 18 HIPAA identifiers are removed or generalized: zip codes must be truncated to 3 digits (or masked if population <20,000), dates must be generalized to year only, and ages over 89 must be aggregated to a single category (90+). This rule helps ensure compliance with the Safe Harbor de-identification standard.
Column Not Null
Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.
Column Completeness Threshold
Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
Required Fields for Status
Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.
String Not Empty
Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.
Conditional Not Null
Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.
HIPAA PHI Access Audit Trail Validation
Validates that access audit trail records exist for all PHI data access events. Every record of PHI access must include non-null values for accessed_by (who accessed the data), accessed_at (when it was accessed), and access_reason (justification for access). Missing audit trail fields indicate a gap in access logging that violates HIPAA audit control requirements.
Date Not In Future
Validates that a date or timestamp column contains no values in the future. Catches data entry errors, timezone issues, and ETL bugs that produce future-dated records for columns like birth_date, transaction_date, or created_at.
Age Range
Validates that age values fall within a reasonable human age range. Default range is 0-120, configurable for specific contexts such as working-age adults (18-65) or children (0-17).
HIPAA PHI Data Retention Limit Validation
Validates that PHI data records do not exceed the configured retention period. Records with a creation or effective date older than the allowed retention window should be flagged for review and disposal. HIPAA requires that covered entities retain certain documentation for at least 6 years, but organizational policies may impose shorter limits on active PHI storage.
Column Unique
Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.
Duplicate Detection
Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.
Primary Key Valid
Validates that a column qualifies as a valid primary key by ensuring all values are both unique and not null. Combines uniqueness and completeness checks into a single rule.
Foreign Key Valid
Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.
Orphan Record Detection
Detects child records that have no corresponding parent record. Orphan records indicate broken referential integrity caused by parent deletions without cascading, failed ETL jobs, or race conditions. Unlike foreign-key-valid which checks FK values, this rule focuses on finding and quantifying orphaned child records for remediation.
Date Order Valid
Asserts that a start date column is always before or equal to an end date column for every row. Catches data entry errors, timezone conversion bugs, or ETL transformation issues that invert temporal ordering.
Enum Value Valid
Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.
Table Freshness
Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.