
ICD-10 & Clinical Data Quality

HIPAA · Free

Validate clinical coding, patient records, and healthcare data against ICD-10 standards and HIPAA requirements.

29 rules · 939 downloads · 4.7 avg (92 ratings)
icd-10, clinical, healthcare, hipaa, patient-data, medical-coding, hl7


Test this pack with your data

Download the template, fill in your data, and see quality results instantly.


Download & Install

Choose your tool — get a ready-to-run file

Run this on your data? Upload your CSV and we'll auto-map the columns, validate them, and report the bad rows.
Or use the CLI
$ npx dqhub install healthcare-icd10 --format soda --table YOUR_TABLE

About this pack

Comprehensive data quality checks for healthcare organizations handling clinical data. Covers:

- Patient identifier validation (format, completeness, uniqueness)
- Date validations (no future dates, reasonable age ranges, date ordering)
- Code format validation (adaptable for ICD-10, CPT, NPI patterns)
- Referential integrity between patient records
- Data freshness for clinical reporting
- Completeness checks for required clinical fields

Mapped to HIPAA Security Rule requirements for data integrity.

Sources & References

CMS — ICD-10-CM Official Guidelines for Coding and Reporting

Diagnosis codes reported on claims must conform to ICD-10-CM format per HIPAA transaction standards

HIPAA — 45 CFR 162.1002

ICD-10-CM is the mandated code set for diagnosis coding in HIPAA-covered transactions

CMS — ICD-10-PCS Official Guidelines for Coding and Reporting

Inpatient procedure codes must conform to the 7-character ICD-10-PCS format

CMS — NPPES National Provider Identifier Standard

All HIPAA-covered entities must use NPIs that conform to the 10-digit format starting with 1 or 2

HIPAA — 45 CFR 162.406

The NPI must be a 10-position all-numeric identification number with a check digit in the last position

CMS — 45 CFR 162.406 - Standard unique health identifier for health care providers

The check digit in position 10 must satisfy the Luhn algorithm when prefixed with the constant 80840

HIPAA — 45 CFR Part 162 Subpart D

NPI check digit validation is required per the National Provider Identifier standard

Date values in FHIR resources must match the format YYYY, YYYY-MM, or YYYY-MM-DD

ISO — ISO 8601:2004 Date and time format

FHIR date format is a constrained subset of ISO 8601 calendar dates

Patient.gender is bound to AdministrativeGender with required strength; only the four defined codes are permitted

The gender field uses a required binding to the AdministrativeGender value set

HIPAA — 45 CFR 164.514(b)

Sets the implementation specifications for de-identification: health information is considered de-identified only if it meets either the Expert Determination method or the Safe Harbor method

HIPAA — 45 CFR 164.502(b)

Covered entities must limit uses and disclosures of, and requests for, PHI to the minimum necessary to accomplish the intended purpose

HIPAA — 45 CFR 164.514(b)(2)

Safe Harbor method requires removal or generalization of 18 categories of identifiers to qualify as de-identified data

HIPAA — 45 CFR 164.514(b)(2)(i)

Lists the 18 identifier categories that must be removed, including names, geographic subdivisions smaller than a state, all elements of dates except the year, and ages over 89

HIPAA — 45 CFR 164.312(b)

Requires implementation of hardware, software, and procedural mechanisms to record and examine activity in information systems that contain or use ePHI

HIPAA — 45 CFR 164.308(a)(1)(ii)(D)

Requires regular review of information system activity records such as audit logs, access reports, and security incident tracking reports

HIPAA — 45 CFR 164.530(j)

Requires covered entities to retain documentation of policies, procedures, and compliance actions for 6 years from the date of creation or the date when last in effect, whichever is later

HIPAA — 45 CFR 164.524(b)(1)

Covered entities must permit individuals to request access to inspect or obtain a copy of their PHI maintained in a designated record set

What's included

12 format rules
6 completeness rules
3 range rules
3 uniqueness rules
2 referential integrity rules
2 consistency rules
1 freshness rule

Checks included (29)

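ICD-10-CM Diagnosis Code Format (diagnosis_code)

Validates that values conform to the ICD-10-CM (Clinical Modification) diagnosis code format. The code begins with an alphabetic character (A-T, V-Z, excluding U), followed by two digits, then an optional decimal point with 1-4 additional alphanumeric characters for specificity. Note that the published ICD-10-CM structure is broader: the third character may be a letter (for example, M1A, chronic gout), and some U codes are in use (for example, U07.1, COVID-19), so adjust the pattern if your data contains them.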
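
A minimal Python sketch of a structural ICD-10-CM check. It assumes the character layout CMS publishes for ICD-10-CM (a letter, a digit, a digit or letter, then optionally up to four more alphanumeric characters), with the decimal point optional so the undotted form often found in claim files also passes. The function name and the `allow_u` flag are hypothetical, not part of the pack; `allow_u=False` mimics the stricter no-U behavior described above.

```python
import re

# Assumed ICD-10-CM layout: letter, digit, digit-or-letter, then an optional
# extension of 1-4 alphanumerics. The decimal point is optional so that both
# "E11.9" and the undotted form "E119" are accepted.
ICD10CM = re.compile(r"[A-Z][0-9][0-9A-Z](?:\.?[0-9A-Z]{1,4})?")


def is_icd10cm_format(code: str, allow_u: bool = True) -> bool:
    """Structural check only: does not confirm the code exists in the code set."""
    if not allow_u and code.startswith("U"):
        return False
    return ICD10CM.fullmatch(code) is not None


print(is_icd10cm_format("S72.001A"))              # True
print(is_icd10cm_format("U07.1", allow_u=False))  # False
```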

ICD-10-PCS Procedure Code Format (procedure_code)

Validates that values conform to the ICD-10-PCS (Procedure Coding System) format. PCS codes are exactly 7 alphanumeric characters. The letters O and I are excluded to avoid confusion with digits 0 and 1.
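
A minimal Python sketch of the same idea for ICD-10-PCS, assuming codes arrive as uppercase strings with no separators. The pattern and function names are hypothetical.

```python
import re

# Exactly seven characters: digits 0-9 and letters A-Z except I and O.
ICD10PCS = re.compile(r"[0-9A-HJ-NP-Z]{7}")


def is_icd10pcs_format(code: str) -> bool:
    """Structural check only; it does not validate the meaning of each character."""
    return ICD10PCS.fullmatch(code) is not None


print(is_icd10pcs_format("0DTJ4ZZ"))  # True
print(is_icd10pcs_format("0DTO4ZZ"))  # False: contains the letter O
```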

Valid Email Format (email)

Validates that values conform to a simplified RFC 5322 email address format. Checks for a local part containing alphanumeric characters and common special characters, an @ symbol, and a domain made of at least two labels separated by a dot (for example, example.org).

National Provider Identifier (NPI) Format (npi)

Validates that values conform to the NPI (National Provider Identifier) basic format. An NPI is exactly 10 digits and must begin with 1 (individual providers) or 2 (organizational providers). This check validates structure only; use the npi-luhn-check rule to verify the check digit.

Valid International Phone Number (E.164) (phone)

Validates that values conform to the E.164 international phone number format. Requires a + prefix followed by the country code and subscriber number, with a total length between 8 and 15 digits. Optionally allows spaces, hyphens, or dots as visual separators.
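
A minimal Python sketch of this check, assuming the 8 to 15 digit bounds stated above and that separators may appear only between two digits (never doubled, leading, or trailing). The function name is hypothetical.

```python
import re

# "+" followed by digits, with at most one separator (space, hyphen, or dot)
# between any two digits.
E164_SHAPE = re.compile(r"\+[0-9](?:[ .-]?[0-9])*")


def is_e164(phone: str, min_digits: int = 8, max_digits: int = 15) -> bool:
    if E164_SHAPE.fullmatch(phone) is None:
        return False
    digits = re.sub(r"[^0-9]", "", phone)  # drop "+" and separators
    return min_digits <= len(digits) <= max_digits


print(is_e164("+44 20 7946 0958"))  # True (12 digits)
print(is_e164("4155552671"))        # False: missing "+"
```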

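NPI Luhn Check Digit Validation (npi)

Validates the NPI check digit using the Luhn algorithm. 45 CFR 162.406 requires a check digit in the 10th position; the Luhn formula and the '80840' prefix come from CMS's NPI check-digit specification. The algorithm prefixes '80840' to the first 9 digits of the NPI, then applies the Luhn mod-10 formula, and the 10th digit of the NPI must equal the computed check digit. A passing NPI is both well-formed and internally consistent, although the check cannot confirm that the number has actually been issued to a provider.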
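
A minimal Python sketch of the NPI check-digit test. Rather than computing the check digit separately, it runs standard Luhn validation over '80840' followed by all ten NPI digits, which is equivalent: the full string passes exactly when the 10th digit equals the computed check digit. It also applies the 10-digit, leading 1-or-2 format rule first. Function names are hypothetical.

```python
import re


def luhn_valid(number: str) -> bool:
    """Standard Luhn mod-10 validation over a string of ASCII digits."""
    total = 0
    # Walk right to left; double every second digit (starting with the one
    # just left of the check digit) and subtract 9 from two-digit results.
    for position, ch in enumerate(reversed(number)):
        digit = int(ch)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def is_valid_npi(npi: str) -> bool:
    # Format rule first: exactly 10 ASCII digits beginning with 1 or 2.
    if re.fullmatch(r"[12][0-9]{9}", npi) is None:
        return False
    # Constant prefix used by the NPI check-digit calculation.
    return luhn_valid("80840" + npi)


print(is_valid_npi("1234567893"))  # True
print(is_valid_npi("1234567890"))  # False: wrong check digit
```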

Valid US Phone Number Format (phone)

Validates that values conform to a US phone number format. Accepts 10-digit numbers in common formats: (XXX) XXX-XXXX, XXX-XXX-XXXX, XXX.XXX.XXXX, XXX XXX XXXX, XXXXXXXXXX, and optional +1 or 1 country code prefix.
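
A minimal Python sketch of this format check, assuming separators must be consistent within one number (so '415-555.2671' is rejected) and that only the formats listed above are accepted. It checks shape only, not whether the area code or exchange is actually assigned. The pattern and function names are hypothetical.

```python
import re

US_PHONE = re.compile(
    r"(?:\+?1[ .-]?)?"                       # optional +1 or 1 country code
    r"(?:\([0-9]{3}\) [0-9]{3}-[0-9]{4}"     # (XXX) XXX-XXXX
    r"|[0-9]{3}([ .-]?)[0-9]{3}\1[0-9]{4})"  # same separator (or none) twice
)


def is_us_phone(phone: str) -> bool:
    return US_PHONE.fullmatch(phone) is not None


print(is_us_phone("(415) 555-2671"))  # True
print(is_us_phone("415-555.2671"))    # False: mixed separators
```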

FHIR Date Format (ISO 8601 Partial Dates) (effective_date)

Validates that values conform to the HL7 FHIR R4 date data type format. FHIR dates follow ISO 8601 and allow partial precision: YYYY (year only), YYYY-MM (year and month), or YYYY-MM-DD (full date). This flexibility supports clinical scenarios where only partial date information is known.
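
A minimal Python sketch of the three FHIR date precisions, assuming a four-digit year from 0001 to 9999 and treating full dates that do not exist on the calendar (such as 2023-02-30) as invalid. The pattern is our own simplification, not the regular expression published in the FHIR specification.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD with zero-padded month and day.
FHIR_DATE = re.compile(r"([0-9]{4})(?:-(0[1-9]|1[0-2])(?:-(0[1-9]|[12][0-9]|3[01]))?)?")


def is_fhir_date(value: str) -> bool:
    m = FHIR_DATE.fullmatch(value)
    if m is None:
        return False
    year, month, day = m.groups()
    if int(year) < 1:
        return False  # year 0000 is rejected under our assumption
    if day is not None:
        try:
            date(int(year), int(month), int(day))
        except ValueError:
            return False  # e.g. 2023-02-30
    return True


print(is_fhir_date("2023-07"))     # True (month precision)
print(is_fhir_date("2023-02-30"))  # False
```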

FHIR Administrative Gender Values (gender)

Validates that values are one of the four allowed codes in the HL7 FHIR AdministrativeGender value set: male, female, other, or unknown. This is a required binding in FHIR Patient and other resources. Codes are case-sensitive and lowercase, so values such as 'Male' or 'F' fail the check.

Valid Date String Format (event_date)

Validates that date string values match the expected format. Supports configurable formats including YYYY-MM-DD (ISO 8601), MM/DD/YYYY, DD/MM/YYYY, YYYY/MM/DD, and DD-Mon-YYYY. Validates month (01-12), day (01-31), and reasonable year ranges.
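
A minimal Python sketch using `datetime.strptime`, assuming a hypothetical mapping from the format labels above to strptime directives and a default year window of 1900 to 2100 (our own stand-in for "reasonable"). Be aware that strptime also accepts non-zero-padded months and days such as '2023-7-4', and that `%b` month names follow the process locale.

```python
from datetime import datetime

# Hypothetical mapping from the rule's format labels to strptime directives.
FORMATS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY/MM/DD": "%Y/%m/%d",
    "DD-Mon-YYYY": "%d-%b-%Y",  # %b uses the current locale's month abbreviations
}


def is_valid_date_string(value, fmt="YYYY-MM-DD", min_year=1900, max_year=2100):
    """Parse with the chosen format; reject bad months/days and out-of-window years."""
    try:
        parsed = datetime.strptime(value, FORMATS[fmt])
    except ValueError:
        return False  # wrong shape, month 13, Feb 30, etc.
    return min_year <= parsed.year <= max_year


print(is_valid_date_string("07/14/2023", "MM/DD/YYYY"))  # True
print(is_valid_date_string("14/07/2023", "MM/DD/YYYY"))  # False: no month 14
```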

HIPAA PHI Field Masking Validation (ssn)

Validates that Protected Health Information (PHI) fields such as SSN, MRN, and DOB are properly masked or redacted in non-clinical systems. Values must match a recognized masking pattern (e.g., XXX-XX-1234, ***-**-1234, or REDACTED). Unmasked PHI in downstream or analytical systems can conflict with HIPAA's minimum necessary standard and should be treated as a compliance issue.
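
A minimal Python sketch of the pattern check, accepting only the three masking patterns named above; a real deployment would likely recognize more. The names are hypothetical.

```python
import re

# Only the masking patterns named in the rule description; extend as needed.
MASKED = re.compile(r"XXX-XX-[0-9]{4}|\*{3}-\*{2}-[0-9]{4}|REDACTED")


def find_unmasked(values):
    """Return the values that match none of the recognized masking patterns."""
    return [v for v in values if MASKED.fullmatch(v) is None]


print(find_unmasked(["XXX-XX-1234", "***-**-1234", "REDACTED", "123-45-6789"]))
# ['123-45-6789']
```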

HIPAA Safe Harbor De-identification Validation (zip_code)

Validates that data has been de-identified according to the HIPAA Safe Harbor method, under which the 18 HIPAA identifiers must be removed or generalized. For example, ZIP codes must be truncated to their first 3 digits (or replaced with 000 when the area formed by all ZIP codes sharing those digits has 20,000 or fewer people), dates must be generalized to the year, and ages over 89 must be aggregated into a single category (90+). This rule helps ensure compliance with the Safe Harbor de-identification standard.
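
A minimal Python sketch of the ZIP and age parts of this check. The set of restricted three-digit prefixes is deliberately a parameter: it has to come from current Census-based HHS guidance, and the prefixes in the usage lines are illustrative values only. Function names are hypothetical.

```python
import re


def generalize_zip(zip_code: str, restricted_prefixes: set[str]) -> str:
    """Keep the first three digits, or use "000" for a restricted prefix."""
    prefix = zip_code[:3]
    return "000" if prefix in restricted_prefixes else prefix


def is_safe_harbor_zip(value: str, restricted_prefixes: set[str]) -> bool:
    """A de-identified ZIP is exactly three digits and never a restricted prefix."""
    return (re.fullmatch(r"[0-9]{3}", value) is not None
            and value not in restricted_prefixes)


def is_safe_harbor_age(value: str) -> bool:
    """Ages above 89 may only appear as the single category "90+"."""
    if value == "90+":
        return True
    return re.fullmatch(r"[0-9]+", value) is not None and int(value) <= 89


restricted = {"036", "059"}  # illustrative values, not the authoritative list
print(generalize_zip("94110", restricted))     # 941
print(is_safe_harbor_zip("036", restricted))   # False: must be reported as 000
```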

Column Not Null

Asserts that a specified column contains no null values. This is the most fundamental completeness check — every row must have a value present in the target column.

Column Completeness Threshold

Asserts that a column meets a minimum completeness threshold, measured as the percentage of non-null values. Useful when some nulls are acceptable but the overall population rate must stay above a defined level (e.g., 95%).
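
A minimal Python sketch of the threshold test, assuming the column is a list in which `None` marks a null. Treating an empty column as passing is our own policy choice; the function name is hypothetical.

```python
def meets_completeness(values, threshold_pct=95.0):
    """True if the share of non-null values is at least threshold_pct percent."""
    if not values:
        return True  # empty column treated as passing (a policy choice)
    non_null = sum(v is not None for v in values)
    return 100.0 * non_null / len(values) >= threshold_pct


column = ["A"] * 19 + [None]
print(meets_completeness(column))  # True: exactly 95% non-null
```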

Required Fields for Status

Asserts that when a status column has a specific value (e.g., 'active'), a set of required fields must all be populated (non-null). Enforces lifecycle-based data completeness rules where later stages demand richer data.

String Not Empty

Asserts that a string column contains no empty strings. This is distinct from a null check — a value can be non-null but still empty ('') or whitespace-only. Catches cases where upstream systems insert blank strings instead of proper nulls.

Conditional Not Null

Asserts that a target column is not null whenever a condition column has a specific value. For example, 'shipping_date must not be null when order_status is shipped'. Enforces business rules where field population depends on another field's state.
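
This rule and Required Fields for Status reduce to the same test, sketched minimally in Python here. Rows are assumed to be dicts with `None` for nulls; the function name and the encounter columns in the usage lines are hypothetical.

```python
def conditional_not_null_violations(rows, condition_col, condition_value, required_cols):
    """Rows where condition_col == condition_value but a required column is null."""
    return [
        row
        for row in rows
        if row.get(condition_col) == condition_value
        and any(row.get(col) is None for col in required_cols)
    ]


encounters = [
    {"id": 1, "status": "discharged", "discharge_date": "2024-03-02"},
    {"id": 2, "status": "discharged", "discharge_date": None},
    {"id": 3, "status": "admitted", "discharge_date": None},
]
bad = conditional_not_null_violations(encounters, "status", "discharged", ["discharge_date"])
print([row["id"] for row in bad])  # [2]
```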

HIPAA PHI Access Audit Trail Validation

Validates that access audit trail records exist for all PHI data access events. Every record of PHI access must include non-null values for accessed_by (who accessed the data), accessed_at (when it was accessed), and access_reason (justification for access). Missing audit trail fields indicate a gap in access logging that can undermine compliance with HIPAA audit control requirements (45 CFR 164.312(b)).

Date Not In Future

Validates that a date or timestamp column contains no values in the future. Catches data entry errors, timezone issues, and ETL bugs that produce future-dated records for columns like birth_date, transaction_date, or created_at.

Age Range

Validates that age values fall within a reasonable human age range. Default range is 0-120, configurable for specific contexts such as working-age adults (18-65) or children (0-17).

HIPAA PHI Data Retention Limit Validation

Validates that PHI data records do not exceed the configured retention period. Records with a creation or effective date older than the allowed retention window should be flagged for review and disposal. Note that HIPAA's 6-year retention requirement (45 CFR 164.530(j)) applies to compliance documentation such as policies and procedures, not to the PHI records themselves; retention periods for medical records are generally set by state law and organizational policy.
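
A minimal Python sketch of the flagging step, assuming each record carries a `datetime.date` under a caller-chosen key and that the window is a whole number of years; a record dated exactly on the cutoff is kept. Function names are hypothetical.

```python
from datetime import date


def retention_cutoff(today: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 falls back to Feb 28."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        return today.replace(year=today.year - years, day=28)


def records_past_retention(records, date_key, years, today):
    cutoff = retention_cutoff(today, years)
    return [r for r in records if r[date_key] < cutoff]


today = date(2024, 6, 1)
records = [{"id": 1, "created": date(2018, 5, 31)}, {"id": 2, "created": date(2018, 6, 1)}]
print([r["id"] for r in records_past_retention(records, "created", 6, today)])  # [1]
```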

Column Unique

Validates that all non-null values in a specified column are unique. Useful for natural keys, email addresses, identifiers, and any column where duplicates indicate a data quality issue.

Duplicate Detection

Detects and counts duplicate rows based on specified columns. Returns the number of duplicates found and identifies the offending rows. Supports threshold-based alerting for acceptable duplicate rates.

Primary Key Valid

Validates that a column qualifies as a valid primary key by ensuring all values are both unique and not null. Combines uniqueness and completeness checks into a single rule.
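
A minimal Python sketch of what Primary Key Valid combines: it counts nulls and reports any non-null value that occurs more than once (the same counts also serve Column Unique and single-column Duplicate Detection). The function name and MRN values are made up for illustration.

```python
from collections import Counter


def primary_key_violations(values):
    """Return (null_count, {duplicated_value: occurrences})."""
    null_count = sum(v is None for v in values)
    counts = Counter(v for v in values if v is not None)
    duplicates = {v: n for v, n in counts.items() if n > 1}
    return null_count, duplicates


nulls, dupes = primary_key_violations(["MRN001", "MRN002", "MRN001", None])
print(nulls, dupes)  # 1 {'MRN001': 2}
```

The key is valid only when the null count is zero and the duplicate map is empty.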

Foreign Key Valid

Validates that all non-null values in a foreign key column exist in the referenced parent table's primary key column. Detects orphaned references that break referential integrity.

Orphan Record Detection

Detects child records that have no corresponding parent record. Orphan records indicate broken referential integrity caused by parent deletions without cascading, failed ETL jobs, or race conditions. Unlike foreign-key-valid, which checks FK values, this rule focuses on finding and quantifying orphaned child records for remediation.
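
A minimal in-memory Python sketch, assuming child rows are dicts and the parent keys fit in a set; in a warehouse the same logic is usually a LEFT JOIN filtered on a missing parent. Null foreign keys are skipped, leaving them to the completeness rules. Column and function names are hypothetical.

```python
def find_orphans(child_rows, fk_col, parent_keys):
    """Child rows whose non-null foreign key has no matching parent key."""
    parents = set(parent_keys)
    return [
        row for row in child_rows
        if row.get(fk_col) is not None and row[fk_col] not in parents
    ]


patients = ["P1", "P2"]
claims = [
    {"claim_id": "C1", "patient_id": "P1"},
    {"claim_id": "C2", "patient_id": "P9"},
    {"claim_id": "C3", "patient_id": None},
]
print([c["claim_id"] for c in find_orphans(claims, "patient_id", patients)])  # ['C2']
```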

Date Order Valid

Asserts that a start date column is on or before an end date column for every row. Catches data entry errors, timezone conversion bugs, or ETL transformation issues that invert temporal ordering.

Enum Value Valid

Asserts that all values in a column belong to a predefined set of allowed values. Catches typos, unexpected category values, or upstream system changes that introduce new enum variants without coordination.

Table Freshness

Asserts that a table has been updated within the specified number of hours. Uses the table's metadata (last modified timestamp) or a designated timestamp column to verify data is fresh and pipelines are running on schedule.
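
A minimal Python sketch of the timestamp-column variant: pass in the column's most recent value. It assumes timezone-aware UTC datetimes, since Python raises TypeError when subtracting naive and aware values. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(last_updated, max_age_hours, now=None):
    """True if last_updated is no more than max_age_hours before now."""
    now = now or datetime.now(timezone.utc)
    return now - last_updated <= timedelta(hours=max_age_hours)


now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(datetime(2024, 6, 1, 1, 0, tzinfo=timezone.utc), 24, now))   # True
print(is_fresh(datetime(2024, 5, 30, 12, 0, tzinfo=timezone.utc), 24, now))  # False
```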