EU AI Act Data Governance (Art. 10)

EU_AI_ACT · free

Validate AI training data quality — label completeness, class balance, train/test leakage, feature null rates, data drift, bias coverage, and provenance tracking per EU AI Act Article 10.

10 rules · 2,619 downloads · 4.6 average rating (140 ratings)

Tags: ai-act, ml, training-data, bias, data-governance, drift, provenance, eu-regulation



Install with the CLI:

$ npx dqhub install eu-ai-act-data-governance --format soda --table YOUR_TABLE

About this pack

Data quality checks for EU AI Act Article 10 compliance: training, validation, and test dataset governance. Covers:

- Label completeness for supervised learning
- Class imbalance detection (>10:1 ratio)
- Train/test data leakage prevention
- Feature null rate thresholds
- Data drift detection vs. training baseline
- Outlier rate monitoring
- Dataset documentation completeness
- Protected attribute coverage (bias detection)
- Data provenance and versioning
- PII annotation and anonymization tracking

Standards: EU AI Act (Reg. 2024/1689) Art. 10, ISO/IEC 5259, NIST AI RMF
Enforced: August 2026

Sources & References

EU AI Act — Article 10 — Data and Data Governance

Training, validation, and testing datasets shall be subject to appropriate data governance and management practices, including examination for completeness

ISO/IEC 5259 — ISO/IEC 5259-1:2024 — Data Quality for AI

Data quality measures for AI training data including completeness requirements

NIST AI RMF — MAP 2.3 / MEASURE 2.6

Ensure data quality and completeness for AI system training datasets

NIST AI RMF — MAP 2.3 / MEASURE 2.3

Assess data representativeness and identify potential sources of bias in training datasets

NIST AI RMF — MEASURE 2.6

Ensure evaluation datasets are independent from training data to produce reliable performance metrics

NIST AI RMF — MEASURE 2.6 / MANAGE 4.2

Monitor AI system inputs for data drift and distribution changes that may affect performance

NIST AI RMF — MAP 2.3 / GOVERN 1.5

Maintain documentation of training data sources, characteristics, and intended use

NIST AI RMF — MAP 2.3 / MEASURE 2.11

Evaluate AI datasets for bias and ensure adequate representation of relevant demographic groups

NIST AI RMF — GOVERN 1.5 / MAP 2.3

Maintain data provenance records to support AI system accountability and auditability

NIST AI RMF — GOVERN 1.5 / MAP 5.1

Identify and manage personally identifiable information in AI system datasets with appropriate safeguards

What's included

5 completeness rules

4 statistical rules

1 uniqueness rule

Checks included (10)

Training Data Label Completeness

Validates that training data labels are non-null for all supervised learning records. Under EU AI Act Article 10, high-risk AI systems must be developed with training data that meets quality criteria including completeness. Missing labels in supervised learning datasets compromise model reliability and violate data governance requirements.
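A minimal Python sketch of this check, assuming the training records are available as a list of dicts and the label lives in a column called `label` (a hypothetical name; the pack's own column mapping is not shown here). Treating blank strings as missing is an extra assumption on top of the non-null rule described above.

```python
from typing import Any, Mapping, Sequence


def _is_missing(value: Any) -> bool:
    # Assumption: None and blank strings both count as a missing label.
    return value is None or (isinstance(value, str) and not value.strip())


def missing_label_rows(
    records: Sequence[Mapping[str, Any]], label_col: str = "label"
) -> list[int]:
    """Return the indices of records whose label is missing or absent."""
    return [i for i, row in enumerate(records) if _is_missing(row.get(label_col))]


def labels_complete(
    records: Sequence[Mapping[str, Any]], label_col: str = "label"
) -> bool:
    """True when every record carries a non-missing label."""
    return not missing_label_rows(records, label_col)
```

Returning row indices rather than a single pass/fail makes it easy to report exactly which records need relabeling. Note that a label of `0` or `False` is a valid value and is not flagged.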

Feature Column Null Rate Threshold

Validates that feature columns do not exceed the configured null rate threshold. Excessive missing values in feature columns degrade model training quality and can introduce bias. Under EU AI Act Article 10, training data must be complete in view of the intended purpose of the AI system.
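One way to compute per-column null rates and compare them against a threshold, sketched in plain Python under the assumption that records are dicts and an absent key counts as null. The default `max_null_rate=0.05` is hypothetical; substitute the value configured for the pack.

```python
from typing import Any, Mapping, Sequence


def null_rates(
    records: Sequence[Mapping[str, Any]], feature_cols: Sequence[str]
) -> dict[str, float]:
    """Fraction of records in which each feature column is None or absent."""
    if not records:
        raise ValueError("cannot compute null rates on an empty dataset")
    n = len(records)
    return {col: sum(1 for row in records if row.get(col) is None) / n for col in feature_cols}


def columns_over_threshold(
    records: Sequence[Mapping[str, Any]],
    feature_cols: Sequence[str],
    max_null_rate: float = 0.05,  # hypothetical default; use the configured value
) -> list[str]:
    """Feature columns whose null rate exceeds max_null_rate."""
    rates = null_rates(records, feature_cols)
    return [col for col in feature_cols if rates[col] > max_null_rate]
```

The comparison is strict, so a column whose null rate equals the threshold still passes, matching "do not exceed".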

Dataset Documentation Completeness

Validates that each dataset has required documentation fields populated: description, source, collection_date, size, and intended_use. Under EU AI Act Article 10, providers of high-risk AI systems must maintain comprehensive documentation of training data including its characteristics, properties, and intended purpose.
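A sketch of the documentation check, assuming dataset metadata is held as a dict keyed by dataset name (the `catalog` layout is hypothetical). The five required field names come from the rule description; treating blank strings as unpopulated is an assumption.

```python
from typing import Any, Mapping

# Field list taken from the rule description.
REQUIRED_DOC_FIELDS = ("description", "source", "collection_date", "size", "intended_use")


def _is_blank(value: Any) -> bool:
    return value is None or (isinstance(value, str) and not value.strip())


def missing_doc_fields(metadata: Mapping[str, Any]) -> list[str]:
    """Required documentation fields that are absent or blank for one dataset."""
    return [field for field in REQUIRED_DOC_FIELDS if _is_blank(metadata.get(field))]


def undocumented_datasets(
    catalog: Mapping[str, Mapping[str, Any]],
) -> dict[str, list[str]]:
    """Map each incompletely documented dataset name to its missing fields."""
    report: dict[str, list[str]] = {}
    for name, metadata in catalog.items():
        missing = missing_doc_fields(metadata)
        if missing:
            report[name] = missing
    return report
```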

Data Provenance Tracking Completeness

Validates that each record has provenance fields populated: source_system, ingestion_date, and data_version. Under EU AI Act Article 10, providers must maintain data governance practices that ensure traceability of training data origin and lineage. Provenance tracking is essential for auditing, debugging model behavior, and demonstrating regulatory compliance.
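A per-record provenance check can be sketched as follows, assuming records are dicts. The field names come from the rule description; the function name and the choice to report row indices per field are illustrative.

```python
from typing import Any, Mapping, Sequence

# Field names taken from the rule description.
PROVENANCE_FIELDS = ("source_system", "ingestion_date", "data_version")


def provenance_gaps(records: Sequence[Mapping[str, Any]]) -> dict[str, list[int]]:
    """Map each provenance field to the indices of records where it is missing.

    Fields with no gaps are omitted, so an empty dict means the check passes.
    """
    gaps: dict[str, list[int]] = {field: [] for field in PROVENANCE_FIELDS}
    for i, row in enumerate(records):
        for field in PROVENANCE_FIELDS:
            value = row.get(field)
            if value is None or (isinstance(value, str) and not value.strip()):
                gaps[field].append(i)
    return {field: idx for field, idx in gaps.items() if idx}
```

Grouping gaps by field shows which part of the lineage is broken (for example, a loader that never stamps `data_version`), which is usually more actionable in an audit than a single failing count.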

PII Annotation and Anonymization Flag

Validates that records containing personal data have pii_flag set to true and anonymization_method populated. Under EU AI Act Article 10, training data containing personal data requires appropriate data governance measures including privacy-preserving techniques. Proper PII annotation ensures transparency and supports GDPR compliance alongside AI Act requirements.
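The rule does not say how a record is recognized as containing personal data, so this sketch assumes a hypothetical list of personal-data columns (`name`, `email`, `phone`): a record is in scope if any of them holds a value. `pii_flag` and `anonymization_method` are the field names from the rule description.

```python
from typing import Any, Mapping, Sequence

# Hypothetical: which columns indicate personal data depends on your schema.
PERSONAL_DATA_COLS = ("name", "email", "phone")


def _present(value: Any) -> bool:
    return value is not None and not (isinstance(value, str) and not value.strip())


def pii_violations(
    records: Sequence[Mapping[str, Any]],
    personal_cols: Sequence[str] = PERSONAL_DATA_COLS,
) -> list[int]:
    """Indices of records with personal data but no pii_flag=True or no anonymization_method."""
    bad = []
    for i, row in enumerate(records):
        if not any(_present(row.get(col)) for col in personal_cols):
            continue  # no personal data detected, so the rule does not apply
        flagged = row.get("pii_flag") is True
        if not flagged or not _present(row.get("anonymization_method")):
            bad.append(i)
    return bad
```

The identity test `is True` deliberately rejects truthy stand-ins such as the string `"true"` or the integer `1`; loosen it if the source system stores flags that way.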

Training Data Class Balance Metric

Validates that the class distribution in training data does not have an imbalance ratio exceeding the configured threshold between any two classes. Under EU AI Act Article 10, training datasets must be sufficiently representative and examined for possible biases. Severe class imbalance can lead to biased model predictions and underperformance on minority classes.
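The worst ratio between any two classes is simply the largest class count divided by the smallest, so the check reduces to one division. A minimal sketch, using the 10:1 figure from the pack summary as a default `max_ratio`; the optional `expected_classes` argument is an addition so that a class with zero examples is caught rather than silently ignored.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, Optional


def imbalance_ratio(
    labels: Iterable[Hashable], expected_classes: Optional[Iterable[Hashable]] = None
) -> float:
    """Largest class count divided by smallest: the worst ratio between any two classes."""
    counts = Counter(labels)
    for cls in expected_classes or ():
        counts.setdefault(cls, 0)  # an expected class with no examples is the worst case
    if len(counts) < 2:
        raise ValueError("need at least two classes to measure imbalance")
    smallest = min(counts.values())
    return math.inf if smallest == 0 else max(counts.values()) / smallest


def is_balanced(
    labels: Iterable[Hashable],
    max_ratio: float = 10.0,
    expected_classes: Optional[Iterable[Hashable]] = None,
) -> bool:
    return imbalance_ratio(labels, expected_classes) <= max_ratio
```

For example, 50 examples of one class against 4 of another gives a ratio of 12.5, which fails a 10:1 threshold.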

Feature Distribution Data Drift Detection

Validates that feature distributions in production data do not deviate by more than the configured number of standard deviations from the training baseline. Data drift indicates that production inputs have shifted away from what the model was trained on, potentially degrading AI system performance. Under EU AI Act Article 10, datasets must be relevant to the system's intended purpose, and drift is a signal that the training data may no longer reflect the conditions the system operates in.
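The rule does not specify which statistic is compared. One plausible reading, sketched here, is a z-score: how many baseline standard deviations the production mean sits from the baseline mean. The function names and the `max_std_devs=3.0` default are hypothetical.

```python
import statistics
from typing import Sequence


def drift_score(baseline: Sequence[float], production: Sequence[float]) -> float:
    """Shift of the production mean, in units of the baseline standard deviation."""
    if len(baseline) < 2 or not production:
        raise ValueError("need at least two baseline values and one production value")
    mu = statistics.fmean(baseline)
    sigma = statistics.pstdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has zero variance; a z-score is undefined")
    return abs(statistics.fmean(production) - mu) / sigma


def has_drifted(
    baseline: Sequence[float], production: Sequence[float], max_std_devs: float = 3.0
) -> bool:
    return drift_score(baseline, production) > max_std_devs
```

This only detects a shift in the mean; a change in spread or shape with an unchanged mean passes. Distribution-level comparisons such as the two-sample Kolmogorov-Smirnov test or the population stability index are common alternatives when that matters.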

Outlier Rate Threshold Check

Validates that the rate of statistical outliers (values more than 3 standard deviations from the mean) in a feature column stays below the configured threshold. Excessive outliers in training data can skew model learning and produce unreliable AI systems. Under EU AI Act Article 10, training data must be, to the best extent possible, free of errors.
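A sketch of the 3-sigma outlier rate, assuming the population standard deviation of the column itself is used. `max_rate=0.01` is a hypothetical default for the configured threshold.

```python
import statistics
from typing import Sequence


def outlier_rate(values: Sequence[float], k: float = 3.0) -> float:
    """Share of values lying more than k population standard deviations from the mean."""
    if not values:
        raise ValueError("cannot compute an outlier rate on an empty column")
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return 0.0  # constant column: nothing deviates from the mean
    return sum(1 for v in values if abs(v - mu) > k * sigma) / len(values)


def outlier_rate_ok(values: Sequence[float], max_rate: float = 0.01, k: float = 3.0) -> bool:
    # max_rate is a hypothetical default; substitute the configured threshold.
    return outlier_rate(values, k) < max_rate
```

Two caveats follow from using the sample's own mean and standard deviation. By Samuelson's inequality, no value can lie more than sqrt(n - 1) population standard deviations from the mean, so with k = 3 the check cannot flag anything on ten or fewer values. Extreme values also inflate the standard deviation and can mask each other; median-absolute-deviation rules are a common robust alternative.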

Protected Attribute Representation Coverage

Validates that protected attributes (such as gender, age_group, ethnicity) are represented with a minimum coverage percentage per group. Under EU AI Act Article 10, training data must be examined for possible biases that are likely to affect the health, safety, or fundamental rights of persons. Insufficient representation of protected groups can lead to discriminatory AI outcomes.
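A sketch of per-group coverage for one protected attribute, assuming coverage means each group's share of records where the attribute is not null (whether the pack excludes nulls from the denominator is not stated). The `min_share=0.05` default is hypothetical, and `expected_groups` lets a group that is entirely absent be reported with a share of 0.0.

```python
from collections import Counter
from typing import Any, Hashable, Iterable, Mapping, Optional, Sequence


def under_represented_groups(
    records: Sequence[Mapping[str, Any]],
    attribute: str,
    min_share: float = 0.05,  # hypothetical default for the configured minimum coverage
    expected_groups: Optional[Iterable[Hashable]] = None,
) -> dict[Hashable, float]:
    """Groups of a protected attribute whose share of records falls below min_share."""
    counts = Counter(row.get(attribute) for row in records if row.get(attribute) is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no non-null values for attribute {attribute!r}")
    # Counter returns 0 for groups it has never seen, so expected-but-absent groups get 0.0.
    groups = set(counts) | set(expected_groups or ())
    return {g: counts[g] / total for g in groups if counts[g] / total < min_share}
```

This measures representation only; it says nothing about whether model outcomes differ across groups, which needs a separate evaluation.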

Train-Test Data Leakage Detection

Validates that no record IDs appear in both the training and test datasets. Data leakage between training and test sets leads to artificially inflated model performance metrics and unreliable AI systems. Under EU AI Act Article 10, datasets must support proper evaluation of AI system performance.
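The ID check itself is a set intersection. The sketch below also includes an optional content-hash comparison, which goes beyond the rule as described: it assumes a hypothetical list of feature columns and flags test rows whose feature values exactly match a training row even when the IDs differ.

```python
import hashlib
import json
from typing import Any, Hashable, Iterable, Mapping, Sequence


def leaked_ids(train_ids: Iterable[Hashable], test_ids: Iterable[Hashable]) -> set:
    """Record IDs present in both splits; an empty set means no ID-level leakage."""
    return set(train_ids) & set(test_ids)


def _fingerprint(row: Mapping[str, Any], feature_cols: Sequence[str]) -> str:
    # Hash feature values only, so identical content under a different ID still matches.
    payload = json.dumps([row.get(col) for col in feature_cols], default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def duplicated_test_rows(
    train_rows: Iterable[Mapping[str, Any]],
    test_rows: Sequence[Mapping[str, Any]],
    feature_cols: Sequence[str],
) -> list[int]:
    """Indices of test rows whose feature values exactly match some training row."""
    train_hashes = {_fingerprint(row, feature_cols) for row in train_rows}
    return [i for i, row in enumerate(test_rows) if _fingerprint(row, feature_cols) in train_hashes]
```

Exact hashing catches re-ingested copies but not near-duplicates (for example, a value rounded differently); those need a similarity measure instead.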