
Tool 11 · Algorithm Deep Dive

Compliance Matcher

Semantic Similarity + Regulatory NLP

0.93 NER F1
100% Traceability
4,000 Regulatory Passages
6 Compliance Domains

🎯 Why This Algorithm

📋 Problem Statement

Compliance requires traceability: auditors need to see exactly why a control was or wasn't applied, which rules out black-box LLMs as the decision-maker. In pharma, missing a single GxP requirement can lead to an FDA Form 483 observation. In banking, a missed SOX control can force a material weakness disclosure.

✅ Solution

Two-stage pipeline: BioBERT NER detects regulatory signals in requirement text (GxP, SOX, GDPR, HIPAA, FDA 21 CFR Part 11), and a Rete rules engine deterministically maps those signals to control packs. Every activation is logged with its triggering span, so every decision is 100% auditable.

✓ Compliance Domains Detected

GxP (GMP/GLP/GCP) · SOX (302, 404) · GDPR (Art. 17, 32) · HIPAA · FDA 21 CFR Part 11 · EU Data Residency

🧩 What It Comprises

🧬 BioBERT NER

Fine-tuned on 4,000 regulatory passages. Identifies entities: GxP, SOX, GDPR, HIPAA, 21 CFR Part 11, FDA, EU data residency.
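A model like BioBERT labels individual tokens. The rules engine, however, needs whole entities with character offsets, since these are the "triggering spans" that get logged. The sketch below shows one way to merge token tags into spans. It assumes a BIO tag scheme and whitespace tokenization, neither of which the source specifies, and `bio_to_spans` is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

TaggedToken = Tuple[int, int, str]  # (char_start, char_end, BIO tag) -- BIO is an assumption
Span = Tuple[str, int, int]         # (entity label, char_start, char_end)

def bio_to_spans(tokens: List[TaggedToken]) -> List[Span]:
    """Merge B-/I- token tags into (label, start, end) character spans."""
    spans: List[Span] = []
    label: Optional[str] = None
    start = end = 0
    for s, e, tag in tokens:
        if tag.startswith("I-") and tag[2:] == label:
            end = e  # continue the open entity
            continue
        if label is not None:
            spans.append((label, start, end))  # close the open entity
            label = None
        if tag != "O":
            # "B-" starts an entity; a stray "I-" is treated like "B-"
            label, start, end = tag[2:], s, e
    if label is not None:
        spans.append((label, start, end))
    return spans

text = "System must maintain audit trail under 21 CFR Part 11"
offsets = [m.span() for m in re.finditer(r"\S+", text)]  # whitespace tokens
tags = ["O", "O", "B-AUDIT_TRAIL", "I-AUDIT_TRAIL", "I-AUDIT_TRAIL", "O",
        "B-FDA_21_CFR_PART_11", "I-FDA_21_CFR_PART_11",
        "I-FDA_21_CFR_PART_11", "I-FDA_21_CFR_PART_11"]
spans = bio_to_spans([(s, e, t) for (s, e), t in zip(offsets, tags)])
for label, s, e in spans:
    print(label, repr(text[s:e]))  # e.g. AUDIT_TRAIL 'maintain audit trail'
```

A real subword tokenizer would supply offsets per word-piece rather than per word, but the merging logic stays the same.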

📋 Rete Rules Engine

Forward-chaining rules map NER hits → control pack applicability. Every activation logged with triggering span.
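Rules 1 to 3 from Step 3 of the pipeline diagram can be written as data. Each activation can then be recorded together with the span that triggered it. The sketch below is a plain forward-chaining loop, not a Rete network. The rule IDs, the `PACK:` prefix for derived facts and the exact log fields are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, FrozenSet, List, Tuple

@dataclass(frozen=True)
class Rule:
    rule_id: str
    all_of: FrozenSet[str]   # labels that must be in working memory
    none_of: FrozenSet[str]  # labels that must be absent (e.g. NOT SOX)
    pack: str                # control pack added when the rule fires

# Rules 1-3 from the pipeline diagram; IDs are illustrative
RULES = [
    Rule("FDA_21_CFR_PART_11_AND_AUDIT_TRAIL",
         frozenset({"FDA_21_CFR_PART_11", "AUDIT_TRAIL"}), frozenset(), "ERES-01"),
    Rule("GDPR_ARTICLE_32", frozenset({"GDPR_ARTICLE_32"}), frozenset(), "GDPR-07"),
    Rule("AUDIT_TRAIL_AND_NOT_SOX", frozenset({"AUDIT_TRAIL"}), frozenset({"SOX"}), "AUD-02"),
]

def run_rules(entities: Dict[str, str], rules: List[Rule]) -> Tuple[List[str], List[dict]]:
    """entities maps each detected label to its triggering text span."""
    facts = set(entities)
    fired = set()
    packs: List[str] = []
    audit: List[dict] = []
    changed = True
    while changed:  # forward chaining: repeat until a full pass fires nothing new
        changed = False
        for rule in rules:
            if rule.rule_id in fired:
                continue
            if rule.all_of <= facts and not (rule.none_of & facts):
                fired.add(rule.rule_id)
                packs.append(rule.pack)
                facts.add("PACK:" + rule.pack)  # derived fact that later rules could test
                audit.append({
                    "pack": rule.pack,
                    "rule": rule.rule_id,
                    "trigger": [entities[label] for label in sorted(rule.all_of)
                                if label in entities],
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                })
                changed = True
    return packs, audit

entities = {
    "FDA_21_CFR_PART_11": "21 CFR Part 11 compliant",
    "GDPR_ARTICLE_32": "GDPR Article 32 encryption",
    "AUDIT_TRAIL": "maintain audit trail",
}
packs, audit = run_rules(entities, RULES)
print(packs)  # ['ERES-01', 'GDPR-07', 'AUD-02']
```

The loop checks negated conditions such as NOT SOX against working memory at the moment a rule fires. If a later rule could assert SOX, a production engine would also need to retract the earlier activation.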

📚 Control Pack Library

50+ pre-defined control packs across GxP, SOX, GDPR, HIPAA. Includes validation tasks and effort estimates.
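One possible representation of a control pack is an ID, a list of validation tasks and an effort estimate. With that shape, expansion becomes a lookup followed by a sum. The sketch below assumes effort is stored per pack, though it could equally be stored per task. Its two entries reuse the ERES-01 and GDPR-07 details from Step 4 of the pipeline diagram. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass(frozen=True)
class ControlPack:
    pack_id: str
    title: str
    tasks: Tuple[Tuple[str, str], ...]  # (task ID, description)
    effort_hours: int                   # per-pack estimate (an assumption)

LIBRARY: Dict[str, ControlPack] = {
    "ERES-01": ControlPack("ERES-01", "Electronic Records", (
        ("IQ-01", "System installation qualification"),
        ("OQ-03", "Audit trail generation test"),
        ("PQ-02", "Electronic signature verification"),
    ), 45),
    "GDPR-07": ControlPack("GDPR-07", "Encryption at Rest", (
        ("SEC-01", "Encryption key management verification"),
        ("SEC-04", "Data-at-rest encryption test"),
    ), 22),
}

def expand(pack_ids: List[str]) -> Tuple[List[str], int]:
    """Return the validation task IDs and total effort for the activated packs."""
    task_ids: List[str] = []
    total_hours = 0
    for pack_id in pack_ids:
        pack = LIBRARY[pack_id]  # KeyError flags a pack missing from the library
        task_ids.extend(task_id for task_id, _ in pack.tasks)
        total_hours += pack.effort_hours
    return task_ids, total_hours

tasks, hours = expand(["ERES-01", "GDPR-07"])
print(tasks, hours)  # ['IQ-01', 'OQ-03', 'PQ-02', 'SEC-01', 'SEC-04'] 67
```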

🔍 Multi-Domain Ensemble

Router detects the industry and routes to specialized models: BioBERT for pharma, FinBERT for banking, SecBERT for public sector and Legal-BERT as the general fallback.
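The routing step is essentially a lookup from industry to model, with Legal-BERT as the fallback listed in Step 1 of the pipeline. In the sketch below, the model names are plain labels and nothing is actually loaded. The keyword hints, used only when no industry context is supplied, are hypothetical.

```python
from typing import Optional

# Industry -> NER model, per Step 1 of the pipeline (names are labels only)
ROUTES = {
    "pharma": "BioBERT",
    "life sciences": "BioBERT",
    "banking": "FinBERT",
    "finance": "FinBERT",
    "public sector": "SecBERT",
}
FALLBACK = "Legal-BERT"

# Hypothetical keyword hints, used only when no industry context is supplied
HINTS = {
    "BioBERT": ("gxp", "gmp", "21 cfr part 11"),
    "FinBERT": ("sox", "sarbanes-oxley"),
}

def route(requirement: str, industry: Optional[str] = None) -> str:
    """Pick the NER model for a requirement, falling back to Legal-BERT."""
    if industry:
        return ROUTES.get(industry.strip().lower(), FALLBACK)
    text = requirement.lower()
    for model, keywords in HINTS.items():
        if any(keyword in text for keyword in keywords):
            return model
    return FALLBACK

print(route("Audit trail required", industry="Pharma"))  # BioBERT
print(route("Quarterly SOX 404 testing"))                # FinBERT
print(route("Store customer data in the EU"))            # Legal-BERT
```

In this sketch, explicit industry context always takes precedence over text hints, so a pharma requirement that mentions SOX still routes to BioBERT.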

📥 Inputs & 📤 Outputs

📥 Inputs

  • Requirement text (from Tool 01/02)
  • Industry context (Pharma, Banking, etc.)
  • Geography (EU, US, Global)

📤 Outputs

  • Applicable compliance packs
  • Per-control activation rationale (with source span)
  • Validation effort estimate
  • Audit trail (100% traceable)
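These inputs and outputs map naturally onto a small typed schema. Its field names below mirror the OUTPUT JSON in the pipeline diagram. This is a sketch built on Python dataclasses. The class names and the default values for industry and geography are assumptions.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

@dataclass
class MatchRequest:
    requirement: str           # requirement text, e.g. from Tool 01/02
    industry: str = "General"  # e.g. "Pharma", "Banking"
    geography: str = "Global"  # e.g. "EU", "US", "Global"

@dataclass
class Activation:
    pack: str     # control pack that was activated
    trigger: str  # source span that caused the activation
    rule: str     # ID of the rule that fired

@dataclass
class MatchResult:
    requirement: str
    compliance_packs: List[str]
    total_effort_hours: int
    audit_trail: List[Activation] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() also converts the nested Activation objects to plain dicts
        return json.dumps(asdict(self), indent=2)

request = MatchRequest(
    "System must maintain audit trail with electronic signature "
    "(21 CFR Part 11 compliant)",
    industry="Pharma",
    geography="US",
)
result = MatchResult(  # a single activation, for brevity
    requirement=request.requirement,
    compliance_packs=["ERES-01"],
    total_effort_hours=45,
    audit_trail=[Activation("ERES-01", "21 CFR Part 11 compliant",
                            "FDA_21_CFR_PART_11_AND_AUDIT_TRAIL")],
)
print(result.to_json())
```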

🔄 How It Runs — Step by Step

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                        COMPLIANCE PACK MATCHER PIPELINE                                    │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                           │
│   ┌──────────────┐                                                                        │
│   │   INPUT:     │  "System must maintain audit trail with electronic signature           │
│   │ Requirement  │   (21 CFR Part 11 compliant) and GDPR Article 32 encryption"           │
│   └──────┬───────┘                                                                        │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────┐                    │
│   │                    STEP 1: INDUSTRY ROUTER                          │                    │
│   │                                                                     │                    │
│   │   Detect industry from context:                                     │                    │
│   │   • Pharma/Life Sciences → BioBERT                                  │                    │
│   │   • Banking/Finance → FinBERT                                       │                    │
│   │   • Public Sector → SecBERT                                         │                    │
│   │   • General → Legal-BERT                                            │                    │
│   └──────────────────────────────────────────────────────────────────┘                    │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────┐                    │
│   │                    STEP 2: NAMED ENTITY RECOGNITION (NER)           │                    │
│   │                                                                     │                    │
│   │   BioBERT (or domain-specific model) tags regulatory entities:      │                    │
│   │                                                                     │                    │
│   │   Input: "System must maintain audit trail with electronic          │                    │
│   │           signature (21 CFR Part 11 compliant) and GDPR             │                    │
│   │           Article 32 encryption"                                    │                    │
│   │                                                                     │                    │
│   │   Output:                                                           │                    │
│   │   ┌──────────────────────────────┬──────────────────────────────┐  │                    │
│   │   │ Entity                       │ Span                         │  │                    │
│   │   ├──────────────────────────────┼──────────────────────────────┤  │                    │
│   │   │ FDA_21_CFR_PART_11           │ "21 CFR Part 11 compliant"   │  │                    │
│   │   │ GDPR_ARTICLE_32              │ "GDPR Article 32 encryption" │  │                    │
│   │   │ AUDIT_TRAIL                  │ "maintain audit trail"       │  │                    │
│   │   └──────────────────────────────┴──────────────────────────────┘  │                    │
│   └──────────────────────────────────────────────────────────────────┘                    │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────┐                    │
│   │                    STEP 3: RETE RULES ENGINE                        │                    │
│   │                                                                     │                    │
│   │   Forward-chaining deterministic rules:                             │                    │
│   │                                                                     │                    │
│   │   Rule 1: IF FDA_21_CFR_PART_11 AND AUDIT_TRAIL                    │                    │
│   │           THEN ADD Control_Pack = "ERES-01: Electronic Records"     │                    │
│   │                                                                     │                    │
│   │   Rule 2: IF GDPR_ARTICLE_32                                        │                    │
│   │           THEN ADD Control_Pack = "GDPR-07: Encryption at Rest"     │                    │
│   │                                                                     │                    │
│   │   Rule 3: IF AUDIT_TRAIL AND NOT SOX                                │                    │
│   │           THEN ADD Control_Pack = "AUD-02: Audit Trail Baseline"    │                    │
│   │                                                                     │                    │
│   │   ┌─────────────────────────────────────────────────────────────┐  │                    │
│   │   │  🔍 Every activation logged with:                            │  │                    │
│   │   │  • Triggering requirement span                               │  │                    │
│   │   │  • Rule ID that fired                                        │  │                    │
│   │   │  • Timestamp                                                 │  │                    │
│   │   │  → 100% auditable trail                                      │  │                    │
│   │   └─────────────────────────────────────────────────────────────┘  │                    │
│   └──────────────────────────────────────────────────────────────────┘                    │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────┐                    │
│   │                    STEP 4: CONTROL PACK EXPANSION                   │                    │
│   │                                                                     │                    │
│   │   Each Control Pack expands to validation tasks:                    │                    │
│   │                                                                     │                    │
│   │   ERES-01 (Electronic Records):                                     │                    │
│   │   • IQ-01: System installation qualification                        │                    │
│   │   • OQ-03: Audit trail generation test                              │                    │
│   │   • PQ-02: Electronic signature verification                        │                    │
│   │   Estimated Effort: 45 hours                                        │                    │
│   │                                                                     │                    │
│   │   GDPR-07 (Encryption at Rest):                                     │                    │
│   │   • SEC-01: Encryption key management verification                  │                    │
│   │   • SEC-04: Data-at-rest encryption test                            │                    │
│   │   Estimated Effort: 22 hours                                        │                    │
│   │                                                                     │                    │
│   │   AUD-02 (Audit Trail Baseline): expansion not shown                │                    │
│   └──────────────────────────────────────────────────────────────────┘                    │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌──────────────────────────────────────────────────────────────────┐                    │
│   │   OUTPUT JSON:                                                    │                    │
│   │   {                                                               │                    │
│   │     "requirement": "...",                                         │                    │
│   │     "compliance_packs": ["ERES-01", "GDPR-07", "AUD-02"],         │                    │
│   │     "total_effort_hours": 82,                                     │                    │
│   │     "audit_trail": [                                              │                    │
│   │       {"pack": "ERES-01", "trigger": "21 CFR Part 11...",         │                    │
│   │        "rule": "FDA_21_CFR_PART_11_AND_AUDIT_TRAIL"}              │                    │
│   │     ]                                                             │                    │
│   │   }                                                               │                    │
│   └──────────────────────────────────────────────────────────────────┘                    │
│                                                                                           │
└─────────────────────────────────────────────────────────────────────────────────────────┘
                    

🔀 Multi-Domain Ensemble Architecture (Recommended)

Industry Router → domain models:
  • BioBERT (Pharma)
  • FinBERT (Banking)
  • SecBERT (Public Sector)
  • Legal-BERT (General)
→ Ensemble Voting → Rete Rules Engine
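The architecture does not specify how the ensemble votes. One simple possibility is exact-match majority voting, where a span survives only if enough models predicted the same label at the same character offsets. The sketch below shows that approach; it is an assumption, not the documented method.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Span = Tuple[str, int, int]  # (entity label, char_start, char_end)

def vote(predictions: Dict[str, Set[Span]], min_votes: int = 2) -> List[Span]:
    """Keep spans that at least `min_votes` models predicted identically."""
    counts = Counter(span for spans in predictions.values() for span in spans)
    return sorted(span for span, n in counts.items() if n >= min_votes)

predictions = {
    "BioBERT": {("AUDIT_TRAIL", 12, 32), ("FDA_21_CFR_PART_11", 39, 53)},
    "Legal-BERT": {("AUDIT_TRAIL", 12, 32)},
    "FinBERT": {("FDA_21_CFR_PART_11", 39, 53), ("SOX", 0, 6)},
}
print(vote(predictions))  # [('AUDIT_TRAIL', 12, 32), ('FDA_21_CFR_PART_11', 39, 53)]
```

Two natural variations would be to give the router's chosen model extra weight, or to vote on overlapping rather than identical spans.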

📐 Mathematical Explanation

NER Token Classification (BioBERT):

P(tag | token) = softmax( W · h_token + b )

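where h_token is BioBERT's final-layer hidden state for the token, and W and b are the weight matrix and bias of the token-classification head (one output per tag).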
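The classification head is a linear layer followed by a softmax over the tag set. The toy sketch below makes the formula concrete. It uses a 2-dimensional hidden state (BioBERT-base actually produces 768-dimensional vectors), made-up weights and an illustrative three-tag set.

```python
import math
from typing import List

def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max logit for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def tag_probs(h: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """P(tag | token) = softmax(W · h_token + b); W has one row per tag."""
    logits = [sum(w * x for w, x in zip(row, h)) + bias for row, bias in zip(W, b)]
    return softmax(logits)

TAGS = ["O", "B-GXP", "I-GXP"]            # illustrative tag set
h_token = [1.0, 0.5]                      # toy hidden state
W = [[0.2, 0.1], [1.5, -0.3], [0.0, 0.4]]  # made-up weights
b = [0.0, 0.1, -0.2]
probs = tag_probs(h_token, W, b)
print(TAGS[probs.index(max(probs))])  # B-GXP
```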

Training Loss (Cross-Entropy):

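L = - Σ_i Σ_c y_ic log(p_ic)

where i runs over tokens, c runs over entity tags, y_ic = 1 if token i's gold tag is c (0 otherwise), and p_ic is the predicted probability P(tag = c | token i).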
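Because y_ic is one-hot, only the gold tag's log-probability survives the inner sum. The code below is a direct transcription of the double sum, using toy numbers. Frameworks such as PyTorch average over tokens by default rather than summing.

```python
import math
from typing import List

def cross_entropy(y: List[List[int]], p: List[List[float]]) -> float:
    """L = -sum_i sum_c y_ic * log(p_ic) over tokens i and tags c."""
    loss = 0.0
    for y_i, p_i in zip(y, p):            # tokens
        for y_ic, p_ic in zip(y_i, p_i):  # tags
            if y_ic:                      # zero terms contribute nothing (and avoid log(0))
                loss -= y_ic * math.log(p_ic)
    return loss

y = [[1, 0, 0], [0, 1, 0]]              # gold tags: O, then B-GXP
p = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]  # predicted probabilities
print(round(cross_entropy(y, p), 4))    # 0.5798
```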

Rete Algorithm Pattern Matching:

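For each rule R: IF (condition₁ ∧ condition₂ ∧ ...) THEN action
When working memory matches all conditions, fire the action. Rete makes this efficient by compiling the rule conditions into a network that caches partial matches. Asserting a new fact then re-evaluates only the conditions that reference it, instead of re-matching every rule against all of working memory.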
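The key Rete idea is that partial matches persist between fact assertions. The simplified sketch below illustrates that idea with an index from each fact to the rules that mention it, so each new fact only touches those rules. It is explicitly not a full Rete network: it has no variable bindings, beta-node joins, negation or retraction. For brevity, the rules are keyed by the control pack they activate.

```python
from collections import defaultdict
from typing import Dict, List, Set

class IncrementalMatcher:
    """Rete-inspired sketch: cached partial matches per rule, updated
    only for rules whose conditions mention the newly asserted fact."""

    def __init__(self, rules: Dict[str, Set[str]]):
        self.rules = rules  # rule ID -> set of required facts
        self.index: Dict[str, List[str]] = defaultdict(list)  # fact -> rule IDs
        for rule_id, conditions in rules.items():
            for fact in conditions:
                self.index[fact].append(rule_id)
        self.partial: Dict[str, Set[str]] = {r: set() for r in rules}  # cached matches
        self.fired: List[str] = []

    def assert_fact(self, fact: str) -> List[str]:
        """Add a fact to working memory; return the rules it completes."""
        completed = []
        for rule_id in self.index.get(fact, []):
            self.partial[rule_id].add(fact)
            if rule_id not in self.fired and self.partial[rule_id] == self.rules[rule_id]:
                self.fired.append(rule_id)
                completed.append(rule_id)
        return completed

matcher = IncrementalMatcher({
    "ERES-01": {"FDA_21_CFR_PART_11", "AUDIT_TRAIL"},
    "GDPR-07": {"GDPR_ARTICLE_32"},
})
print(matcher.assert_fact("AUDIT_TRAIL"))         # [] (ERES-01 half matched)
print(matcher.assert_fact("FDA_21_CFR_PART_11"))  # ['ERES-01']
print(matcher.assert_fact("GDPR_ARTICLE_32"))     # ['GDPR-07']
```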

Confidence Calibration (Platt Scaling for NER):

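P_calibrated = 1 / (1 + e^{A·logit + B})

where A and B are fitted on a held-out calibration set. A is normally negative, so a higher logit yields a higher calibrated probability.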
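A and B are typically fitted by minimizing negative log-likelihood on held-out pairs of (logit, whether the prediction was correct). The sketch below does this with gradient descent in plain Python. Platt's original method also smooths the 0/1 targets and uses a more careful optimizer, and both refinements are omitted here. The learning rate, step count and toy data are arbitrary choices.

```python
import math
from typing import List, Tuple

def platt_prob(logit: float, A: float, B: float) -> float:
    """P_calibrated = 1 / (1 + e^(A*logit + B)), computed without overflow."""
    z = A * logit + B
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))

def fit_platt(logits: List[float], labels: List[int],
              lr: float = 0.1, steps: int = 5000) -> Tuple[float, float]:
    """Fit A, B by gradient descent on the mean negative log-likelihood."""
    A, B = -1.0, 0.0  # start at P = sigmoid(logit), i.e. no recalibration
    n = len(logits)
    for _ in range(steps):
        grad_A = grad_B = 0.0
        for f, y in zip(logits, labels):
            residual = y - platt_prob(f, A, B)  # d(NLL)/dz for z = A*f + B
            grad_A += residual * f
            grad_B += residual
        A -= lr * grad_A / n
        B -= lr * grad_B / n
    return A, B

# Toy held-out NER logits and whether each prediction was correct
logits = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 1, 1]
A, B = fit_platt(logits, labels)
print(round(platt_prob(3.0, A, B), 3), round(platt_prob(-2.0, A, B), 3))
```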

📊 Measured Performance

Metric                              Value   Benchmark / Notes
NER F1 (GxP/SOX/GDPR)               0.93    4,000 regulatory passages
Control Activation Traceability     100%    Audit requirement met
False Positive Rate                 4.2%    Incorrectly activated controls
False Negative Rate                 2.8%    Missed compliance requirements
Inference Latency                   85 ms   NER + rules
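The source does not say whether NER F1 is computed at the entity or token level, or whether it is micro- or macro-averaged. The sketch below assumes the common entity-level convention, micro-averaged with exact label-and-offset matching. The spans are toy data.

```python
from typing import Set, Tuple

Span = Tuple[str, int, int]  # (entity label, char_start, char_end)

def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, F1 with exact label+offset matching."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = {("GXP", 0, 3), ("AUDIT_TRAIL", 12, 32), ("SOX", 40, 43), ("HIPAA", 50, 55)}
pred = {("GXP", 0, 3), ("AUDIT_TRAIL", 12, 32), ("SOX", 40, 43), ("GDPR", 60, 64)}
print(entity_prf(gold, pred))  # (0.75, 0.75, 0.75)
```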

📚 Training & Calibration Set

  • NER Training: 4,000 regulatory text passages annotated by the compliance practice
  • Domains: GxP (1,500), SOX (1,000), GDPR (800), HIPAA (400), FDA (300)
  • Base Model: BioBERT (PubMed + PMC pre-trained)
  • Fine-tuning: 5 epochs, batch size 16, learning rate 2e-5
  • Validation: Re-validated against FDA/EMA/SEC guidance quarterly
  • Rules: 87 deterministic rules maintained by compliance SMEs
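The listed hyperparameters imply a concrete training budget. For illustration, assume all 4,000 passages are used for training with no gradient accumulation. Then 5 epochs at batch size 16 comes to 250 × 5 = 1,250 optimizer steps. The learning-rate schedule is not stated. The sketch below assumes linear decay from the 2e-5 peak, optionally after a warmup, which is a common choice for BERT fine-tuning.

```python
import math

def training_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps per run, keeping the final partial batch of each epoch."""
    return math.ceil(n_examples / batch_size) * epochs

def linear_lr(step: int, total_steps: int, peak_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Linear warmup to peak_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    decay_span = max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / decay_span)

total = training_steps(4000, 16, 5)
print(total)                  # 1250
print(linear_lr(625, total))  # 1e-05 (halfway through the decay)
```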

🎬 End-to-End Example

Scenario: Pharma GxP Validation

  1. Input: "System shall maintain complete audit trail of all GxP-relevant changes with electronic signature (21 CFR Part 11)"
  2. Router: Pharma context → BioBERT
  3. NER: Detects "GxP", "audit trail", "electronic signature", "21 CFR Part 11"
  4. Rules: Activates ERES-01 (Electronic Records), AUD-02 (Audit Trail), GxP-03 (Change Control)
  5. Output: 3 control packs, 12 validation tasks, 67 estimated hours
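The scenario can be traced with a small stand-in pipeline. Here, regex keyword matching replaces BioBERT, so no model is loaded. ERES-01 and AUD-02 use the rule conditions from the pipeline diagram, while the condition for GxP-03 (Change Control) is a hypothetical placeholder. Effort figures are left out because the per-pack hours for this scenario are not given.

```python
import re
from typing import Dict, List, Tuple

# Stand-in for BioBERT NER: keyword patterns (hypothetical, illustration only)
PATTERNS = {
    "GXP": r"\bGxP\b",
    "AUDIT_TRAIL": r"audit trail",
    "ELECTRONIC_SIGNATURE": r"electronic signature",
    "FDA_21_CFR_PART_11": r"21 CFR Part 11",
}

# (rule ID, required labels, forbidden labels, control pack)
RULES = [
    ("RULE_1", {"FDA_21_CFR_PART_11", "AUDIT_TRAIL"}, set(), "ERES-01"),
    ("RULE_3", {"AUDIT_TRAIL"}, {"SOX"}, "AUD-02"),
    ("RULE_GXP_CHANGE_CONTROL", {"GXP"}, set(), "GxP-03"),  # hypothetical condition
]

def detect(text: str) -> Dict[str, Tuple[int, int]]:
    """Return label -> (start, end) of the first match for each pattern."""
    found = {}
    for label, pattern in PATTERNS.items():
        m = re.search(pattern, text, flags=re.IGNORECASE)
        if m:
            found[label] = m.span()
    return found

def match(text: str) -> Tuple[List[str], List[dict]]:
    entities = detect(text)
    labels = set(entities)
    packs, audit = [], []
    for rule_id, required, forbidden, pack in RULES:
        if required <= labels and not (forbidden & labels):
            packs.append(pack)
            audit.append({
                "pack": pack,
                "rule": rule_id,
                # the exact source spans that satisfied the rule
                "trigger": [text[slice(*entities[label])] for label in sorted(required)],
            })
    return packs, audit

requirement = ("System shall maintain complete audit trail of all GxP-relevant changes "
               "with electronic signature (21 CFR Part 11)")
packs, audit = match(requirement)
print(packs)  # ['ERES-01', 'AUD-02', 'GxP-03']
```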

Result: Compliance scope automatically identified; audit trail 100% traceable to source requirement.