Tool 02 · Algorithm Deep Dive
DeBERTa-v3 + Custom NER Pipeline
A requirement line is often several things at once: a functional ask, a performance target, and a compliance constraint. "Process 10,000 invoices per hour with SOX audit trail" is FUNC + NFR + COMP simultaneously. Standard multi-class classifiers force a single label, so critical requirements get miscategorized; a missed COMP tag can leave $45k+ of validation work unaccounted for.
The answer is DeBERTa-v3 with a multi-label classification head (six sigmoid outputs). Disentangled attention separates content from position, maintaining context across 50+ word SAP clauses. The multi-label architecture allows simultaneous tagging of FUNC, NFR, INT, DATA, COMP, and UX, while focal loss handles class imbalance.
DeBERTa-v3-base — 184M parameters, 12 transformer layers, disentangled attention mechanism.
6-label classification head: FUNC NFR INT DATA COMP UX
Focal Loss with γ=2.0 — down-weights easy examples, focuses on hard-to-classify edge cases.
66M-parameter student model for sub-80ms inference in production.
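The head architecture above can be sketched in plain PyTorch. This is a minimal sketch, not the production model: the DeBERTa-v3 encoder that produces the pooled [CLS] embedding is assumed (hidden size 768 for the base model) and stubbed out with a random tensor, and the module/threshold names are illustrative.

```python
import torch
import torch.nn as nn

LABELS = ["FUNC", "NFR", "INT", "DATA", "COMP", "UX"]

class MultiLabelHead(nn.Module):
    """Six independent sigmoid outputs over a pooled [CLS] embedding.

    Sketch only: the encoder producing `cls_embedding` (DeBERTa-v3-base,
    hidden size 768) is assumed, not implemented here.
    """
    def __init__(self, hidden_size: int = 768, num_labels: int = len(LABELS)):
        super().__init__()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # One logit per label; sigmoid (not softmax) keeps labels independent,
        # so FUNC + NFR + COMP can all fire on the same requirement.
        return self.classifier(self.dropout(cls_embedding))

head = MultiLabelHead().eval()
with torch.no_grad():
    logits = head(torch.randn(2, 768))   # batch of 2 pooled embeddings (stub)
    probs = torch.sigmoid(logits)        # independent per-label probabilities
active = [LABELS[i] for i in range(len(LABELS)) if probs[0, i] > 0.5]
```

Sigmoid-per-label rather than softmax is the core design choice: the six probabilities do not compete, which is what lets a single clause be FUNC, NFR, and COMP at once.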
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ REQUIREMENTS EXTRACTION PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ INPUT: │ "Process 10k invoices/hour with SOX audit trail and GDPR compliance" │
│ │ Requirement │ │
│ │ Text │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: TOKENIZATION │ │
│ │ SentencePiece BPE (128K vocab) → 512 token limit │ │
│ │ Tokens: [CLS] Process 10k invoices / hour with SOX ... [SEP] │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: DeBERTa-v3 ENCODING │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ 12 Transformer Layers │ │ │
│ │ │ │ │ │
│ │ │ Disentangled Attention Formula: │ │ │
│ │ │ A_ij = Content-to-Content(H_i, H_j) │ │ │
│ │ │ + Content-to-Position(H_i, P_j|i) │ │ │
│ │ │ + Position-to-Content(P_i|j, H_j) │ │ │
│ │ │ │ │ │
│ │ │ → Separates WHAT a word is from WHERE it appears │ │ │
│ │ │ → Maintains context across 18-word gaps │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: MULTI-LABEL CLASSIFICATION │ │
│ │ │ │
│ │ Pool [CLS] token → 6 Sigmoid Heads (independent probabilities) │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │ FUNC │ │ NFR │ │ INT │ │ DATA │ │ COMP │ │ UX │ │ │
│ │ │ σ=0.96 │ │ σ=0.89 │ │ σ=0.08 │ │ σ=0.12 │ │ σ=0.94 │ │ σ=0.03 │ │ │
│ │ │ ✓ │ │ ✓ │ │ │ │ │ │ ✓ │ │ │ │ │
│ │ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │ │
│ │ │ │
│ │ Binary Cross-Entropy Loss with Sigmoid: │ │
│ │ L = -1/N Σ [y_i·log(σ(x_i)) + (1-y_i)·log(1-σ(x_i))] │ │
│ │ │ │
│ │ Focal Loss variant (γ=2.0) for class imbalance │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: CALIBRATION & PRIORITY │ │
│ │ • Platt Scaling → Calibrated probabilities │ │
│ │ • Priority Classifier: Must (P>0.8) / Should (0.5-0.8) / Could (<0.5) │ │
│ │ • Confidence < 0.75 → Flag for Human Review │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT JSON: │ │
│ │ { │ │
│ │ "requirement": "Process 10k invoices/hour with SOX...", │ │
│ │ "labels": ["FUNC", "NFR", "COMP"], │ │
│ │ "probabilities": {"FUNC":0.96, "NFR":0.89, "COMP":0.94}, │ │
│ │ "priority": "Must", │ │
│ │ "confidence": 0.93, │ │
│ │ "human_review": false │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
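The focal-loss variant named in Step 3 can be sketched for the multi-label (per-sigmoid) case. With γ=0 the modulating factor vanishes and the loss reduces to plain binary cross-entropy, which is a useful sanity check. A sketch, not the production training code:

```python
import torch
import torch.nn.functional as F

def binary_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                      gamma: float = 2.0) -> torch.Tensor:
    """Focal loss over independent sigmoid outputs.

    Each label's BCE term is scaled by (1 - p_t)^gamma, down-weighting
    easy examples so training focuses on hard-to-classify requirements.
    gamma=0 recovers plain binary cross-entropy.
    """
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)   # prob. of the true class
    return ((1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([[2.0, -1.0, 0.5]])
targets = torch.tensor([[1.0, 0.0, 1.0]])
loss = binary_focal_loss(logits, targets, gamma=2.0)
bce_equiv = binary_focal_loss(logits, targets, gamma=0.0)
```

Because (1 − p_t)^γ ≤ 1, the focal value is always at most the BCE value, with the gap largest on confidently-correct (easy) examples.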
Tool 02 is the CATEGORIZATION ENGINE — every downstream tool depends on accurate requirement tagging.
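Step 4's Platt scaling fits a one-dimensional logistic regression that maps raw per-label scores to calibrated probabilities. A minimal sketch with scikit-learn, assuming synthetic held-out logits and gold labels for a single head (the real pipeline would fit one scaler per label on its validation split):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical held-out data for one label head: raw logits + gold 0/1 labels.
logits = rng.normal(0.0, 2.0, size=500)
labels = (logits + rng.normal(0.0, 1.0, size=500) > 0).astype(int)

# Platt scaling = logistic regression on the single raw score:
#   p_calibrated = sigmoid(a * logit + b), with a and b fit on held-out data.
platt = LogisticRegression()
platt.fit(logits.reshape(-1, 1), labels)

calibrated = platt.predict_proba(logits.reshape(-1, 1))[:, 1]
```

Calibration matters here because the Must/Should/Could cutoffs and the 0.75 review threshold are applied to these probabilities; uncalibrated sigmoid outputs would shift where those thresholds actually bite.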
| Metric | Value | Notes |
|---|---|---|
| Micro F1 (Overall) | 91.8% | 4,400 human-labeled requirements |
| Macro F1 (per-class average) | 87.2% | 6-class balanced test set |
| FUNC Detection | 94.1% F1 | Most frequent class |
| COMP Detection | 89.3% F1 | Critical for compliance |
| Error Rate (w/ human review) | 3.1% | Threshold = 0.75 |
| Inference Latency (student) | 78ms | p99 on CPU |
Result: the requirement is correctly tagged as Functional, Non-Functional, and Compliance, then fed to Tool 11 for GxP control pack activation and Tool 04 for compliance effort estimation.
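The Step 4 thresholds and the output record can be tied together in a short routing function. This is a sketch: the aggregation of per-label probabilities into the single `confidence` score is an assumption here (taken as the mean over active labels, which reproduces the 0.93 in the example), as is basing priority on the top label probability.

```python
LABELS = ["FUNC", "NFR", "INT", "DATA", "COMP", "UX"]

def route(requirement: str, probs: dict, label_threshold: float = 0.5,
          review_threshold: float = 0.75) -> dict:
    """Apply Step 4 thresholds: MoSCoW priority + human-review flag.

    Assumptions (not confirmed by the pipeline spec): priority is keyed to
    the top label probability, confidence is the mean over active labels.
    """
    active = {k: v for k, v in probs.items() if v > label_threshold}
    top = max(probs.values())
    priority = "Must" if top > 0.8 else "Should" if top >= 0.5 else "Could"
    confidence = round(sum(active.values()) / len(active), 2) if active else 0.0
    return {
        "requirement": requirement,
        "labels": [k for k in LABELS if k in active],
        "probabilities": {k: probs[k] for k in LABELS if k in active},
        "priority": priority,
        "confidence": confidence,
        "human_review": confidence < review_threshold,
    }

record = route(
    "Process 10k invoices/hour with SOX audit trail and GDPR compliance",
    {"FUNC": 0.96, "NFR": 0.89, "INT": 0.08, "DATA": 0.12, "COMP": 0.94, "UX": 0.03},
)
```

Run on the example probabilities from the diagram, this yields labels FUNC/NFR/COMP, priority "Must", and no human-review flag, matching the output JSON above.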