Tool 03 · Algorithm Deep Dive
Hierarchical Text Classification + BERT
SAP's module taxonomy evolves with every project. "Transfer Pricing" sits halfway between FI and CO. "Vendor Master" = "Supplier Record" = "LFA1 table." A rigid classifier requires retraining when new modules or synonyms appear. Explainability is crucial — consultants need to know why a requirement routed to MM, not just that it did.
Sentence-BERT + k-NN uses semantic similarity over a living corpus. New modules are added to the index and picked up instantly — no retraining. Every routing comes with a citation: "Routed to FI-CO because it matches requirement #1423 from Merck project (similarity 0.94)." Explainable, auditable, and adaptive.
all-mpnet-base-v2 — 768-dimensional sentence embeddings, state-of-the-art for semantic similarity.
FAISS HNSW — Hierarchical Navigable Small World graph for O(log N) approximate nearest neighbor search.
Weighted k-NN (k=12) — vote across the 12 nearest neighbors, each weighted by its cosine similarity to the query.
40,000 requirements tagged by module across FI, CO, MM, SD, PP, QM, PM, HCM, WM, EHS.
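The cosine similarity that both the search and the vote rely on can be sketched in a few lines of plain Python. The 4-d vectors here are toy stand-ins for the real 768-d all-mpnet-base-v2 embeddings (which would come from the sentence-transformers library):

```python
import math

def cosine_similarity(a, b):
    """sim(A, B) = (A·B) / (‖A‖·‖B‖), as used in Steps 2-3 below."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-d stand-ins for the real 768-d sentence embeddings.
query = [0.12, -0.45, 0.89, 0.03]
neighbor = [0.10, -0.40, 0.92, 0.01]
print(round(cosine_similarity(query, neighbor), 3))  # prints 0.998
```

Identical directions score 1.0, orthogonal ones 0.0; the routing score is a sum of these values, so it is bounded by k.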
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ SAP MODULE ROUTER PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ INPUT: │ "Create a report showing stock aging by plant with valuation" │
│ │ Requirement │ │
│ │ Text │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: EMBED REQUIREMENT │ │
│ │ │ │
│ │ all-mpnet-base-v2 Sentence Transformer │ │
│ │ Input: "Create a report showing stock aging by plant..." │ │
│ │ Output: [0.12, -0.45, 0.89, 0.03, ...] (768 dimensions) │ │
│ │ │ │
│ │ Cosine Similarity later: sim(A,B) = (A·B)/(‖A‖·‖B‖) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: APPROXIMATE NEAREST NEIGHBOR SEARCH │ │
│ │ │ │
│ │ FAISS HNSW Index (40,000 labeled requirements) │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ HNSW Graph Structure: │ │ │
│ │ │ │ │ │
│ │ │ Layer 2 (sparse) ─── long-range "highway" edges │ │ │
│ │ │ │ │ │ │
│ │ │ Layer 1 (medium) ─── regional connections │ │ │
│ │ │ │ │ │ │
│ │ │ Layer 0 (dense) ─── local neighborhood │ │ │
│ │ │ │ │ │
│  │  │  Search: O(log N) ≈ 15 graph hops (log₂ 40,000 ≈ 15.3)    │  │         │
│  │  │  vs. brute-force: O(N) = 40,000 distance computations     │  │         │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Returns: Top-12 most similar requirements │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: WEIGHTED k-NN VOTING │ │
│ │ │ │
│ │ For each module m in {FI, CO, MM, SD, PP, ...}: │ │
│ │ │ │
│ │ Score(m) = Σ_{i∈kNN} [ similarity(query, neighbor_i) × 1_{module(neighbor_i)=m} ] │ │
│ │ │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Neighbor 1: sim=0.94, module=MM → MM += 0.94 │ │ │
│ │ │ Neighbor 2: sim=0.91, module=MM → MM += 0.91 │ │ │
│ │ │ Neighbor 3: sim=0.88, module=WM → WM += 0.88 │ │ │
│ │ │ Neighbor 4: sim=0.85, module=MM → MM += 0.85 │ │ │
│ │ │ ... │ │ │
│ │ │ Final: MM=4.82, WM=1.23, SD=0.45, FI=0.31 │ │ │
│ │ └────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: THRESHOLD & OUTPUT │ │
│ │ │ │
│ │ • Primary module: MM (score 4.82) │ │
│  │  • Secondary modules: WM (1.23) — vote exceeds 0.72 threshold  │          │
│ │ • Evidence: Top-3 neighbors with citations │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT JSON: │ │
│ │ { │ │
│ │ "primary_module": "MM", │ │
│ │ "confidence": 0.94, │ │
│  │    "all_scores": {"MM":4.82, "WM":1.23, "SD":0.45, "FI":0.31},  │         │
│ │ "evidence": [ │ │
│ │ {"req": "Stock aging report MB5B...", "project": "Merck", │ │
│ │ "similarity": 0.94, "module": "MM"} │ │
│ │ ] │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
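Steps 2-4 can be sketched end to end. This is a minimal sketch with a hypothetical three-entry corpus of toy 2-d embeddings standing in for the real 40,000-requirement index of 768-d vectors; exact brute-force cosine search replaces the FAISS HNSW index (identical top-k results, O(N) instead of O(log N)):

```python
import math
from collections import defaultdict

def cosine(a, b):
    # sim(A, B) = (A·B) / (‖A‖·‖B‖)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def route(query_vec, corpus, k=12, secondary_threshold=0.72):
    """Steps 2-4: nearest-neighbor search, weighted vote, threshold.

    corpus: list of (embedding, module, citation) triples. Brute-force
    search stands in here for the FAISS HNSW index.
    """
    # Step 2: top-k most similar labeled requirements.
    neighbors = sorted(
        ((cosine(query_vec, emb), module, cite) for emb, module, cite in corpus),
        key=lambda t: t[0], reverse=True,
    )[:k]
    # Step 3: similarity-weighted vote per module.
    votes = defaultdict(float)
    for sim, module, _ in neighbors:
        votes[module] += sim
    ranked = sorted(votes.items(), key=lambda kv: -kv[1])
    # Step 4: primary module, thresholded secondaries, evidence citations.
    return {
        "primary_module": ranked[0][0],
        "confidence": neighbors[0][0],  # top-neighbor similarity
        "all_scores": {m: round(v, 2) for m, v in ranked},
        "secondary_modules": [m for m, v in ranked[1:] if v > secondary_threshold],
        "evidence": [{"req": c, "module": m, "similarity": round(s, 2)}
                     for s, m, c in neighbors[:3]],
    }

# Toy corpus: (embedding, module, citation).
corpus = [
    ([0.9, 0.1], "MM", "Stock aging report MB5B, Merck project"),
    ([0.8, 0.2], "MM", "Plant stock valuation report"),
    ([0.1, 0.9], "FI", "GL account posting validation"),
]
result = route([1.0, 0.0], corpus, k=3)
print(result["primary_module"], result["all_scores"])
```

Swapping the brute-force search for `faiss.IndexHNSWFlat` changes only Step 2; the vote and threshold logic are index-agnostic, which is what keeps the corpus hot-swappable without retraining.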
Tool 03 is the ROUTING LAYER — it assigns every requirement to the correct functional team and downstream estimator.
| Metric | Value | Evaluation context |
|---|---|---|
| Top-1 Accuracy | 95.1% | 800 held-out requirements |
| Top-3 Accuracy | 98.4% | Same test set |
| Mean Reciprocal Rank (MRR) | 0.967 | Same test set |
| Search Latency (p99) | 12 ms | FAISS HNSW on CPU |
| Corpus Size | 40,000 | Labeled requirements |
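For reference, the three ranking metrics in the table can be computed as follows. The module rankings and gold labels here are toy illustrations, not the real 800-requirement test set:

```python
def top_k_accuracy(rankings, gold, k):
    """Fraction of cases whose gold module appears in the top-k of the ranking."""
    return sum(g in r[:k] for r, g in zip(rankings, gold)) / len(gold)

def mean_reciprocal_rank(rankings, gold):
    """Mean of 1 / (1-based rank of the gold module); 0 when it is absent."""
    return sum(1.0 / (r.index(g) + 1) if g in r else 0.0
               for r, g in zip(rankings, gold)) / len(gold)

# Hypothetical module rankings for three requirements, plus gold labels.
rankings = [["MM", "WM", "SD"], ["FI", "CO", "MM"], ["SD", "MM", "PP"]]
gold = ["MM", "CO", "PP"]
print(top_k_accuracy(rankings, gold, 1))   # 1 of 3 gold labels ranked first
print(top_k_accuracy(rankings, gold, 3))   # all gold labels within the top-3
print(mean_reciprocal_rank(rankings, gold))
```

MRR rewards placing the correct module near the top even when it is not rank 1, which matters here because consultants review the top-3 suggestions with their citations.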
Result: Correct multi-module routing with auditable justification. Feeds Tool 04 with module-specific cost factors.