Project Risk Estimator — Algorithm Deep Dive

🎯 Why This Algorithm

📋 Problem Statement

SAP project risk is driven by non-linear interactions — a volatile scope is fine if the team is senior, but deadly if the team is junior. Standard linear models miss these interaction effects. Moreover, "black box" AI predictions are useless in consulting — you cannot tell a client "The AI said no, sorry." Every risk score must be fully explainable.

✅ Solution

LightGBM gradient-boosted trees capture complex feature interactions without hand-crafted interaction terms. SHAP (SHapley Additive exPlanations) provides game-theoretic attributions with formal guarantees (local accuracy, consistency) that show how much each factor contributes to a given risk score. Isotonic regression calibrates raw scores to empirically observed probabilities.

🧩 What It Comprises

🌲 Core Model

LightGBM Regressor — 800 trees, max-depth=6, trained on 47 engineered features capturing scope breadth, team composition, data quality gaps, integration surface, compliance complexity, and client change history.
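
As a minimal sketch, the configuration above could be written as a keyword dictionary of the kind `lightgbm.LGBMRegressor(**LGBM_PARAMS)` accepts. Only the tree count, depth, and feature count come from the text; `learning_rate`, `num_leaves`, `min_child_samples`, and `random_state` are assumed values for illustration, and `check_params` is a hypothetical helper.

```python
# Hyperparameters for the core model, written as keyword arguments
# that lightgbm.LGBMRegressor(**LGBM_PARAMS) would accept.
LGBM_PARAMS = {
    "n_estimators": 800,     # from the text: 800 trees
    "max_depth": 6,          # from the text: max-depth=6
    "num_leaves": 31,        # assumption: LightGBM's default leaf budget
    "learning_rate": 0.05,   # assumption: not stated in the text
    "min_child_samples": 3,  # assumption: small value for a small training set
    "random_state": 0,       # assumption: fixed seed for reproducibility
}

N_FEATURES = 47  # from the text: 47 engineered features


def check_params(params: dict) -> None:
    """Check that the leaf budget is reachable at the configured depth."""
    max_leaves_at_depth = 2 ** params["max_depth"]
    if params["num_leaves"] > max_leaves_at_depth:
        raise ValueError("num_leaves exceeds what max_depth allows")


check_params(LGBM_PARAMS)
```

A depth-6 tree can hold at most 64 leaves, so the check guards against a leaf budget that the depth limit would silently cap.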

🔮 SHAP Explainer

TreeSHAP — Exact Shapley values for tree ensembles. Computes each feature's marginal contribution to the risk score across all possible feature coalitions.

📏 Calibration

Isotonic Regression — Non-parametric calibration mapping raw scores to empirically observed risk probabilities.

📋 Risk Templates

30+ predefined risk types across Scope, Team, Data, Integration, Compliance, and Timeline dimensions.

📥 Inputs & 📤 Outputs

📥 Inputs

•Engagement drivers (scope, team, data, compliance, integrations)
•Nearest-neighbor historical projects (from Tool 12)
•Scope volatility indicators
•Team ramp-up index
•Data quality gap score
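
One way to turn these inputs into a model-ready vector is a fixed-order record. The sketch below covers only 12 of the 47 features; the field names are hypothetical, and the values mirror the Step 1 figures in the pipeline diagram.

```python
from dataclasses import dataclass, astuple, fields


@dataclass(frozen=True)
class EngagementFeatures:
    """Hypothetical subset of the feature vector (12 of the 47 features)."""
    scope_breadth: float
    scope_volatility: float
    scope_ambiguity: float
    team_senior_count: float
    team_junior_count: float
    team_ramp_index: float
    data_quality: float
    data_unmapped_sources: float
    data_gaps: float
    integration_count: float
    integration_complexity: float
    integration_legacy: float


def to_vector(features: EngagementFeatures) -> list[float]:
    """Flatten in declaration order so columns always line up with the model."""
    return [float(v) for v in astuple(features)]


def feature_names() -> list[str]:
    """Column names in the same order as to_vector()."""
    return [f.name for f in fields(EngagementFeatures)]


example = EngagementFeatures(142, 0.3, 0.4, 4, 3, 0.7, 0.62, 12, 8, 7, 8.4, 5)
vector = to_vector(example)
```

Keeping names and values in one declaration order matters later, because SHAP values are reported per column and must be mapped back to readable driver names.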

📤 Outputs

•Likelihood × Impact score per risk type (0–100)
•Overall project risk score
•SHAP value per feature (explanation)
•Recommended mitigation actions
•Top-3 risk drivers with percentages
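
The "top-3 drivers with percentages" output can be sketched as follows. The text does not define the percentage, so this assumes it is each driver's share of the total absolute SHAP value; `top_drivers` is a hypothetical helper, and the input figures are the SHAP values used in this document's example.

```python
def top_drivers(shap_values: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k features with the largest |SHAP| and their share in percent."""
    total = sum(abs(v) for v in shap_values.values())
    if total == 0:
        return []
    ranked = sorted(shap_values.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return [(name, round(100 * abs(value) / total, 1)) for name, value in ranked[:k]]


shap_values = {
    "Unmapped Legacy Data Sources": 42,
    "Junior-heavy Team Composition": 28,
    "Tight Timeline (< 6 months)": 15,
    "7+ Integration Points": 13,
}
drivers = top_drivers(shap_values)
for name, share in drivers:
    print(f"{name}: {share}%")
```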

📊 Risk Score Visualization

Risk Score: 72/100 (Elevated)

SHAP Explanation — Why 72?

•Unmapped Legacy Data Sources: +42 points
•Junior-heavy Team Composition: +28 points
•Tight Timeline (< 6 months): +15 points
•7+ Integration Points: +13 points

🔄 How It Runs — Step by Step

┌─────────────────────────────────────────────────────────────────────────────────────────┐
│                         PROJECT RISK ESTIMATOR PIPELINE                                    │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                           │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                              STEP 1: FEATURE VECTOR BUILD                          │   │
│   │                                                                                   │   │
│   │   Inputs from Tools 02, 03, 10, 12 → 47-dimensional feature vector:               │   │
│   │                                                                                   │   │
│   │   ┌─────────────────┬─────────────────┬─────────────────┬─────────────────┐       │   │
│   │   │ Scope Features  │ Team Features   │ Data Features   │ Integration     │       │   │
│   │   ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤       │   │
│   │   │ Breadth: 142    │ Senior: 4       │ Quality: 0.62   │ Count: 7        │       │   │
│   │   │ Volatility: 0.3 │ Junior: 3       │ Unmapped: 12    │ Complexity: 8.4 │       │   │
│   │   │ Ambiguity: 0.4  │ Ramp: 0.7       │ Gaps: 8         │ Legacy: 5       │       │   │
│   │   └─────────────────┴─────────────────┴─────────────────┴─────────────────┘       │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                    STEP 2: LIGHTGBM SCORING (per risk type)                        │   │
│   │                                                                                   │   │
│   │   For each of 30+ risk templates:                                                 │   │
│   │                                                                                   │   │
│   │   ┌───────────────────────────────────────────────────────────────────────────┐  │   │
│   │   │                      LightGBM Ensemble (800 trees)                          │  │   │
│   │   │                                                                            │  │   │
│   │   │   Tree 1          Tree 2          Tree 3      ...      Tree 800           │  │   │
│   │   │   ┌────┐          ┌────┐          ┌────┐               ┌────┐             │  │   │
│   │   │   │Root│          │Root│          │Root│               │Root│             │  │   │
│   │   │   └┬──┬┘          └┬──┬┘          └┬──┬┘               └┬──┬┘             │  │   │
│   │   │    │   │            │   │            │   │                 │   │              │  │   │
│   │   │   Data Team        Int  Scope       Jr  Sr              Scope Time         │  │   │
│   │   │   Qual Size        Cnt  Breadth      Ct  Ct              Vol   Line         │  │   │
│   │   │                                                                            │  │   │
│   │   │   Raw Risk Score = Σ_{m=1..800} leaf_value_m(x)                            │  │   │
│   │   └───────────────────────────────────────────────────────────────────────────┘  │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                    STEP 3: SHAP EXPLANATIONS                                       │   │
│   │                                                                                   │   │
│   │   TreeSHAP computes exact Shapley values:                                          │   │
│   │                                                                                   │   │
│   │   φ_i = Σ_{S⊆N\{i}} [|S|!(|N|-|S|-1)! / |N|!] × [f(S∪{i}) - f(S)]                │   │
│   │                                                                                   │   │
│   │   ┌─────────────────────────────────────────────────────────────────────────┐    │   │
│   │   │  Base Value (average prediction): 35                                      │    │   │
│   │   │                                                                          │    │   │
│   │   │  f() = 35                                                                 │    │   │
│   │   │  f({Data}) = 35 + 42 = 77  → Data contributes +42                         │    │   │
│   │   │  f({Data, Team}) = 77 + 28 = 105 → Team contributes +28                   │    │   │
│   │   │  f({Data, Team, Time}) = 105 + 15 = 120 → Time contributes +15            │    │   │
│   │   │  ...                                                                      │    │   │
│   │   │  Final = 72 (scaled to 0-100)                                             │    │   │
│   │   └─────────────────────────────────────────────────────────────────────────┘    │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │                    STEP 4: CALIBRATION & MITIGATION                                │   │
│   │                                                                                   │   │
│   │   Isotonic Regression Calibration:                                                 │   │
│   │   P(actual_risk | raw_score) = isotonic(raw_score)                                 │   │
│   │                                                                                   │   │
│   │   Rules Engine: Top SHAP drivers → Mitigation recommendations                       │   │
│   │                                                                                   │   │
│   │   ┌─────────────────────────────────────────────────────────────────────────┐    │   │
│   │   │  Driver: "Unmapped Legacy Data Sources" (+42)                            │    │   │
│   │   │  → Mitigation: Add 2-week data discovery phase                           │    │   │
│   │   │  → Mitigation: Engage legacy system SME for 25% allocation                │    │   │
│   │   │                                                                          │    │   │
│   │   │  Driver: "Junior-heavy Team Composition" (+28)                           │    │   │
│   │   │  → Mitigation: Add 1 Senior Architect at 50% allocation                   │    │   │
│   │   │  → Mitigation: Schedule weekly design reviews                             │    │   │
│   │   └─────────────────────────────────────────────────────────────────────────┘    │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                                 │
│          ▼                                                                                 │
│   ┌─────────────────────────────────────────────────────────────────────────────────┐   │
│   │   OUTPUT JSON:                                                                    │   │
│   │   {                                                                               │   │
│   │     "overall_risk": 72,                                                           │   │
│   │     "risk_breakdown": [                                                            │   │
│   │       {"type": "Data Quality", "score": 85, "shap": 42},                          │   │
│   │       {"type": "Team Composition", "score": 78, "shap": 28},                       │   │
│   │       {"type": "Timeline", "score": 65, "shap": 15},                               │   │
│   │       {"type": "Integration", "score": 58, "shap": 13}                             │   │
│   │     ],                                                                            │   │
│   │     "mitigations": ["Add data discovery phase", "Add Senior Architect", ...]      │   │
│   │   }                                                                               │   │
│   └─────────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                           │
└─────────────────────────────────────────────────────────────────────────────────────────┘
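
Step 4's rules engine and the output payload can be sketched as a lookup from top SHAP drivers to mitigation actions. The rule table and the `recommend` helper are hypothetical; the entries and numbers are copied from the pipeline example above.

```python
import json

# Hypothetical rule table keyed by driver name; entries mirror the example.
MITIGATION_RULES = {
    "Unmapped Legacy Data Sources": [
        "Add 2-week data discovery phase",
        "Engage legacy system SME for 25% allocation",
    ],
    "Junior-heavy Team Composition": [
        "Add 1 Senior Architect at 50% allocation",
        "Schedule weekly design reviews",
    ],
}


def recommend(shap_by_driver: dict[str, float], top_k: int = 2) -> list[str]:
    """Look up mitigations for the top_k drivers that push risk up the most."""
    increasing = [(d, v) for d, v in shap_by_driver.items() if v > 0]
    increasing.sort(key=lambda dv: dv[1], reverse=True)
    actions: list[str] = []
    for driver, _ in increasing[:top_k]:
        # Drivers without a rule contribute no actions.
        actions.extend(MITIGATION_RULES.get(driver, []))
    return actions


shap_by_driver = {
    "Unmapped Legacy Data Sources": 42,
    "Junior-heavy Team Composition": 28,
    "Tight Timeline (< 6 months)": 15,
    "7+ Integration Points": 13,
}
report = {"overall_risk": 72, "mitigations": recommend(shap_by_driver)}
print(json.dumps(report, indent=2))
```

Only drivers with a positive SHAP value are considered, since a negative value means the feature lowers the predicted risk and needs no mitigation.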

🏗️ Architecture & Integration

Where Risk Estimator Sits in A²AI

Inputs:

•🏷️ TOOL 02: Requirements
•🧭 TOOL 03: Modules
•🕸️ TOOL 10: Change Impact
•🕰️ TOOL 12: Historical

↓

⚠️ TOOL 05: Project Risk Estimator (LightGBM + SHAP)

↓

Outputs:

•Executive Dashboard: Risk Heatmap
•Mitigation Plan: Action Items
•TOOL 04: Contingency Buffer

Tool 05 feeds contingency recommendations directly into Tool 04's P90 calculations.

📐 Mathematical Explanation

LightGBM Gradient Boosting:

F_m(x) = F_{m-1}(x) + η · h_m(x)

Where η is the learning rate (shrinkage) and h_m is the tree fitted to the negative gradient of the loss function evaluated at F_{m-1}.
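
The update rule can be illustrated with a toy booster. This is a sketch of plain first-order gradient boosting on squared error with one-split trees, not LightGBM itself (which also uses second-order gradient information and histogram-based, leaf-wise tree growth). The data, `fit_stump`, `boost`, and `predict` are all hypothetical.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=50, eta=0.1):
    """Return F_0 and the trees h_1..h_M for squared-error loss."""
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(ys)
    trees = []
    for _ in range(n_trees):
        # For the loss 1/2 (y - F)^2 the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_stump(xs, residuals)
        trees.append(h)
        # F_m(x) = F_{m-1}(x) + eta * h_m(x)
        preds = [p + eta * h(x) for p, x in zip(preds, xs)]
    return f0, trees


def predict(f0, trees, x, eta=0.1):
    """The ensemble output is a sum of shrunken tree outputs, not an average."""
    return f0 + eta * sum(h(x) for h in trees)


# Hypothetical data: scope volatility versus observed risk score.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [20, 22, 25, 30, 60, 65, 70, 75]
f0, trees = boost(xs, ys)
base_mse = sum((y - f0) ** 2 for y in ys) / len(ys)
fit_mse = sum((y - predict(f0, trees, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
```

Each round fits the remaining error of the previous rounds, which is why the final score accumulates over trees.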

SHAP Value (Exact for Tree Ensembles):

φ_i = Σ_{S⊆N\{i}} [|S|!(|N|-|S|-1)! / |N|!] × [f_x(S∪{i}) - f_x(S)]

Properties of SHAP:
• Local Accuracy: f(x) = φ_0 + Σ φ_i
• Missingness: Feature absent → φ_i = 0
• Consistency: If the model changes so that a feature's marginal contribution increases or stays the same for every coalition, φ_i does not decrease
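
The formula and the local-accuracy property can be checked by brute force on a tiny model. This sketch enumerates every coalition directly, which is only feasible for a handful of features; TreeSHAP reaches the same quantity efficiently by exploiting tree structure. The two-feature model, the baseline project, and all names are hypothetical, and "absent" features are assumed to take a single baseline value.

```python
from itertools import combinations
from math import factorial

BASELINE = {"scope_volatility": 0.1, "junior_ratio": 0.2}  # hypothetical reference project
PROJECT = {"scope_volatility": 0.8, "junior_ratio": 0.7}   # hypothetical project to explain


def risk_model(x: dict) -> float:
    """Toy model with an interaction: volatility hurts more when the team is junior."""
    return 20 + 10 * x["scope_volatility"] + 60 * x["scope_volatility"] * x["junior_ratio"]


def value(coalition) -> float:
    """f_x(S): features in S take the project's values, the rest the baseline's."""
    x = {k: (PROJECT[k] if k in coalition else BASELINE[k]) for k in PROJECT}
    return risk_model(x)


def shapley(feature: str) -> float:
    """Weighted average of the feature's marginal contribution over all coalitions."""
    others = [k for k in PROJECT if k != feature]
    n = len(PROJECT)
    phi = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            s = frozenset(subset)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi += weight * (value(s | {feature}) - value(s))
    return phi


phi_0 = value(frozenset())                      # base value: nothing "present"
phis = {k: shapley(k) for k in PROJECT}
prediction = value(frozenset(PROJECT))          # all features present
# Local accuracy: the base value plus all contributions reproduces the prediction.
assert abs(phi_0 + sum(phis.values()) - prediction) < 1e-9
```

Because the model contains an interaction term, the contribution credited to each feature depends on the other one; the Shapley weighting splits the joint effect between them.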

Isotonic Regression Calibration:

min_{ŷ_1 ≤ ŷ_2 ≤ ... ≤ ŷ_n} Σ (y_i - ŷ_i)²

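Samples are indexed in order of increasing raw score, so the fitted values ŷ_i are non-decreasing in the raw score. The monotonicity constraint preserves ranking up to ties.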
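
The constrained least-squares problem above is commonly solved with the Pool Adjacent Violators algorithm. The sketch below is a minimal stdlib version that assumes distinct raw scores; the function name and the sample data are hypothetical.

```python
def isotonic_fit(scores, outcomes):
    """Pool Adjacent Violators: non-decreasing least-squares fit of outcomes to scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum of outcomes, number of samples]
    for i in order:
        blocks.append([outcomes[i], 1])
        # Merge backwards while the previous block's mean exceeds the newest one.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted_sorted = []
    for total, count in blocks:
        fitted_sorted.extend([total / count] * count)
    # Map the fitted values back to the original sample order.
    fitted = [0.0] * len(scores)
    for rank, i in enumerate(order):
        fitted[i] = fitted_sorted[rank]
    return fitted


raw_scores = [10, 20, 30, 40, 50, 60]
materialized = [0, 1, 0, 0, 1, 1]   # hypothetical outcomes (1 = risk materialized)
calibrated = isotonic_fit(raw_scores, materialized)
```

Here the scores 20, 30, and 40 are pooled into one block with a calibrated probability of 1/3, which shows how calibration can introduce ties while never reversing an ordering.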

Brier Score (Calibration Metric):

BS = (1/N) Σ (p_i - o_i)²

Where p_i is predicted probability, o_i is actual outcome (0/1). Lower is better.
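
A direct implementation of the formula, with hypothetical sample data, looks like this:

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


predicted = [0.9, 0.2, 0.7, 0.1]   # hypothetical calibrated probabilities
observed = [1, 0, 1, 0]            # hypothetical outcomes
score = brier_score(predicted, observed)
uninformed = brier_score([0.5] * 4, observed)
```

An uninformed forecast of 0.5 for every project scores 0.25, which gives a useful reference point when reading a Brier score.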

📊 Measured Performance

Metric	Value	Evaluation Basis
ROC-AUC	0.89	42 delivered projects (risk materialization)
Brier Score	0.094	Calibrated probability accuracy
Precision @ 80th percentile	0.83	High-risk project identification
Recall @ 80th percentile	0.79	High-risk project identification
SHAP Explanation Fidelity	0.94	Correlation with actual outcomes
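
The table does not say how "@ 80th percentile" is computed. One plausible reading, sketched here as an assumption, is to flag every project whose score is at or above the 80th percentile of all scores (nearest-rank method) and compare the flags with observed outcomes. The function name and the data are hypothetical.

```python
def precision_recall_at_percentile(scores, outcomes, percentile=80):
    """Flag projects at or above the score percentile and compare with 0/1 outcomes."""
    ranked = sorted(scores)
    # Nearest-rank percentile, computed with integer ceiling division.
    k = max(1, (percentile * len(ranked) + 99) // 100)
    threshold = ranked[k - 1]
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for f, o in zip(flagged, outcomes) if f and o == 1)
    fp = sum(1 for f, o in zip(flagged, outcomes) if f and o == 0)
    fn = sum(1 for f, o in zip(flagged, outcomes) if not f and o == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return threshold, precision, recall


risk_scores = [12, 25, 31, 44, 52, 58, 63, 72, 85, 91]
materialized = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
threshold, precision, recall = precision_recall_at_percentile(risk_scores, materialized)
```

In this toy data the threshold is 72, all three flagged projects had a risk materialize, and one materialized risk (score 52) falls below the cut.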

📚 Training & Calibration Set

•Projects: 42 delivered SAP engagements (2022–2025)
•Labels: Risk materialization ground truth labeled by delivery leads (binary + severity)
•Features: 47 engineered features across 6 dimensions
•Validation: 5-fold stratified cross-validation
•Calibration: Isotonic regression on held-out predictions
•Retrain Schedule: Weekly, as new project outcomes are recorded
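
The validation setup above can be sketched as a stratified fold assignment whose held-out predictions feed the calibrator. The class split (14 of 42 projects with a materialized risk) is hypothetical, as is the `stratified_folds` helper; the model fit itself is left as a comment.

```python
import random


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps roughly the class mix."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin, continuing where the previous class stopped.
        for position, index in enumerate(indices):
            folds[(offset + position) % n_folds].append(index)
        offset += len(indices)
    return folds


# Hypothetical labels: 42 projects, 14 of which had a risk materialize.
labels = [1] * 14 + [0] * 28
folds = stratified_folds(labels)

for held_out in folds:
    train = [i for fold in folds if fold is not held_out for i in fold]
    # Fit the model on `train` and predict `held_out`; the pooled held-out
    # predictions are what the isotonic calibrator would be fitted on.
    assert not set(train) & set(held_out)
```

With only 42 projects, stratification keeps every fold from ending up with too few positive cases to evaluate.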

🎬 End-to-End Example

Scenario: High-Risk Pharma S/4HANA Migration

•Input: 142 requirements, 7 integrations, junior-heavy team, 12 unmapped legacy data sources
•LightGBM: Scores Data Quality risk at 85, Team risk at 78, Overall 72
•SHAP: Reveals "Unmapped Legacy Data Sources" contributing 42 points to overall risk
•Mitigation Engine: Recommends 2-week data discovery phase and Senior Architect allocation
•Output: Risk report with prioritized mitigations
•Downstream: Tool 04 adds 15% contingency buffer based on risk score

Result: Project delivered successfully; data discovery phase prevented 6-week UAT delay.