Tool 05 · Algorithm Deep Dive
Gradient Boosting + Bayesian Calibration
SAP project risk is driven by non-linear interactions — a volatile scope is fine if the team is senior, but deadly if the team is junior. Standard linear models miss these interaction effects. Moreover, "black box" AI predictions are useless in consulting — you cannot tell a client "The AI said no, sorry." Every risk score must be fully explainable.
LightGBM gradient-boosted trees capture complex feature interactions without manual feature engineering. SHAP (SHapley Additive exPlanations) provides game-theoretic attributions that sum exactly to the model's prediction, showing how much each factor pushes a given risk score up or down. Isotonic regression then maps raw scores to calibrated probabilities.
LightGBM Regressor: 800 trees, max_depth=6, trained on 47 engineered features capturing scope breadth, team composition, data quality gaps, integration surface, compliance complexity, and client change history.
TreeSHAP: Exact Shapley values for tree ensembles. Computes each feature's marginal contribution to the risk score, averaged over all possible feature coalitions.
Isotonic Regression: Non-parametric, monotonic calibration mapping raw scores to empirically observed risk probabilities.
30+ predefined risk types across Scope, Team, Data, Integration, Compliance, and Timeline dimensions.
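To make the Shapley attribution concrete, the sketch below computes exact Shapley values by brute force for a toy three-feature risk function. Everything in it is an illustrative assumption: `toy_risk`, the feature names, the point values, and the all-zero baseline are hypothetical and are not the production model. TreeSHAP reaches the same kind of result efficiently by exploiting tree structure instead of enumerating coalitions.

```python
from itertools import combinations
from math import factorial

# Hypothetical toy model (NOT the production LightGBM model): scope volatility
# only adds risk when the team is junior-heavy, i.e. a pure interaction effect.
BASELINE = {"volatile_scope": 0, "junior_team": 0, "poor_data": 0}

def toy_risk(x):
    """Risk points for a dict of 0/1 features."""
    return 20 + 30 * x["volatile_scope"] * x["junior_team"] + 10 * x["poor_data"]

def coalition_value(model, x, coalition):
    """Evaluate the model with features outside the coalition reset to baseline."""
    masked = {k: (x[k] if k in coalition else BASELINE[k]) for k in x}
    return model(masked)

def shapley_values(model, x):
    """Exact Shapley values: weighted marginal gains over every coalition excluding i."""
    features = list(x)
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from the Shapley formula
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                gain = coalition_value(model, x, s | {i}) - coalition_value(model, x, s)
                total += weight * gain
        phi[i] = total
    return phi

project = {"volatile_scope": 1, "junior_team": 1, "poor_data": 1}
phi = shapley_values(toy_risk, project)
base = coalition_value(toy_risk, project, set())
print(base, phi)  # base is 20; phi is roughly 15 / 15 / 10
```

The 30-point interaction is split evenly between `volatile_scope` and `junior_team`, and the base value plus all contributions equals the model output (20 + 15 + 15 + 10 = 60). That additivity is what lets a risk score be decomposed driver by driver.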
Risk Score: 72/100 (Elevated)
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ PROJECT RISK ESTIMATOR PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: FEATURE VECTOR BUILD │ │
│ │ │ │
│ │ Inputs from Tools 02, 03, 10, 12 → 47-dimensional feature vector: │ │
│ │ │ │
│ │ ┌─────────────────┬─────────────────┬─────────────────┬─────────────────┐ │ │
│ │ │ Scope Features │ Team Features │ Data Features │ Integration │ │ │
│ │ ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤ │ │
│ │ │ Breadth: 142 │ Senior: 4 │ Quality: 0.62 │ Count: 7 │ │ │
│ │ │ Volatility: 0.3 │ Junior: 3 │ Unmapped: 12 │ Complexity: 8.4 │ │ │
│ │ │ Ambiguity: 0.4 │ Ramp: 0.7 │ Gaps: 8 │ Legacy: 5 │ │ │
│ │ └─────────────────┴─────────────────┴─────────────────┴─────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: LIGHTGBM SCORING (per risk type) │ │
│ │ │ │
│ │ For each of 30+ risk templates: │ │
│ │ │ │
│ │ ┌───────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ LightGBM Ensemble (800 trees) │ │ │
│ │ │ │ │ │
│ │ │ Tree 1 Tree 2 Tree 3 ... Tree 800 │ │ │
│ │ │ ┌────┐ ┌────┐ ┌────┐ ┌────┐ │ │ │
│ │ │ │Root│ │Root│ │Root│ │Root│ │ │ │
│ │ │ └┬──┬┘ └┬──┬┘ └┬──┬┘ └┬──┬┘ │ │ │
│ │ │ │ │ │ │ │ │ │ │ │ │ │
│ │ │ Data Team Int Scope Jr Sr Scope Time │ │ │
│ │ │ Qual Size Cnt Breadth Ct Ct Vol Line │ │ │
│ │ │ │ │ │
│ │ │ Raw Risk Score = Σ leaf_values (summed over all 800 trees) │ │ │
│ │ └───────────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: SHAP EXPLANATIONS │ │
│ │ │ │
│ │ TreeSHAP computes exact Shapley values: │ │
│ │ │ │
│ │ φ_i = Σ_{S⊆N\{i}} [|S|!(|N|-|S|-1)! / |N|!] × [f(S∪{i}) - f(S)] │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Base Value (average prediction): 35 │ │ │
│ │ │ │ │ │
│ │ │ f(∅) = 35 │ │ │
│ │ │ f({Data}) = 35 + 42 = 77 → Data contributes +42 │ │ │
│ │ │ f({Data, Team}) = 77 + 28 = 105 → Team contributes +28 │ │ │
│ │ │ f({Data, Team, Time}) = 105 + 15 = 120 → Time contributes +15 │ │ │
│ │ │ ... │ │ │
│ │ │ Final = 72 (scaled to 0-100) │ │ │
│ │ └─────────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: CALIBRATION & MITIGATION │ │
│ │ │ │
│ │ Isotonic Regression Calibration: │ │
│ │ P(actual_risk | raw_score) = isotonic(raw_score) │ │
│ │ │ │
│ │ Rules Engine: Top SHAP drivers → Mitigation recommendations │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Driver: "Unmapped Legacy Data Sources" (+42) │ │ │
│ │ │ → Mitigation: Add 2-week data discovery phase │ │ │
│ │ │ → Mitigation: Engage legacy system SME for 25% allocation │ │ │
│ │ │ │ │ │
│ │ │ Driver: "Junior-heavy Team Composition" (+28) │ │ │
│ │ │ → Mitigation: Add 1 Senior Architect at 50% allocation │ │ │
│ │ │ → Mitigation: Schedule weekly design reviews │ │ │
│ │ └─────────────────────────────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT JSON: │ │
│ │ { │ │
│ │ "overall_risk": 72, │ │
│ │ "risk_breakdown": [ │ │
│ │ {"type": "Data Quality", "score": 85, "shap": 42}, │ │
│ │ {"type": "Team Composition", "score": 78, "shap": 28}, │ │
│ │ {"type": "Timeline", "score": 65, "shap": 15}, │ │
│ │ {"type": "Integration", "score": 58, "shap": 13} │ │
│ │ ], │ │
│ │ "mitigations": ["Add data discovery phase", "Add Senior Architect", ...] │ │
│ │ } │ │
│ └─────────────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
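The isotonic calibration in Step 4 can be sketched with the pool-adjacent-violators algorithm. This is a minimal, dependency-free illustration under stated assumptions: the scores and outcomes are invented, raw scores are assumed distinct, and lookup uses a plain step function. A library implementation such as scikit-learn's `IsotonicRegression` would normally be used in practice and interpolates between points rather than stepping.

```python
def fit_isotonic(scores, outcomes):
    """Pool-adjacent-violators: fit a non-decreasing step function.

    Returns (upper_score, probability) blocks sorted by score.
    Assumes distinct raw scores (ties are not averaged here).
    """
    blocks = []  # each block: [sum_of_outcomes, count, max_score_in_block]
    for score, y in sorted(zip(scores, outcomes)):
        blocks.append([float(y), 1, score])
        # Merge backwards while the previous block's mean exceeds the newest block's mean
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, top = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = top
    return [(top, total / count) for total, count, top in blocks]

def calibrate(model, raw_score):
    """Map a raw score to the probability of the first block that covers it."""
    for upper, prob in model:
        if raw_score <= upper:
            return prob
    return model[-1][1]  # above the training range: clip to the last block

# Hypothetical history: raw model scores and whether the risk materialized (1) or not (0)
scores = [10, 25, 30, 45, 50, 65, 70, 85]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]

model = fit_isotonic(scores, outcomes)
print(calibrate(model, 40))  # one materialized risk among the three pooled projects, about 0.33
```

Pooling is what enforces monotonicity: the projects scored 30, 45, and 50 are out of order (1, 0, 0), so they are merged into a single block with probability 1/3.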
Tool 05 feeds contingency recommendations directly into Tool 04's P90 calculations.
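The rules engine that turns top SHAP drivers into recommendations is not specified beyond the two examples in the pipeline, so the following is only a rough sketch of one way it could work. The `MITIGATION_RULES` table, the `recommend` function, and its `top_k` and `min_contribution` parameters are hypothetical; the driver names and SHAP values are the ones shown in the diagram and output JSON.

```python
import json

# Hypothetical rule table: driver name -> mitigation texts. The two entries
# mirror the examples in the pipeline diagram; the full catalogue is not shown.
MITIGATION_RULES = {
    "Unmapped Legacy Data Sources": [
        "Add 2-week data discovery phase",
        "Engage legacy system SME for 25% allocation",
    ],
    "Junior-heavy Team Composition": [
        "Add 1 Senior Architect at 50% allocation",
        "Schedule weekly design reviews",
    ],
}

def recommend(shap_by_driver, top_k=2, min_contribution=10.0):
    """Return mitigations for the top_k drivers that raise risk the most."""
    ranked = sorted(shap_by_driver.items(), key=lambda kv: kv[1], reverse=True)
    plan = []
    for driver, contribution in ranked[:top_k]:
        if contribution < min_contribution:
            continue  # too small to justify a mitigation
        for action in MITIGATION_RULES.get(driver, []):
            plan.append({"driver": driver, "shap": contribution, "mitigation": action})
    return plan

drivers = {
    "Unmapped Legacy Data Sources": 42,
    "Junior-heavy Team Composition": 28,
    "Timeline": 15,
    "Integration": 13,
}
plan = recommend(drivers)
print(json.dumps(plan, indent=2))
```

Keeping the rules as a plain lookup table means each recommendation traces back to a named driver and its SHAP contribution, which fits the requirement that every score be explainable.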
| Metric | Value | Notes |
|---|---|---|
| ROC-AUC | 0.89 | 42 delivered projects (risk materialization) |
| Brier Score | 0.094 | Calibrated probability accuracy |
| Precision @ 80th percentile | 0.83 | High-risk project identification |
| Recall @ 80th percentile | 0.79 | High-risk project identification |
| SHAP Explanation Fidelity | 0.94 | Correlation with actual outcomes |
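
For readers unfamiliar with the calibration and ranking metrics in the table, here is a small sketch of how a Brier score and precision/recall at the 80th score percentile can be computed. The ten projects below are made-up illustration data, not the 42-project evaluation set, and the nearest-rank percentile rule is an assumption, since the original does not say which definition was used.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)

def precision_recall_at_percentile(probs, outcomes, percentile=80):
    """Flag projects at or above the score percentile, then score the flags.

    Assumes at least one positive outcome (otherwise recall is undefined).
    """
    ranked = sorted(probs)
    # Nearest-rank cutoff; other percentile definitions shift it slightly
    cutoff = ranked[min(len(ranked) - 1, int(len(ranked) * percentile / 100))]
    flagged = [p >= cutoff for p in probs]
    true_pos = sum(1 for f, y in zip(flagged, outcomes) if f and y == 1)
    precision = true_pos / sum(flagged)
    recall = true_pos / sum(outcomes)
    return precision, recall

# Hypothetical calibrated probabilities and whether the risk materialized
probs = [0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.55, 0.70, 0.80, 0.90]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

print(brier_score(probs, outcomes))                     # about 0.117
print(precision_recall_at_percentile(probs, outcomes))  # (1.0, 0.5)
```

In this toy data the top 20% of scores flags two projects, both of which had risks materialize (precision 1.0), while two other affected projects fall below the cutoff (recall 0.5).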
Result: Project delivered successfully; data discovery phase prevented 6-week UAT delay.