Tool 12 · Algorithm Deep Dive
Dense Retrieval + FAISS Vector Search
Two kinds of match matter: exact keyword matches ('Ariba migration 2024') and semantic matches ('something like ours but in CPG'). Pure keyword search misses synonyms; pure semantic search misses exact terms. Consultants need both to find relevant analogues for estimation.
Hybrid Search: BM25 for keyword precision plus dense embeddings (bge-large) for semantic recall. Reciprocal Rank Fusion merges the two rankings into one. Metadata filters (industry, size, region, year) restrict results to comparable projects. Every retrieved project includes its actual outcomes for benchmarking.
Keyword retrieval: Okapi BM25 over project briefs, scope documents, and lessons learned. Handles exact keyword matches.
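The keyword side can be illustrated with a minimal pure-Python sketch of Okapi BM25 scoring. It assumes whitespace tokenization, a non-empty corpus, the common defaults `k1=1.5` and `b=0.75`, and one widely used IDF variant; none of these settings are stated in the source, and the project texts and the `bm25_rank` helper are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase + whitespace split; a real system would use a proper analyzer.
    return text.lower().split()

def bm25_rank(query, docs, k1=1.5, b=0.75, top_n=50):
    """Return (doc_id, score) pairs sorted by descending BM25 score."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical project briefs.
docs = {
    "p1": "ariba migration 2024 procurement rollout",
    "p2": "s/4hana finance migration for pharma",
    "p3": "warehouse automation lessons learned",
}
hits = bm25_rank("ariba migration", docs)
```

Here `p1` matches both query terms and ranks first, `p2` matches only "migration", and `p3` matches nothing and is dropped.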
Dense retrieval: bge-large-en-v1.5 embeddings (1,024-dim) stored in pgvector. Cosine similarity between query and project embeddings surfaces conceptual matches.
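To make the semantic side concrete, here is a small sketch of cosine-similarity ranking. It swaps the vector store for a brute-force in-memory scan and uses toy 3-dimensional vectors in place of real 1,024-dimensional bge-large embeddings; the `dense_rank` helper and all vectors are illustrative assumptions, not the production setup.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat zero vectors as having no similarity.
    return dot / (norm_a * norm_b)

def dense_rank(query_vec, project_vecs, top_n=50):
    """Return (project_id, similarity) pairs, most similar first."""
    scored = [(pid, cosine_similarity(query_vec, vec))
              for pid, vec in project_vecs.items()]
    scored.sort(key=lambda kv: kv[1], reverse=True)
    return scored[:top_n]

# Toy vectors standing in for embedding-model output (assumption).
project_vecs = {
    "p1": [1.0, 0.0, 0.0],
    "p2": [0.6, 0.8, 0.0],
    "p3": [0.0, 0.0, 1.0],
}
dense_hits = dense_rank([1.0, 0.1, 0.0], project_vecs)
```

A brute-force scan is fine at this scale; an approximate index only starts to pay off once the corpus is far larger.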
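Reciprocal Rank Fusion: combines the BM25 and dense rankings by summing reciprocal ranks, RRF_score = 1/(k+rank₁) + 1/(k+rank₂), where rank₁ and rank₂ are a project's positions in the two lists and k = 60.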
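The fusion formula can be sketched in a few lines. This version assumes ranks start at 1 and that a project missing from one list simply contributes no term for that list; the source does not say how missing entries are handled, and the project ids are made up.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60, top_n=50):
    """Fuse several ranked id lists; score = sum of 1 / (k + rank) over lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical outputs of the two retrievers, best match first.
bm25_ranking = ["p1", "p2", "p3"]
dense_ranking = ["p2", "p4", "p1"]
fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
```

With these inputs `p2` (ranks 2 and 1) edges out `p1` (ranks 1 and 3), and both beat `p4` and `p3`, which each appear in only one list. Because only ranks are used, the incompatible score scales of BM25 and cosine similarity never need to be normalized.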
Metadata filters: industry, company size, region, year, SAP modules, project duration, budget range.
Industry: Pharma | Size: $8.2M | Duration: 14 months
Outcome: Delivered on budget, 2-week delay due to data quality
Key Risk: Legacy data unmapped → +15% contingency used
Industry: Pharma | Size: $6.5M | Duration: 12 months
Outcome: Under budget by 8%, on time
Key Success: Strong data governance from Day 1
Industry: Pharma/CPG | Size: $12.1M | Duration: 18 months
Outcome: 10% over budget, 3-month delay (scope creep)
Lesson: Freeze scope earlier; add change order buffer
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ HISTORICAL PROJECT RETRIEVER PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ INPUT: │ "S/4HANA migration for pharma company with complex supply chain" │
│ │ Query │ Filters: Industry=Pharma, Size>$5M │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: PARALLEL RETRIEVAL │ │
│ │ │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ DENSE RETRIEVAL │ │ BM25 KEYWORD RETRIEVAL │ │ │
│ │ │ (bge-large) │ │ (Okapi BM25) │ │ │
│ │ │ │ │ │ │ │
│ │ │ Query → Embedding │ │ Query → Tokenize │ │ │
│ │ │ Cosine sim to projects │ │ IDF-weighted term matching │ │ │
│ │ │ │ │ │ │ │
│ │ │ Top-50 candidates │ │ Top-50 candidates │ │ │
│ │ │ (semantic similarity) │ │ (keyword precision) │ │ │
│ │ └───────────┬─────────────┘ └───────────────┬─────────────┘ │ │
│ │ │ │ │ │
│ │ └─────────────┬─────────────────────┘ │ │
│ │ ▼ │ │
│ │ Reciprocal Rank Fusion (RRF) │ │
│ │ Score = Σ 1/(60 + rank_i) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Combined Top-50 │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: METADATA FILTERING │ │
│ │ │ │
│ │ Apply strict filters: │ │
│ │ • Industry = Pharma │ │
│ │ • Budget > $5M │ │
│ │ • Year ≥ 2022 │ │
│ │ │ │
│ │ Soft filters (penalize, don't exclude): │ │
│ │ • Region = North America (preferred) │ │
│ │ • Modules = FI, CO, MM (required) │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: OUTCOME ENRICHMENT │ │
│ │ │ │
│ │ For each retrieved project, attach: │ │
│ │ • Actual vs. estimated cost │ │
│ │ • Actual vs. estimated timeline │ │
│ │ • Risk materialization flags │ │
│ │ • Lessons learned summary │ │
│ │ • Key success factors │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT: │ │
│ │ { │ │
│ │ "query": "S/4HANA migration pharma supply chain", │ │
│ │ "analogues": [ │ │
│ │ {"project": "Pfizer S/4HANA Finance", "similarity": 0.94, │ │
│ │ "outcome": {"cost_variance": "+2%", "time_variance": "+2w"}}│ │
│ │ ], │ │
│ │ "benchmark_summary": "Avg cost variance: +3.2%" │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
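Step 2 of the pipeline distinguishes strict filters, which exclude a project, from soft filters, which only penalize it. A minimal sketch of that distinction follows, assuming a multiplicative penalty of 0.8 per missed soft filter; the penalty value, the field names, and the candidate records are hypothetical, since the source does not specify them.

```python
def apply_filters(candidates, hard_filters, soft_filters, penalty=0.8):
    """Drop candidates failing any hard filter; down-weight soft-filter misses."""
    results = []
    for project, score in candidates:
        if not all(check(project) for check in hard_filters):
            continue  # Strict filter failed: exclude entirely.
        misses = sum(1 for check in soft_filters if not check(project))
        results.append((project, score * penalty ** misses))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results

hard_filters = [
    lambda p: p["industry"] == "Pharma",
    lambda p: p["budget_musd"] > 5,
    lambda p: p["year"] >= 2022,
]
soft_filters = [
    lambda p: p["region"] == "North America",
    lambda p: {"FI", "CO", "MM"} <= set(p["modules"]),
]

# Hypothetical fused candidates: (metadata, fused score).
candidates = [
    ({"name": "A", "industry": "Pharma", "budget_musd": 8.2, "year": 2023,
      "region": "Europe", "modules": ["FI", "CO", "MM"]}, 0.0325),
    ({"name": "B", "industry": "Pharma", "budget_musd": 6.5, "year": 2022,
      "region": "North America", "modules": ["FI", "CO", "MM"]}, 0.0320),
    ({"name": "C", "industry": "Retail", "budget_musd": 9.0, "year": 2023,
      "region": "North America", "modules": ["FI", "CO", "MM"]}, 0.0330),
]
filtered = apply_filters(candidates, hard_filters, soft_filters)
```

Project C has the highest fused score but fails the industry filter and is removed. Project A survives with one soft-filter miss (region), so its penalized score drops below B's.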
Tool 12 provides the "institutional memory" for evidence-based estimation.
| Metric | Value | Notes |
|---|---|---|
| Precision@10 | 0.82 | Consultant-rated relevance |
| Mean Reciprocal Rank (MRR) | 0.78 | First relevant result position |
| Recall@10 | 0.71 | % of all relevant projects found |
| Query Latency | 45ms | Hybrid search + fusion |
| Index Size | 42 projects | Growing with each delivery |
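For reference, the three ranking metrics in the table can be computed as in the sketch below, using their standard definitions. The ranked lists and relevance judgments are invented for illustration and are not the evaluation data behind the reported values.

```python
def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k

def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / len(relevant)

def reciprocal_rank(ranked, relevant):
    """1 / position of the first relevant result, or 0 if none is found."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over (ranked, relevant) query pairs."""
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

# Hypothetical judgments for two queries.
ranked_q1, relevant_q1 = ["p3", "p1", "p7", "p2"], {"p1", "p2", "p9"}
ranked_q2, relevant_q2 = ["p9", "p1"], {"p9"}

p_at_4 = precision_at_k(ranked_q1, relevant_q1, k=4)   # 2 of 4 results relevant
r_at_4 = recall_at_k(ranked_q1, relevant_q1, k=4)      # 2 of 3 relevant found
mrr = mean_reciprocal_rank([(ranked_q1, relevant_q1), (ranked_q2, relevant_q2)])
```

For the first query the first relevant hit is at position 2 (reciprocal rank 0.5); for the second it is at position 1, so the mean over both queries is 0.75.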
Result: P50 estimate calibrated to $8.2M based on similar projects; actual delivered at $8.5M.