Tool 06 · Algorithm Deep Dive
RAG + Hybrid Retrieval + Claude 3.5 Sonnet
Stakeholders don't read 300-page RFPs; they ask questions. But a generic LLM (ChatGPT or Claude without RAG) hallucinates under pressure: ask "Does this RFP include S/4HANA Private Cloud?" and it may produce a confident but wrong answer based on outdated training data. In SAP consulting, a wrong answer about licensing can cost $200,000 or more.
RAG (Retrieval-Augmented Generation) forces grounded answers. Every sentence the model writes is pinned to a retrieved source passage from the actual RFP. Hybrid search (BM25 + dense) ensures both keyword precision and semantic recall. A cross-encoder re-ranker verifies relevance. The system refuses to answer if confidence is low.
- Embeddings: OpenAI text-embedding-3-large; 3,072-dimensional vectors stored in pgvector on PostgreSQL (see the ingestion sketch after this list).
- Hybrid search: top-50 candidates each from dense vector similarity and BM25 keyword matching, merged by reciprocal rank fusion into a top-30 list.
- Re-ranking: BGE-reranker-v2-m3, a cross-encoder that scores query-document pairs and narrows the top-30 to a top-6.
- Generation: Claude 3.5 Sonnet with a strict "cite or decline" system prompt; a no-answer detector refuses low-confidence queries.
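
Step 1 of the pipeline diagram below (chunking and embedding) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `rfp_chunks` table name and connection string are invented for the example, and tiktoken stands in for whatever tokenizer the production system uses; the embedding call and the `vector(3072)` column follow the components listed above.

```python
# Minimal sketch of Step 1: chunk the RFP, embed each chunk, store in pgvector.
# Assumptions: the pgvector extension is installed, and the `rfp_chunks` table
# name and DSN are illustrative. Metadata columns (page, section) are omitted.
import tiktoken
import psycopg
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by text-embedding-3-large

def chunk_tokens(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into 400-token chunks with 50-token overlap."""
    tokens = enc.encode(text)
    step = size - overlap
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def ingest(rfp_text: str, dsn: str = "dbname=rfp") -> None:
    """Embed all chunks (batching for very large RFPs not shown) and store them."""
    chunks = chunk_tokens(rfp_text)
    resp = client.embeddings.create(model="text-embedding-3-large", input=chunks)
    with psycopg.connect(dsn) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rfp_chunks ("
            "id serial PRIMARY KEY, content text, embedding vector(3072))"
        )
        for chunk, item in zip(chunks, resp.data):
            # pgvector accepts the "[x, y, ...]" literal form of a Python list.
            conn.execute(
                "INSERT INTO rfp_chunks (content, embedding) VALUES (%s, %s::vector)",
                (chunk, str(item.embedding)),
            )
```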
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ RFP CHAT — RAG PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ USER │ "Does this RFP require S/4HANA Private Cloud Edition?" │
│ │ QUERY │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: DOCUMENT CHUNKING (Pre-processing) │ │
│ │ │ │
│ │ RFP → 400-token chunks with 50-token overlap │ │
│ │ Each chunk embedded (3,072-dim) and stored in pgvector │ │
│ │ Metadata: page number, section header, confidence │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: HYBRID RETRIEVAL (Top-30) │ │
│ │ │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ DENSE RETRIEVAL │ │ BM25 KEYWORD RETRIEVAL │ │ │
│ │ │ │ │ │ │ │
│ │ │ Query → Embedding │ │ Query → Tokenize → IDF │ │ │
│ │ │ Cosine sim to chunks │ │ Match against inverted idx │ │ │
│ │ │ Top-50 candidates │ │ Top-50 candidates │ │ │
│ │ └───────────┬─────────────┘ └───────────────┬─────────────┘ │ │
│ │ │ │ │ │
│ │ └─────────────┬─────────────────────┘ │ │
│ │ ▼ │ │
│ │ Reciprocal Rank Fusion (RRF) │ │
│ │ Score = 1/(k + rank_dense) + 1/(k + rank_bm25) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Top-30 Candidates │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: CROSS-ENCODER RE-RANKING (Top-6) │ │
│ │ │ │
│ │ BGE-reranker-v2-m3 Cross-Encoder: │ │
│ │ │ │
│ │ For each of 30 candidates: │ │
│ │ Input = [CLS] Query [SEP] Chunk [SEP] │ │
│ │ Output = Relevance Score (0-1) │ │
│ │ │ │
│ │ Sort by relevance → Keep top-6 │ │
│ │ Filter: min_relevance > 0.35 else decline to answer │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: GROUNDED GENERATION │ │
│ │ │ │
│ │ Claude 3.5 Sonnet with Strict System Prompt: │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ "You are answering questions about an RFP document. │ │ │
│ │ │ Answer ONLY using the provided context chunks. │ │ │
│ │ │ Cite the page number for every factual claim. │ │ │
│ │ │ If the answer is not in the context, say 'I cannot find │ │ │
│ │ │ this information in the RFP.' │ │ │
│ │ │ DO NOT use outside knowledge or make assumptions." │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Context Chunks (Top-6) + Query → Claude → Cited Answer │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 5: FAITHFULNESS CHECK & OUTPUT │ │
│ │ │ │
│ │ Post-generation verification: │ │
│ │ • Are all factual claims supported by cited chunks? │ │
│ │ • Do citations match actual page numbers? │ │
│ │ • Confidence classification (High/Med/Low) │ │
│ │ │ │
│ │ Output: Answer + Citations + Confidence │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
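The RRF formula in Step 2 is small enough to show in full. A minimal sketch: the two retrievers are represented only by their ranked chunk-ID lists, and k = 60 is the conventional constant from the RRF literature (the pipeline's actual value isn't stated). On the dense side, pgvector's `<=>` operator (cosine distance) yields the top-50 via `ORDER BY embedding <=> query_vec LIMIT 50`.

```python
# Minimal sketch of Step 2: fuse dense and BM25 rankings with reciprocal rank fusion.
# Score = 1/(k + rank_dense) + 1/(k + rank_bm25), exactly as in the diagram.
from collections import defaultdict

def reciprocal_rank_fusion(
    dense_ids: list[int],   # top-50 chunk IDs from the pgvector cosine search
    bm25_ids: list[int],    # top-50 chunk IDs from the BM25 keyword search
    k: int = 60,            # common default; the pipeline's value is an assumption
    top_n: int = 30,
) -> list[int]:
    scores: defaultdict[int, float] = defaultdict(float)
    for ranking in (dense_ids, bm25_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

RRF needs only ranks, never raw scores, which is why it pairs well with BM25 and cosine similarity: the two score scales are incomparable, but their rankings fuse cleanly.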
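For Step 3, the FlagEmbedding library is the reference way to run BGE-reranker-v2-m3; whether the production pipeline uses that library is an assumption, but the scoring pattern is the same. Each (query, chunk) pair is scored jointly by the cross-encoder, and `normalize=True` maps the raw logit through a sigmoid onto the 0-1 scale that the 0.35 refusal threshold expects.

```python
# Minimal sketch of Step 3: cross-encoder re-ranking with BGE-reranker-v2-m3.
# FlagEmbedding is one way to run this model; using it here is an assumption.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

def rerank(query: str, chunks: list[str], top_k: int = 6,
           min_relevance: float = 0.35) -> list[tuple[str, float]]:
    """Score each (query, chunk) pair jointly; keep the top-6 above threshold."""
    pairs = [[query, chunk] for chunk in chunks]
    scores = reranker.compute_score(pairs, normalize=True)  # sigmoid -> 0..1
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)[:top_k]
    # An empty result means even the best chunk fell below min_relevance,
    # which is the signal for the pipeline to decline to answer.
    return [(c, s) for c, s in ranked if s >= min_relevance]
```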
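Step 4 reduces to one API call carrying the strict system prompt from the diagram. A sketch using the official `anthropic` Python SDK; the model alias, token budget, and the `[page N]` context format are assumptions.

```python
# Minimal sketch of Step 4: grounded generation with a cite-or-decline prompt.
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "You are answering questions about an RFP document. "
    "Answer ONLY using the provided context chunks. "
    "Cite the page number for every factual claim. "
    "If the answer is not in the context, say 'I cannot find this "
    "information in the RFP.' "
    "DO NOT use outside knowledge or make assumptions."
)

def answer(query: str, chunks: list[tuple[str, int]]) -> str:
    """chunks: (text, page) pairs from Step 3. Returns a cited answer."""
    context = "\n\n".join(f"[page {page}] {text}" for text, page in chunks)
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # model alias is an assumption
        max_tokens=1024,
        system=SYSTEM,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return msg.content[0].text
```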
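Part of Step 5 is mechanical: citation consistency can be verified by parsing the answer and comparing cited pages against the pages actually supplied in context. A minimal sketch; the `[page N]` citation format and the High/Med/Low thresholds are assumptions, and the RAGAS faithfulness figures in the table below appear to come from an offline 600-pair evaluation rather than this per-query check.

```python
# Minimal sketch of Step 5: verify citations and classify confidence.
# Assumes "[page N]" citations and uses Step 3 re-ranker scores as a cheap
# confidence proxy; both the format and the thresholds are assumptions.
import re

def check_citations(answer: str, context_pages: set[int]) -> bool:
    """Every cited page must be one the model was actually shown."""
    cited = {int(p) for p in re.findall(r"\[page (\d+)\]", answer)}
    return bool(cited) and cited <= context_pages

def classify_confidence(scores: list[float]) -> str:
    """Coarse High/Med/Low bucket from the re-ranker scores."""
    best = max(scores, default=0.0)
    if best >= 0.8:
        return "High"
    if best >= 0.5:
        return "Med"
    return "Low"
```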
Tool 06 is the primary interactive interface for consultants exploring RFP content. Its architecture also provides resilience, GDPR compliance (EU data residency), and cost optimization.
| Metric | Value | Notes |
|---|---|---|
| Faithfulness (RAGAS) | 96.7% | 600 Q&A pairs |
| No-Hallucination Rate | 99.1% | Answer fully grounded in context |
| Answer Relevance | 0.91 | RAGAS relevance score |
| Context Recall | 0.94 | RAGAS recall metric |
| Median Latency | 2.4s | End-to-end |
| Refusal Rate (appropriate) | 8.2% | "I don't know" when warranted |
Result: a consultant gets an accurate, cited answer in a median of 2.4s, versus roughly 45 minutes of manual searching.