
Tool 06 · Algorithm Deep Dive

RFP Intelligence Chat

RAG + Hybrid Retrieval + Claude 3.5 Sonnet

96.7% Faithfulness
99.1% No-Hallucination
2.4s Median Latency
3,072 Embedding Dims

🎯 Why This Algorithm

📋 Problem Statement

Stakeholders don't read 300-page RFPs — they ask questions. But generic LLMs (ChatGPT, Claude without RAG) hallucinate under pressure. Asking "Does this RFP include S/4HANA Private Cloud?" could produce a confident but wrong answer based on outdated training data. In SAP consulting, a wrong answer about licensing costs $200,000+.

✅ Solution

RAG (Retrieval-Augmented Generation) forces grounded answers. Every sentence the model writes is pinned to a retrieved source passage from the actual RFP. Hybrid search (BM25 + dense) ensures both keyword precision and semantic recall. A cross-encoder re-ranker verifies relevance. The system refuses to answer if confidence is low.

🧩 What It Comprises

🔤 Embeddings

OpenAI text-embedding-3-large — 3,072-dimensional vectors stored in PostgreSQL via the pgvector extension.
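
A minimal sketch of this component, assuming the official openai and psycopg clients; the rfp_chunks table and its columns are illustrative, not the production schema.

    # Embed chunk texts and store the 3,072-dim vectors in pgvector.
    # Illustrative schema: rfp_chunks(text, page, section, embedding vector(3072)).
    from openai import OpenAI
    import psycopg

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def embed_and_store(chunks: list[dict], conn: psycopg.Connection) -> None:
        resp = client.embeddings.create(
            model="text-embedding-3-large",  # 3,072 dimensions by default
            input=[c["text"] for c in chunks],
        )
        with conn.cursor() as cur:
            for chunk, item in zip(chunks, resp.data):
                cur.execute(
                    "INSERT INTO rfp_chunks (text, page, section, embedding) "
                    "VALUES (%s, %s, %s, %s::vector)",
                    (chunk["text"], chunk["page"], chunk["section"],
                     str(item.embedding)),  # pgvector parses '[x1, x2, ...]' literals
                )
        conn.commit()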

🔍 Retriever

Hybrid Search — top-50 candidates each from dense vector similarity and BM25 keyword matching; reciprocal rank fusion merges them into a top-30 list.
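
The fusion step is compact enough to show in full. A sketch of reciprocal rank fusion over the two candidate lists, following the RRF formula given in the Mathematical Explanation section (chunk IDs are illustrative):

    # Fuse the dense and BM25 rankings into one top-30 list via RRF.
    def rrf_fuse(dense_ids: list[str], bm25_ids: list[str],
                 k: int = 60, top_n: int = 30) -> list[str]:
        scores: dict[str, float] = {}
        for ranking in (dense_ids, bm25_ids):
            for rank, chunk_id in enumerate(ranking, start=1):
                # A chunk found by both retrievers accumulates both terms.
                scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]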

🎯 Re-ranker

BGE-reranker-v2-m3 — Cross-encoder that scores query-document pairs, narrowing top-30 to top-6.
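
A re-ranking sketch via the sentence-transformers CrossEncoder wrapper; the 0.35 floor mirrors the decline threshold in the pipeline below, and sigmoid-normalized scores are an assumption of this sketch.

    from sentence_transformers import CrossEncoder

    reranker = CrossEncoder("BAAI/bge-reranker-v2-m3")

    def rerank(query: str, chunks: list[str],
               top_k: int = 6, min_relevance: float = 0.35) -> list[str]:
        # Score every (query, chunk) pair jointly with the cross-encoder.
        scores = reranker.predict([(query, c) for c in chunks])
        ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)
        # An empty result signals the caller to decline to answer.
        return [c for c, s in ranked[:top_k] if s >= min_relevance]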

🤖 Generator

Claude 3.5 Sonnet with a strict 'cite or decline' system prompt. A no-answer detector refuses low-confidence queries.
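
A generation sketch against the Anthropic Messages API; the model string and the prompt wording are placeholders for the production values.

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY

    SYSTEM = (
        "You are answering questions about an RFP document. "
        "Answer ONLY using the provided context chunks. "
        "Cite the page number for every factual claim. "
        "If the answer is not in the context, say "
        "'I cannot find this information in the RFP.'"
    )

    def answer(query: str, chunks: list[str]) -> str:
        # Number the chunks so the model can cite them unambiguously.
        context = "\n\n".join(f"[Chunk {i + 1}] {c}" for i, c in enumerate(chunks))
        msg = client.messages.create(
            model="claude-3-5-sonnet-20241022",
            max_tokens=1024,
            system=SYSTEM,
            messages=[{"role": "user",
                       "content": f"Context:\n{context}\n\nQuestion: {query}"}],
        )
        return msg.content[0].text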

📥 Inputs & 📤 Outputs

📥 Inputs

  • Natural-language question
  • Scoped to one RFP + attachments
  • Optional: conversation history

📤 Outputs

  • Answer with inline citations [§, page]
  • Confidence tag (high/med/low)
  • Retrieved source snippets
  • "I don't know" response when appropriate

💬 Example Interaction

👤 Does this RFP require S/4HANA Private Cloud Edition?
🤖 Yes. Section 4.2.1 states: "The solution must be deployed on SAP S/4HANA Cloud, Private Edition, with customer-managed encryption keys." [§4.2.1, p.47]

Additionally, Appendix B specifies: "Private Cloud subscription licensing model preferred." [App B, p.312]

Confidence: High
👤 What's the expected go-live date?
🤖 I cannot find a specific go-live date in this RFP. The document mentions "target completion Q4 2026" but does not specify an exact date. [§1.3, p.8]

Confidence: Medium

🔄 How It Runs — Step by Step

                      RFP CHAT — RAG PIPELINE

   USER QUERY: "Does this RFP require S/4HANA Private Cloud Edition?"
          │
          ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ STEP 1: DOCUMENT CHUNKING (pre-processing)                   │
   │                                                              │
   │ RFP → 400-token chunks with 50-token overlap                 │
   │ Each chunk embedded (3,072-dim) and stored in pgvector       │
   │ Metadata: page number, section header, confidence            │
   └──────────────────────────────────────────────────────────────┘
          │
          ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ STEP 2: HYBRID RETRIEVAL (top-30)                            │
   │                                                              │
   │ Dense retrieval:  query → embedding → cosine similarity      │
   │                   against chunk vectors → top-50 candidates  │
   │ BM25 retrieval:   query → tokenize → IDF-weighted match      │
   │                   against inverted index → top-50 candidates │
   │                                                              │
   │ Reciprocal Rank Fusion (RRF):                                │
   │   score = 1/(k + rank_dense) + 1/(k + rank_bm25)             │
   │ Fused ranking → top-30 candidates                            │
   └──────────────────────────────────────────────────────────────┘
          │
          ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ STEP 3: CROSS-ENCODER RE-RANKING (top-6)                     │
   │                                                              │
   │ BGE-reranker-v2-m3 scores each of the 30 candidates:         │
   │   input  = [CLS] Query [SEP] Chunk [SEP]                     │
   │   output = relevance score (0-1)                             │
   │                                                              │
   │ Sort by relevance → keep top-6                               │
   │ Filter: min_relevance > 0.35, else decline to answer         │
   └──────────────────────────────────────────────────────────────┘
          │
          ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ STEP 4: GROUNDED GENERATION                                  │
   │                                                              │
   │ Claude 3.5 Sonnet with strict system prompt:                 │
   │                                                              │
   │   "You are answering questions about an RFP document.        │
   │    Answer ONLY using the provided context chunks.            │
   │    Cite the page number for every factual claim.             │
   │    If the answer is not in the context, say 'I cannot        │
   │    find this information in the RFP.'                        │
   │    DO NOT use outside knowledge or make assumptions."        │
   │                                                              │
   │ Context chunks (top-6) + query → Claude → cited answer       │
   └──────────────────────────────────────────────────────────────┘
          │
          ▼
   ┌──────────────────────────────────────────────────────────────┐
   │ STEP 5: FAITHFULNESS CHECK & OUTPUT                          │
   │                                                              │
   │ Post-generation verification:                                │
   │   • Are all factual claims supported by cited chunks?        │
   │   • Do citations match actual page numbers?                  │
   │   • Confidence classification (High/Med/Low)                 │
   │                                                              │
   │ Output: answer + citations + confidence                      │
   └──────────────────────────────────────────────────────────────┘

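Step 1 of the pipeline is simple enough to sketch directly: a chunker producing 400-token windows with 50-token overlap. tiktoken's cl100k_base encoding is a stand-in tokenizer here, and the real section-aware boundary logic is omitted for brevity.

    import tiktoken

    def chunk_text(text: str, size: int = 400, overlap: int = 50) -> list[str]:
        enc = tiktoken.get_encoding("cl100k_base")
        tokens = enc.encode(text)
        chunks, step = [], size - overlap  # windows advance by 350 tokens
        for start in range(0, len(tokens), step):
            chunks.append(enc.decode(tokens[start:start + size]))
            if start + size >= len(tokens):
                break  # final window already covers the tail
        return chunks
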
🏗️ Architecture & Integration

Where RFP Chat Sits in A²AI

📄 TOOL 01 (Document Intelligence)
        ↓
📚 Vector Index (pgvector + chunks)
        ↓
💬 TOOL 06: RFP Chat (RAG + Claude 3.5)
        ↓
User Interface: Chat Widget · Slack/Teams Bot
Integration: Audit Log · Q&A History

Tool 06 is the primary interactive interface for consultants exploring RFP content.

🔁 Multi-LLM Fallback Architecture (Recommended Enhancement)

Query
  ↓
Primary: Claude 3.5 Sonnet (best accuracy)
  ↓ (on error or latency > 3s)
Fallback 1: GPT-4o-mini (OpenAI API)
  ↓ (on error)
Fallback 2: Llama 3.1 70B (AWS Bedrock, EU region)

Provides resilience, GDPR compliance (EU data residency), and cost optimization.
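
A sketch of that chain; the providers are assumed to share a call(query) -> str interface, which is an illustration rather than a real shared SDK.

    import concurrent.futures

    def ask_with_fallback(query: str, providers: list) -> str:
        """Try each provider in order; move on after an error or a 3 s timeout."""
        for provider in providers:
            executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
            try:
                return executor.submit(provider.call, query).result(timeout=3.0)
            except Exception:
                continue  # error or timeout: try the next provider
            finally:
                executor.shutdown(wait=False)  # don't block on a stuck call
        raise RuntimeError("All fallback providers failed")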

📐 Mathematical Explanation

Dense Retrieval (Cosine Similarity):

sim(q, d) = (q · d) / (‖q‖ · ‖d‖)

BM25 Keyword Scoring:

BM25(q, d) = Σ IDF(q_i) · [f(q_i, d) · (k₁+1)] / [f(q_i, d) + k₁·(1-b+b·|d|/avgdl)]

Where k₁=1.5, b=0.75 (standard parameters).
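
For a concrete feel, a scoring sketch with the rank_bm25 package and the parameters above (the two chunks are invented examples):

    from rank_bm25 import BM25Okapi

    chunk_texts = [
        "The solution must be deployed on SAP S/4HANA Cloud, Private Edition",
        "Target completion Q4 2026 for all deployment phases",
    ]
    tokenized = [t.lower().split() for t in chunk_texts]  # naive tokenization
    bm25 = BM25Okapi(tokenized, k1=1.5, b=0.75)
    scores = bm25.get_scores("s/4hana private cloud".split())  # one score per chunk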

Reciprocal Rank Fusion (RRF):

RRF_score(d) = Σ_{r∈R} 1 / (k + rank_r(d))

Where k=60 (smoothing constant), R = {dense_ranking, bm25_ranking}.
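
Worked example: with k=60, a chunk ranked 2nd by dense retrieval and 5th by BM25 scores 1/62 + 1/65 ≈ 0.0315, while a chunk ranked 1st in only one list scores 1/61 ≈ 0.0164; agreement between the two retrievers outweighs a single top rank.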

Cross-Encoder Relevance:

P(relevant | q, d) = σ( W · BERT([CLS] q [SEP] d [SEP]) + b )

Faithfulness Metric (RAGAS):

Faithfulness = |{claims in answer supported by context}| / |{total claims in answer}|
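
A toy illustration of that ratio; RAGAS actually uses an LLM to extract and verify claims, so the substring check here is only a conceptual stand-in.

    def faithfulness(claims: list[str], contexts: list[str]) -> float:
        # Fraction of answer claims that appear in at least one context chunk.
        supported = sum(
            any(claim.lower() in ctx.lower() for ctx in contexts)
            for claim in claims
        )
        return supported / len(claims) if claims else 0.0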

📊 Measured Performance

Metric                        Value    Benchmark
Faithfulness (RAGAS)          96.7%    600 Q&A pairs
No-Hallucination Rate         99.1%    Answer fully grounded in context
Answer Relevance              0.91     RAGAS relevance score
Context Recall                0.94     RAGAS recall metric
Median Latency                2.4s     End-to-end
Refusal Rate (appropriate)    8.2%     "I don't know" when warranted

📚 Training & Calibration Set

  • Generator: Claude 3.5 Sonnet (no fine-tuning — uses system prompt)
  • Re-ranker: BGE-reranker-v2-m3 fine-tuned on 8,000 RFP Q&A pairs
  • Embeddings: text-embedding-3-large (pre-trained, not fine-tuned)
  • Evaluation Set: 600 Q&A pairs across 12 RFPs, triple-annotated
  • Chunking Strategy: 400 tokens, 50-token overlap, section-aware boundaries

🎬 End-to-End Example

Scenario: Complex RFP Q&A Session

  1. Pre-processing: 450-page Pharma RFP chunked into 1,847 chunks, embedded, indexed
  2. User Query: "What are the GxP validation requirements for the MM module?"
  3. Hybrid Retrieval: Dense + BM25 → Top-30 candidates
  4. Re-ranking: Cross-encoder selects top-6 most relevant chunks
  5. Generation: Claude produces cited answer from §7.3.2 and Appendix D
  6. Output: Answer with 3 citations, High confidence

Result: The consultant gets an accurate, cited answer in 2.4s instead of 45 minutes of manual searching.