Tool 06 · Algorithm Deep Dive
RAG + Hybrid Retrieval + Claude 3.5 Sonnet
Stakeholders don't read 300-page RFPs; they ask questions. But a generic LLM (ChatGPT or Claude without RAG) hallucinates under pressure: ask "Does this RFP include S/4HANA Private Cloud?" and it may produce a confident but wrong answer based on outdated training data. In SAP consulting, a wrong answer about licensing can cost $200,000 or more.
RAG (Retrieval-Augmented Generation) forces grounded answers. Every sentence the model writes is pinned to a retrieved source passage from the actual RFP. Hybrid search (BM25 + dense) ensures both keyword precision and semantic recall. A cross-encoder re-ranker verifies relevance. The system refuses to answer if confidence is low.
- Embeddings: OpenAI text-embedding-3-large; 3,072-dimensional vectors stored in pgvector on PostgreSQL (see the ingestion sketch after this list).
- Hybrid search: top-50 candidates each from dense vector similarity and BM25 keyword matching, merged by reciprocal rank fusion into a top-30 list.
- Re-ranking: BGE-reranker-v2-m3, a cross-encoder that scores query-document pairs and narrows the top-30 to a top-6.
- Generation: Claude 3.5 Sonnet with a strict "cite or decline" system prompt; a no-answer detector refuses low-confidence queries.
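
Step 1 of the pipeline diagram below (chunking and embedding) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the `rfp_chunks` table name and connection string are invented for the example, and tiktoken stands in for whatever tokenizer the production system uses; the embedding call and the `vector(3072)` column follow the components listed above.

```python
# Minimal sketch of Step 1: chunk the RFP, embed each chunk, store in pgvector.
# Assumptions: the pgvector extension is installed, and the `rfp_chunks` table
# name and DSN are illustrative. Metadata columns (page, section) are omitted.
import tiktoken
import psycopg
from openai import OpenAI

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")  # encoding used by text-embedding-3-large

def chunk_tokens(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Split text into 400-token chunks with 50-token overlap."""
    tokens = enc.encode(text)
    step = size - overlap
    return [enc.decode(tokens[i:i + size]) for i in range(0, len(tokens), step)]

def ingest(rfp_text: str, dsn: str = "dbname=rfp") -> None:
    """Embed all chunks (batching for very large RFPs not shown) and store them."""
    chunks = chunk_tokens(rfp_text)
    resp = client.embeddings.create(model="text-embedding-3-large", input=chunks)
    with psycopg.connect(dsn) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS rfp_chunks ("
            "id serial PRIMARY KEY, content text, embedding vector(3072))"
        )
        for chunk, item in zip(chunks, resp.data):
            # pgvector accepts the "[x, y, ...]" literal form of a Python list.
            conn.execute(
                "INSERT INTO rfp_chunks (content, embedding) VALUES (%s, %s::vector)",
                (chunk, str(item.embedding)),
            )
```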
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ RFP CHAT — RAG PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ USER │ "Does this RFP require S/4HANA Private Cloud Edition?" │
│ │ QUERY │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: DOCUMENT CHUNKING (Pre-processing) │ │
│ │ │ │
│ │ RFP → 400-token chunks with 50-token overlap │ │
│ │ Each chunk embedded (3,072-dim) and stored in pgvector │ │
│ │ Metadata: page number, section header, confidence │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: HYBRID RETRIEVAL (Top-30) │ │
│ │ │ │
│ │ ┌─────────────────────────┐ ┌─────────────────────────────┐ │ │
│ │ │ DENSE RETRIEVAL │ │ BM25 KEYWORD RETRIEVAL │ │ │
│ │ │ │ │ │ │ │
│ │ │ Query → Embedding │ │ Query → Tokenize → IDF │ │ │
│ │ │ Cosine sim to chunks │ │ Match against inverted idx │ │ │
│ │ │ Top-50 candidates │ │ Top-50 candidates │ │ │
│ │ └───────────┬─────────────┘ └───────────────┬─────────────┘ │ │
│ │ │ │ │ │
│ │ └─────────────┬─────────────────────┘ │ │
│ │ ▼ │ │
│ │ Reciprocal Rank Fusion (RRF) │ │
│ │ Score = 1/(k + rank_dense) + 1/(k + rank_bm25) │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ Top-30 Candidates │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: CROSS-ENCODER RE-RANKING (Top-6) │ │
│ │ │ │
│ │ BGE-reranker-v2-m3 Cross-Encoder: │ │
│ │ │ │
│ │ For each of 30 candidates: │ │
│ │ Input = [CLS] Query [SEP] Chunk [SEP] │ │
│ │ Output = Relevance Score (0-1) │ │
│ │ │ │
│ │ Sort by relevance → Keep top-6 │ │
│ │ Filter: min_relevance > 0.35 else decline to answer │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: GROUNDED GENERATION │ │
│ │ │ │
│ │ Claude 3.5 Sonnet with Strict System Prompt: │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ "You are answering questions about an RFP document. │ │ │
│ │ │ Answer ONLY using the provided context chunks. │ │ │
│ │ │ Cite the page number for every factual claim. │ │ │
│ │ │ If the answer is not in the context, say 'I cannot find │ │ │
│ │ │ this information in the RFP.' │ │ │
│ │ │ DO NOT use outside knowledge or make assumptions." │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ Context Chunks (Top-6) + Query → Claude → Cited Answer │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 5: FAITHFULNESS CHECK & OUTPUT │ │
│ │ │ │
│ │ Post-generation verification: │ │
│ │ • Are all factual claims supported by cited chunks? │ │
│ │ • Do citations match actual page numbers? │ │
│ │ • Confidence classification (High/Med/Low) │ │
│ │ │ │
│ │ Output: Answer + Citations + Confidence │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
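The RRF formula in Step 2 is small enough to show in full. A minimal sketch: the two retrievers are represented only by their ranked chunk-ID lists, and k = 60 is the conventional constant from the RRF literature (the pipeline's actual value isn't stated). On the dense side, pgvector's `<=>` operator (cosine distance) yields the top-50 via `ORDER BY embedding <=> query_vec LIMIT 50`.

```python
# Minimal sketch of Step 2: fuse dense and BM25 rankings with reciprocal rank fusion.
# Score = 1/(k + rank_dense) + 1/(k + rank_bm25), exactly as in the diagram.
from collections import defaultdict

def reciprocal_rank_fusion(
    dense_ids: list[int],   # top-50 chunk IDs from the pgvector cosine search
    bm25_ids: list[int],    # top-50 chunk IDs from the BM25 keyword search
    k: int = 60,            # common default; the pipeline's value is an assumption
    top_n: int = 30,
) -> list[int]:
    scores: defaultdict[int, float] = defaultdict(float)
    for ranking in (dense_ids, bm25_ids):
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

RRF needs only ranks, never raw scores, which is why it pairs well with BM25 and cosine similarity: the two score scales are incomparable, but their rankings fuse cleanly.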
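For Step 3, the FlagEmbedding library is the reference way to run BGE-reranker-v2-m3; whether the production pipeline uses that library is an assumption, but the scoring pattern is the same. Each (query, chunk) pair is scored jointly by the cross-encoder, and `normalize=True` maps the raw logit through a sigmoid onto the 0-1 scale that the 0.35 refusal threshold expects.

```python
# Minimal sketch of Step 3: cross-encoder re-ranking with BGE-reranker-v2-m3.
# FlagEmbedding is one way to run this model; using it here is an assumption.
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

def rerank(query: str, chunks: list[str], top_k: int = 6,
           min_relevance: float = 0.35) -> list[tuple[str, float]]:
    """Score each (query, chunk) pair jointly; keep the top-6 above threshold."""
    pairs = [[query, chunk] for chunk in chunks]
    scores = reranker.compute_score(pairs, normalize=True)  # sigmoid -> 0..1
    ranked = sorted(zip(chunks, scores), key=lambda x: x[1], reverse=True)[:top_k]
    # An empty result means even the best chunk fell below min_relevance,
    # which is the signal for the pipeline to decline to answer.
    return [(c, s) for c, s in ranked if s >= min_relevance]
```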
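Step 4 reduces to one API call carrying the strict system prompt from the diagram. A sketch using the official `anthropic` Python SDK; the model alias, token budget, and the `[page N]` context format are assumptions.

```python
# Minimal sketch of Step 4: grounded generation with a cite-or-decline prompt.
import anthropic

client = anthropic.Anthropic()

SYSTEM = (
    "You are answering questions about an RFP document. "
    "Answer ONLY using the provided context chunks. "
    "Cite the page number for every factual claim. "
    "If the answer is not in the context, say 'I cannot find this "
    "information in the RFP.' "
    "DO NOT use outside knowledge or make assumptions."
)

def answer(query: str, chunks: list[tuple[str, int]]) -> str:
    """chunks: (text, page) pairs from Step 3. Returns a cited answer."""
    context = "\n\n".join(f"[page {page}] {text}" for text, page in chunks)
    msg = client.messages.create(
        model="claude-3-5-sonnet-latest",  # model alias is an assumption
        max_tokens=1024,
        system=SYSTEM,
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {query}"}],
    )
    return msg.content[0].text
```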
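Part of Step 5 is mechanical: citation consistency can be verified by parsing the answer and comparing cited pages against the pages actually supplied in context. A minimal sketch; the `[page N]` citation format and the High/Med/Low thresholds are assumptions, and the RAGAS faithfulness figures in the table below appear to come from an offline 600-pair evaluation rather than this per-query check.

```python
# Minimal sketch of Step 5: verify citations and classify confidence.
# Assumes "[page N]" citations and uses Step 3 re-ranker scores as a cheap
# confidence proxy; both the format and the thresholds are assumptions.
import re

def check_citations(answer: str, context_pages: set[int]) -> bool:
    """Every cited page must be one the model was actually shown."""
    cited = {int(p) for p in re.findall(r"\[page (\d+)\]", answer)}
    return bool(cited) and cited <= context_pages

def classify_confidence(scores: list[float]) -> str:
    """Coarse High/Med/Low bucket from the re-ranker scores."""
    best = max(scores, default=0.0)
    if best >= 0.8:
        return "High"
    if best >= 0.5:
        return "Med"
    return "Low"
```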
Tool 06 is the primary interactive interface for consultants exploring RFP content. Its architecture also provides resilience, GDPR compliance (EU data residency), and cost optimization.
| Metric | Value | Notes |
|---|---|---|
| Faithfulness (RAGAS) | 96.7% | 600 Q&A pairs |
| No-Hallucination Rate | 99.1% | Answer fully grounded in context |
| Answer Relevance | 0.91 | RAGAS relevance score |
| Context Recall | 0.94 | RAGAS recall metric |
| Median Latency | 2.4s | End-to-end |
| Refusal Rate (appropriate) | 8.2% | "I don't know" when warranted |
Result: a consultant gets an accurate, cited answer in a median of 2.4s, versus roughly 45 minutes of manual searching.