Tool 01 · Algorithm Deep Dive
LayoutLMv3 + Azure Document Intelligence
RFPs arrive as scanned PDFs, native DOCX, tables, and email threads — a pure OCR model misses structure, and a pure text parser misses scans. A table header on page 14 and its continuation on page 15 get processed as two separate, disconnected entities. Critical scope items get lost or misclassified.
LayoutLMv3 fuses text + layout + image patches in a single multimodal transformer. It understands that text aligned in columns with borders is a table, that bold text at the top is a heading, and that indented bullet points form a list. As a result, tables, headings, and list items are each tagged with the correct block type.
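To make the "layout" input concrete: LayoutLM-family models take, for every word, a bounding box scaled to a 0-1000 integer grid alongside the text and the page image. The following is a minimal sketch of that scaling step only. The function name `normalize_box`, the sample words, and the page size (612 × 792, a US Letter page in points) are illustrative assumptions, not details taken from this pipeline.

```python
from typing import List, Tuple

# (x0, y0, x1, y1) in page units, origin at the top-left corner.
Box = Tuple[float, float, float, float]


def normalize_box(box: Box, page_width: float, page_height: float) -> List[int]:
    """Scale a page-space box to the 0-1000 integer grid used by LayoutLM-style models."""
    x0, y0, x1, y1 = box
    scaled = [
        1000 * x0 / page_width,
        1000 * y0 / page_height,
        1000 * x1 / page_width,
        1000 * y1 / page_height,
    ]
    # Clamp so OCR boxes that slightly overshoot the page stay in range.
    return [max(0, min(1000, int(v))) for v in scaled]


# Hypothetical OCR output for one heading line.
words = ["4.2", "Scope", "of", "Work"]
page_boxes = [(72, 90, 110, 110), (118, 90, 190, 110),
              (198, 90, 220, 110), (228, 90, 290, 110)]
layout_boxes = [normalize_box(b, page_width=612, page_height=792) for b in page_boxes]

for word, box in zip(words, layout_boxes):
    print(word, box)
```

The words and their normalized boxes would then be passed to the model together with the page image; how this pipeline batches or truncates long pages is not stated in the source.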
LayoutLMv3 — multimodal transformer, 133M parameters. Fine-tuned on 3,200 SAP RFP pages for section/table/form-field classification.
Azure Document Intelligence — pre-reads for OCR confidence and initial text extraction for scanned pages.
unstructured.io + PyMuPDF — deterministic fallback for native digital files (DOCX, text-based PDFs).
Triage router (lightweight classifier) that routes each input: Scanned Image → OCR Path; Digital Text → Fast Parse; Complex Table → Table Transformer.
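The routing decision can be sketched as a small rule-based function. This is a stand-in for illustration, not the classifier the pipeline actually uses: the features (`text_chars`, `ruling_lines`), the thresholds, and the order in which the checks run are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    OCR_PATH = "ocr_path"                     # Azure Document Intelligence
    FAST_PARSE = "fast_parse"                 # PyMuPDF / unstructured.io
    TABLE_TRANSFORMER = "table_transformer"   # specialized table model


@dataclass
class PageStats:
    """Hypothetical per-page features computed before routing."""
    text_chars: int    # characters found in the embedded text layer
    ruling_lines: int  # drawn horizontal/vertical lines, a rough table hint


def route_page(stats: PageStats, min_chars: int = 50, min_ruling_lines: int = 8) -> Route:
    # No usable text layer: assume the page is a scan and needs OCR.
    if stats.text_chars < min_chars:
        return Route.OCR_PATH
    # Digital page with many ruled lines: assume a complex table.
    if stats.ruling_lines >= min_ruling_lines:
        return Route.TABLE_TRANSFORMER
    # Otherwise the deterministic parser is enough.
    return Route.FAST_PARSE


print(route_page(PageStats(text_chars=0, ruling_lines=0)))
print(route_page(PageStats(text_chars=2400, ruling_lines=2)))
print(route_page(PageStats(text_chars=900, ruling_lines=30)))
```

Routing per page rather than per file is a design choice that suits mixed scanned/digital documents, since a single PDF can then use all three paths.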
┌─────────────────────────────────────────────────────────────────────────────────────────┐
│ DOCUMENT INTELLIGENCE PIPELINE │
├─────────────────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌──────────────┐ │
│ │ INPUT: │ PDF, DOCX, EML, XLSX, PNG/JPG (Max 500 pages) │
│ │ Document │ │
│ └──────┬───────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 1: TRIAGE ROUTER │ │
│ │ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │ │
│ │ │ Scanned Image? │───▶│ Digital Text? │───▶│ Complex Table? │ │ │
│ │ │ → OCR Path │ │ → Fast Parse │ │ → Table Transf. │ │ │
│ │ │ (Azure Doc Intel)│ │ (PyMuPDF) │ │ (Specialized) │ │ │
│ │ └─────────────────┘ └─────────────────┘ └─────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 2: LAYOUT ANALYSIS (LayoutLMv3) │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────────┐ │ │
│ │ │ Multimodal Transformer (133M params) │ │ │
│ │ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │ │
│ │ │ │ Text │ + │ Layout │ + │ Visual │ │ │ │
│ │ │ │Embeddings│ │Embeddings│ │Embeddings│ │ │ │
│ │ │ └────┬─────┘ └────┬─────┘ └────┬─────┘ │ │ │
│ │ │ └───────────────┼───────────────┘ │ │ │
│ │ │ ▼ │ │ │
│ │ │ Unified Multi-Modal Attention │ │ │
│ │ │ ▼ │ │ │
│ │ │ Tags: [HEADING] [PARAGRAPH] [TABLE] [LIST] [FIGURE] │ │ │
│ │ └─────────────────────────────────────────────────────────────┘ │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 3: NORMALIZE │ │
│ │ • Merge fragmented text blocks │ │
│ │ • Reconstruct tables into JSON structure (rows, cells) │ │
│ │ • Preserve reading order │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────────────────┐ │
│ │ STEP 4: EMIT │ │
│ │ { │ │
│ │ "page": 14, │ │
│ │ "blocks": [ │ │
│ │ {"type": "heading", "text": "4.2 Scope of Work", ...}, │ │
│ │ {"type": "table", "rows": [...], "confidence": 0.96, ...} │ │
│ │ ] │ │
│ │ } │ │
│ └──────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────────────────┘
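Steps 3 and 4 can be illustrated with the page 14/15 table case from the introduction. The sketch below stitches a table that continues onto the next page and emits the result as JSON. The merge rule (same column count, repeated header row dropped), the `continued_to` field, and the sample rows are assumptions made for the example; only the `page`, `blocks`, `type`, `text`, and `rows` keys come from the output format shown in the diagram.

```python
import json
from typing import Any, Dict, List

Page = Dict[str, Any]


def merge_continued_tables(pages: List[Page]) -> List[Page]:
    """Append a table that opens a page to the table that closes the previous page."""
    # Copy the block lists so the caller's input is left untouched.
    merged = [{"page": p["page"], "blocks": list(p["blocks"])} for p in pages]
    for prev, curr in zip(merged, merged[1:]):
        if not prev["blocks"] or not curr["blocks"]:
            continue
        tail, head = prev["blocks"][-1], curr["blocks"][0]
        if tail["type"] != "table" or head["type"] != "table":
            continue
        if not tail["rows"] or not head["rows"]:
            continue
        # Assumed heuristic: a continuation has the same number of columns.
        if len(tail["rows"][0]) != len(head["rows"][0]):
            continue
        rows = head["rows"]
        # Drop the header row if it is repeated at the top of the new page.
        if rows[0] == tail["rows"][0]:
            rows = rows[1:]
        prev["blocks"][-1] = {**tail, "rows": tail["rows"] + rows,
                              "continued_to": curr["page"]}
        curr["blocks"] = curr["blocks"][1:]
    return merged


pages = [
    {"page": 14, "blocks": [
        {"type": "heading", "text": "4.2 Scope of Work"},
        {"type": "table", "rows": [["Item", "Qty"], ["Data migration", "1"]]},
    ]},
    {"page": 15, "blocks": [
        {"type": "table", "rows": [["Item", "Qty"], ["Cutover support", "2"]]},
        {"type": "paragraph", "text": "All items are in scope."},
    ]},
]

for page in merge_continued_tables(pages):
    print(json.dumps(page))
```

After merging, the page 14 table holds all three rows and page 15 keeps only its paragraph, so downstream tools see one table instead of two disconnected fragments.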
Tool 01 is the ENTRY POINT of the entire A²AI pipeline. No downstream tool can function without clean, structured input from Document Intelligence.
| Metric | Value | Benchmark Dataset |
|---|---|---|
| Macro-F1 (Overall) | 94.2% | 1,200 held-out SAP RFP pages |
| Table Cell Extraction (Exact Match) | 91.6% | Internal RFP benchmark |
| Section Header Classification | 96.8% | Multi-level heading hierarchy test |
| List Item Detection | 93.2% | Bulleted and numbered lists |
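For readers unfamiliar with the headline metric, macro-F1 is the unweighted mean of the per-class F1 scores, so rare block types count as much as common ones. The sketch below shows the standard definition on a toy example. The labels and predictions are made up, and the source does not say whether the reported score is computed per block or per token.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in truth or prediction."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example: one PARAGRAPH block is mistaken for a TABLE.
truth = ["HEADING", "PARAGRAPH", "PARAGRAPH", "TABLE", "LIST"]
preds = ["HEADING", "PARAGRAPH", "TABLE", "TABLE", "LIST"]
print(round(macro_f1(truth, preds), 4))
```

In the toy example HEADING and LIST score 1.0 while PARAGRAPH and TABLE each score 2/3, which gives a macro-F1 of 5/6.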
RFP_Pharma_S4_Migration.pdf (450 pages, mixed scanned/digital)
Result: 40+ hours of manual document triage reduced to under 2 minutes.