RICEFW Classifier — Algorithm Deep Dive

🎯 Why This Algorithm

📋 Problem Statement

RICEFW (Reports, Interfaces, Conversions, Enhancements, Forms, Workflows) is the standard taxonomy for custom development in SAP projects. Classifying objects correctly matters because a Conversion program typically costs 3-5× as much as a Report: misclassify one, and the estimate can be off by hundreds of hours. Many objects follow clear naming patterns, but labeled training data is limited.

✅ Solution

Hybrid approach: rules + XGBoost. A deterministic rules layer handles unambiguous patterns (e.g., "Z_CDS_" → Enhancement). XGBoost handles the ambiguous gray cases using object name n-grams, description text, module context, and requirement text. Reserving the model for the cases the rules cannot settle keeps accuracy high despite limited training data.

📋 RICEFW Taxonomy

R = Report, I = Interface, C = Conversion, E = Enhancement, F = Form, W = Workflow

Typical effort multipliers (relative to a Report): Report (1×), Interface (2.5×), Conversion (3.5×), Enhancement (2×), Form (1.8×), Workflow (3×)
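A quick way to see why misclassification is expensive: the sketch below scales a hypothetical 40-hour Report baseline by the multipliers listed above. The baseline and the helper names (`estimate_hours`, `misclassification_gap`) are illustrative assumptions; in the classifier itself, effort hours come from the separate regressor.

```python
# Typical effort multipliers relative to a Report, from the list above.
EFFORT_MULTIPLIER = {"R": 1.0, "I": 2.5, "C": 3.5, "E": 2.0, "F": 1.8, "W": 3.0}


def estimate_hours(ricefw_class: str, report_base_hours: float) -> float:
    """Scale a (hypothetical) Report baseline by the class multiplier."""
    return report_base_hours * EFFORT_MULTIPLIER[ricefw_class]


def misclassification_gap(true_class: str, predicted_class: str,
                          report_base_hours: float) -> float:
    """Hours missing from the estimate when true_class is labeled predicted_class."""
    return (estimate_hours(true_class, report_base_hours)
            - estimate_hours(predicted_class, report_base_hours))


base = 40.0  # hypothetical Report baseline, in hours
# A Conversion estimated as a Report: (3.5 - 1.0) * 40 = 100 hours short.
print(misclassification_gap("C", "R", base))  # 100.0
```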

🧩 What It Comprises

📏 Rules Layer

Deterministic rules handle ~22% of cases using object naming patterns (Z_CDS_ → E, _FORM_ → F, CONV/MIGR → C), transaction codes (Z* → E), transport object types (REPT → R; FUNC with "BAPI" in the name → I), and description keywords ("workflow" → W).
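The rules layer can be sketched as an ordered list of predicates evaluated first-match-wins. This is a minimal sketch, assuming a hypothetical `CustomObject` record with `name`, `description`, `obj_type`, and `is_transaction` fields; the seven predicates mirror the rules in the pipeline diagram, and a hit returns confidence 1.0 while a miss returns `None` so the caller can defer to XGBoost.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CustomObject:
    # Hypothetical input record; field names are illustrative.
    name: str
    description: str = ""
    obj_type: str = ""           # transport object type, e.g. "REPT" or "FUNC"
    is_transaction: bool = False


# Ordered (predicate, class) pairs; the first predicate that matches wins.
RULES: list[tuple[Callable[[CustomObject], bool], str]] = [
    (lambda o: o.name.upper().startswith("Z_CDS_"), "E"),
    (lambda o: "_FORM_" in o.name.upper(), "F"),
    (lambda o: o.is_transaction and o.name.upper().startswith("Z"), "E"),
    (lambda o: o.obj_type == "REPT", "R"),
    (lambda o: o.obj_type == "FUNC" and "BAPI" in o.name.upper(), "I"),
    (lambda o: "CONV" in o.name.upper() or "MIGR" in o.name.upper(), "C"),
    (lambda o: "workflow" in o.description.lower(), "W"),
]


def apply_rules(obj: CustomObject) -> Optional[tuple[str, float]]:
    """Return (class, 1.0) for the first matching rule, or None to defer to the model."""
    for predicate, ricefw_class in RULES:
        if predicate(obj):
            return ricefw_class, 1.0
    return None


print(apply_rules(CustomObject("Z_CDS_VENDOR_MASTER")))  # ('E', 1.0)
print(apply_rules(CustomObject("Z_VENDOR_UPDATE",
                               "Update vendor master data via BAPI")))  # None
```

Because evaluation stops at the first hit, rule order doubles as a precedence policy: an object named `Z_CDS_CONV_...` is classified E by the first rule before the Conversion rule is ever consulted.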

🌲 XGBoost Classifier

Multi-class model (6 classes, softmax objective) over 38 features: keyword signals, object pattern features, module context, verb-object n-grams, and description embeddings.
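A minimal sketch of how the keyword, module, and object-name signals could be turned into numbers. The keyword and module lists come from Step 2 of the pipeline; hashing name n-grams into a small number of buckets, the bucket count, and the function names are assumptions. TF-IDF, verb-object, and requirement-context features are left out, so this does not reproduce the 38-feature layout.

```python
import zlib

# Keyword and module lists taken from the Step 2 description in the pipeline.
KEYWORDS = ["report", "interface", "bapi", "idoc", "conversion", "enhancement",
            "exit", "form", "smartform", "workflow", "cds", "odata", "fiori"]
MODULES = ["FI", "CO", "MM", "SD", "PP"]
NGRAM_BUCKETS = 8  # hypothetical size of the hashed name-n-gram block


def char_ngrams(text: str, n_min: int = 3, n_max: int = 5) -> list[str]:
    """Every character n-gram of length n_min..n_max (upper-cased)."""
    text = text.upper()
    return [text[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(text) - n + 1)]


def build_features(name: str, description: str, module: str) -> list[float]:
    desc = description.lower()
    # Substring match, so "smartform" also fires the "form" flag.
    keyword_flags = [1.0 if kw in desc else 0.0 for kw in KEYWORDS]
    module_onehot = [1.0 if module == m else 0.0 for m in MODULES]
    # Count name n-grams into fixed buckets; crc32 is stable across runs,
    # unlike Python's salted built-in hash() for strings.
    buckets = [0.0] * NGRAM_BUCKETS
    for gram in char_ngrams(name):
        buckets[zlib.crc32(gram.encode()) % NGRAM_BUCKETS] += 1.0
    return keyword_flags + module_onehot + buckets


vec = build_features("Z_VENDOR_UPDATE", "Update vendor master data via BAPI", "MM")
print(len(vec))  # 26 = 13 keywords + 5 modules + 8 buckets
```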

⏱️ Effort Regressor

A separate XGBoost regressor (MAE loss) predicts effort in hours from the RICEFW class, complexity features, and module.

🔍 Active Learning

Predictions with confidence below 0.65 are flagged for human review; the reviewed labels are then added to the training set, which is retrained weekly.
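The review loop reduces to a threshold check plus a merge step. A minimal sketch, assuming predictions arrive as hypothetical `(object_name, ricefw_class, confidence)` tuples and reviewers return corrected labels as a dict; only the 0.65 threshold comes from the text, and `Z_PRICE_SYNC` is an invented object.

```python
REVIEW_THRESHOLD = 0.65  # predictions below this go to a human reviewer


def split_for_review(predictions):
    """Partition (object_name, ricefw_class, confidence) tuples into accepted and flagged."""
    accepted, flagged = [], []
    for pred in predictions:
        _, _, confidence = pred
        (flagged if confidence < REVIEW_THRESHOLD else accepted).append(pred)
    return accepted, flagged


def merge_reviews(training_set, flagged, reviewed_labels):
    """Append human-confirmed labels for flagged objects; unreviewed ones are skipped."""
    for name, _, _ in flagged:
        if name in reviewed_labels:
            training_set.append((name, reviewed_labels[name]))
    return training_set


# Illustrative predictions: only the 0.52 case falls below the threshold.
accepted, flagged = split_for_review([("Z_VENDOR_UPDATE", "I", 0.78),
                                      ("Z_PRICE_SYNC", "E", 0.52)])
training_set = merge_reviews([], flagged, {"Z_PRICE_SYNC": "I"})
print(training_set)  # [('Z_PRICE_SYNC', 'I')]
```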

📥 Inputs & 📤 Outputs

📥 Inputs

• Custom object metadata (name, description, module)
• Object pattern features
• Related requirement text (if any)
• Transport layer (DEV, CONS, etc.)

📤 Outputs

• RICEFW class with confidence
• Estimated effort hours
• Explanation: top 3 feature contributions
• Human review flag if confidence < 0.65
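The outputs above fit in a small record type. This is a sketch with a hypothetical class name and field names chosen to match the JSON keys in the pipeline walkthrough; the review flag is derived from the 0.65 threshold rather than stored, and the explanation is capped at three entries. The values in the usage lines are illustrative.

```python
import json
from dataclasses import dataclass, field

REVIEW_THRESHOLD = 0.65


@dataclass
class ClassificationResult:
    # Hypothetical record type; fields mirror the outputs listed above.
    object_name: str
    ricefw_class: str                     # one of R, I, C, E, F, W
    confidence: float
    effort_hours: float
    effort_interval: tuple[float, float]
    explanation: list[str] = field(default_factory=list)  # strongest contribution first

    @property
    def needs_review(self) -> bool:
        return self.confidence < REVIEW_THRESHOLD

    def to_json(self) -> str:
        return json.dumps({
            "object": self.object_name,
            "ricefw_class": self.ricefw_class,
            "confidence": self.confidence,
            "effort_hours": self.effort_hours,
            "effort_interval": list(self.effort_interval),
            "explanation": self.explanation[:3],  # keep the top 3 contributions
            "needs_review": self.needs_review,
        })


# Illustrative values only.
result = ClassificationResult("ZEXAMPLE_OBJ", "I", 0.58, 30.0, (24.0, 36.0),
                              ["bapi keyword", "MM module", "update verb", "name n-gram"])
print(result.needs_review)  # True
```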

🔄 How It Runs — Step by Step

┌────────────────────────────────────────────────────────────────────────────────────┐
│                             RICEFW CLASSIFIER PIPELINE                             │
├────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                    │
│   ┌──────────────┐                                                                 │
│   │   INPUT:     │  Object: "ZI_VENDOR_MASTER"                                     │
│   │ Object Meta  │  Description: "Core Data Service for Vendor Master with BOPF"   │
│   └──────┬───────┘  Module: MM (from Tool 03)                                      │
│          │                                                                         │
│          ▼                                                                         │
│   ┌────────────────────────────────────────────────────────────────────────────┐   │
│   │   STEP 1: RULES LAYER (Unambiguous Cases)                                  │   │
│   │                                                                            │   │
│   │  ┌──────────────────────────────────────────────────────────────────────┐  │   │
│   │  │  Rule 1: Name starts with "Z_CDS_" → Enhancement (E)                 │  │   │
│   │  │  Rule 2: Name contains "_FORM_" → Form (F)                           │  │   │
│   │  │  Rule 3: Transaction starts with "Z" → Enhancement (E)               │  │   │
│   │  │  Rule 4: Type = "REPT" in transport → Report (R)                     │  │   │
│   │  │  Rule 5: Type = "FUNC" and name has "BAPI" → Interface (I)           │  │   │
│   │  │  Rule 6: Name has "CONV" or "MIGR" → Conversion (C)                  │  │   │
│   │  │  Rule 7: Description contains "workflow" → Workflow (W)              │  │   │
│   │  └──────────────────────────────────────────────────────────────────────┘  │   │
│   │                                                                            │   │
│   │   If match found → STOP, return class with confidence=1.0                  │   │
│   │   Else → continue to XGBoost                                               │   │
│   └────────────────────────────────────────────────────────────────────────────┘   │
│          │ (if no rule match)                                                      │
│          ▼                                                                         │
│   ┌────────────────────────────────────────────────────────────────────────────┐   │
│   │   STEP 2: FEATURE ENGINEERING                                              │   │
│   │                                                                            │   │
│   │   Build 38-dimensional feature vector:                                     │   │
│   │   • Object name n-grams (character-level, 3-5 grams)                       │   │
│   │   • Description TF-IDF vector (top 100 terms)                              │   │
│   │   • Module one-hot encoding (FI, CO, MM, SD, PP, etc.)                     │   │
│   │   • Keyword presence: "report", "interface", "bapi", "idoc",               │   │
│   │     "conversion", "enhancement", "exit", "form", "smartform",              │   │
│   │     "workflow", "cds", "odata", "fiori"                                    │   │
│   │   • Requirement context (from Tool 02)                                     │   │
│   └────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                         │
│          ▼                                                                         │
│   ┌────────────────────────────────────────────────────────────────────────────┐   │
│   │   STEP 3: XGBOOST CLASSIFICATION                                           │   │
│   │                                                                            │   │
│   │  ┌──────────────────────────────────────────────────────────────────────┐  │   │
│   │  │  XGBoost Multi-Class (6 classes, 500 trees, max_depth=6)             │  │   │
│   │  │                                                                      │  │   │
│   │  │  Input Feature Vector (38-dim)                                       │  │   │
│   │  │                    ↓                                                 │  │   │
│   │  │  ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐ ┌───────┐         │  │   │
│   │  │  │   R   │ │   I   │ │   C   │ │   E   │ │   F   │ │   W   │         │  │   │
│   │  │  │ 0.02  │ │ 0.01  │ │ 0.01  │ │ 0.94  │ │ 0.01  │ │ 0.01  │         │  │   │
│   │  │  └───────┘ └───────┘ └───────┘ └───────┘ └───────┘ └───────┘         │  │   │
│   │  │                                                                      │  │   │
│   │  │  Softmax: P(class i) = exp(z_i) / Σ_j exp(z_j)                       │  │   │
│   │  │                                                                      │  │   │
│   │  │  Predicted: Enhancement (E) with 0.94 confidence                     │  │   │
│   │  └──────────────────────────────────────────────────────────────────────┘  │   │
│   └────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                         │
│          ▼                                                                         │
│   ┌────────────────────────────────────────────────────────────────────────────┐   │
│   │   STEP 4: EFFORT ESTIMATION                                                │   │
│   │                                                                            │   │
│   │   Separate XGBoost Regressor:                                              │   │
│   │   Features = [Class one-hot, Complexity score, Module,                     │   │
│   │               Keyword density, Requirement length]                         │   │
│   │                    ↓                                                       │   │
│   │   Predicted Effort: 24.5 hours                                             │   │
│   │   Prediction Interval: [18.2, 30.8] hours                                  │   │
│   └────────────────────────────────────────────────────────────────────────────┘   │
│          │                                                                         │
│          ▼                                                                         │
│   ┌────────────────────────────────────────────────────────────────────────────┐   │
│   │   OUTPUT JSON:                                                             │   │
│   │   {                                                                        │   │
│   │     "object": "ZI_VENDOR_MASTER",                                          │   │
│   │     "ricefw_class": "E",                                                   │   │
│   │     "confidence": 0.94,                                                    │   │
│   │     "effort_hours": 24.5,                                                  │   │
│   │     "effort_interval": [18.2, 30.8],                                       │   │
│   │     "explanation": ["CDS in description", "ZI_ prefix"]                    │   │
│   │   }                                                                        │   │
│   └────────────────────────────────────────────────────────────────────────────┘   │
│                                                                                    │
└────────────────────────────────────────────────────────────────────────────────────┘
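Steps 1 to 3 reduce to a short control flow: try the rules, and only when none fires, convert the model's six raw class scores into probabilities with softmax and take the most probable class. A minimal sketch in which `rule_lookup` and `raw_scores` are hypothetical stand-ins for the rules layer and the trained XGBoost model (neither is shown); the scores in the usage lines are invented.

```python
import math
from typing import Callable, Optional, Sequence

CLASSES = ["R", "I", "C", "E", "F", "W"]  # same order as the Step 3 diagram


def softmax(z: Sequence[float]) -> list[float]:
    """P(class i) = exp(z_i) / sum_j exp(z_j), computed after subtracting max(z)."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def classify(obj: object,
             rule_lookup: Callable[[object], Optional[str]],
             raw_scores: Callable[[object], Sequence[float]]) -> tuple[str, float]:
    """Step 1: a rule hit returns confidence 1.0. Steps 2-3: otherwise use the model."""
    ruled = rule_lookup(obj)
    if ruled is not None:
        return ruled, 1.0
    probs = softmax(raw_scores(obj))
    best = max(range(len(CLASSES)), key=lambda i: probs[i])
    return CLASSES[best], probs[best]


# Stand-ins: no rule fires, and a fake model returns hypothetical raw scores.
label, conf = classify("ZI_EXAMPLE", lambda o: None,
                       lambda o: [0.1, 0.2, -1.0, 3.0, 0.0, -0.5])
print(label, round(conf, 2))  # E 0.82
```

Subtracting the maximum score before exponentiating does not change the probabilities but keeps `math.exp` from overflowing on large raw scores.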

🏗️ Architecture & Integration

Where RICEFW Classifier Sits in A²AI

Upstream: 🧭 TOOL 03 Module Router · 📱 TOOL 07 Fiori Recommender · 🏷️ TOOL 02 Requirements

↓

🧩 TOOL 08 RICEFW Classifier (XGBoost + Rules)

↓

Downstream: TOOL 09 Clean Core Scorer · TOOL 04 Cost Forecaster · SOW Generator (Custom Dev Section)

Tool 08 quantifies custom development effort and passes it to the Clean Core analysis (Tool 09), the cost forecast (Tool 04), and the SOW's custom development section.

📐 Mathematical Explanation

XGBoost Objective (Multi-Class):

Obj = Σ_i L(y_i, ŷ_i) + Σ_k Ω(f_k)

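Where L is the softmax cross-entropy loss, y_ic = 1 if object i belongs to class c (0 otherwise), and f_c(x_i) is the summed raw score of class c's trees:
L = - Σ_c y_ic log(p_ic)
p_ic = exp(f_c(x_i)) / Σ_j exp(f_j(x_i))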

Tree Regularization:

Ω(f) = γT + ½λ Σ_j w_j²

Where T is the number of leaves, w_j are the leaf weights, and γ and λ are regularization hyperparameters.
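As a worked instance of the penalty, the sketch below evaluates Ω for a hypothetical three-leaf tree with γ = λ = 1; the leaf weights and hyperparameter values are made up for illustration.

```python
def tree_penalty(leaf_weights: list[float], gamma: float, lam: float) -> float:
    """Omega(f) = gamma * T + 0.5 * lambda * sum_j w_j**2, where T = number of leaves."""
    num_leaves = len(leaf_weights)
    return gamma * num_leaves + 0.5 * lam * sum(w * w for w in leaf_weights)


# Hypothetical 3-leaf tree with gamma = lambda = 1:
# 1.0 * 3 + 0.5 * 1.0 * (0.16 + 0.04 + 0.01), i.e. about 3.105
print(tree_penalty([0.4, -0.2, 0.1], gamma=1.0, lam=1.0))
```

The γT term charges a flat cost per leaf, so a split survives only if it reduces the loss by more than γ; the λ term shrinks large leaf weights toward zero.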

Gradient and (diagonal) Hessian of the softmax loss with respect to f_c(x_i):

g_ic = p_ic - y_ic
h_ic = p_ic(1 - p_ic)
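Both formulas can be checked numerically. The sketch below computes g and h for one object and compares them against central finite differences of the cross-entropy (for g) and of g_c itself (for h). The raw scores are hypothetical, and library implementations may rescale or clip the Hessian internally, so this checks the math above rather than any particular library.

```python
import math


def softmax(z: list[float]) -> list[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z: list[float], y: list[int]) -> float:
    """L = -sum_c y_c * log(p_c) for one object with one-hot label y."""
    return -sum(yc * math.log(pc) for yc, pc in zip(y, softmax(z)))


def grad_hess(z: list[float], y: list[int]) -> tuple[list[float], list[float]]:
    """g_c = p_c - y_c and diagonal h_c = p_c * (1 - p_c)."""
    p = softmax(z)
    return [pc - yc for pc, yc in zip(p, y)], [pc * (1 - pc) for pc in p]


def bump(z: list[float], c: int, delta: float) -> list[float]:
    out = list(z)
    out[c] += delta
    return out


# Hypothetical raw scores for R, I, C, E, F, W; true class is E.
z = [0.5, -1.0, 0.2, 2.0, 0.0, -0.3]
y = [0, 0, 0, 1, 0, 0]
g, h = grad_hess(z, y)
eps = 1e-5
for c in range(len(z)):
    g_num = (cross_entropy(bump(z, c, eps), y)
             - cross_entropy(bump(z, c, -eps), y)) / (2 * eps)
    h_num = (grad_hess(bump(z, c, eps), y)[0][c]
             - grad_hess(bump(z, c, -eps), y)[0][c]) / (2 * eps)
    assert abs(g_num - g[c]) < 1e-6 and abs(h_num - h[c]) < 1e-6
print("gradient and Hessian match finite differences")
```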

Effort Regression (MAE Loss):

L(y, ŷ) = |y - ŷ|

Gradient with respect to ŷ: sign(ŷ - y)
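For the effort regressor the per-object gradient carries only a direction. A small sketch with hypothetical numbers; note also that the second derivative of |y - ŷ| is zero wherever it is defined, so unlike the softmax case there is no curvature information to scale the update.

```python
def abs_loss(y: float, y_hat: float) -> float:
    return abs(y - y_hat)


def abs_grad(y: float, y_hat: float) -> int:
    """Subgradient of |y - y_hat| w.r.t. y_hat: sign(y_hat - y), taking 0 at y_hat == y."""
    return (y_hat > y) - (y_hat < y)


# Hypothetical object: 30 logged hours, current prediction 24.5 hours.
# A negative gradient means the prediction should move up.
print(abs_loss(30.0, 24.5), abs_grad(30.0, 24.5))  # 5.5 -1
```

The gradient has the same magnitude whether a prediction is off by 1 hour or by 50, which is what makes MAE less sensitive than squared error to outlier effort values.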

📊 Measured Performance

Metric	Value	Basis
Classification Accuracy	93.7%	Test split of the 11,800 labeled objects
F1 Score (Macro)	0.91	Unweighted mean of per-class F1 over the 6 classes
Effort MAE	6.4 hours	Against actual effort logged in SAP Solution Manager
Effort MAPE	18.2%	Mean absolute percentage error
Rules Coverage	22%	Share of cases handled deterministically
Active Learning Improvement	+2.3%	Accuracy gain from human-reviewed feedback
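All four headline numbers are standard metrics that can be recomputed from a held-out split. A dependency-free sketch with made-up toy data; macro F1 weights each class equally, so a weak score on a rare class such as Workflow (5% of the data) pulls it down as much as a weak score on Enhancement (32%).

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: list[str], y_pred: list[str], classes: list[str]) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)  # absent class scores 0 here
    return sum(scores) / len(scores)


def mae(actual: list[float], predicted: list[float]) -> float:
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error in %; undefined if any actual value is 0."""
    return 100 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Toy example: 3 objects with made-up logged vs. predicted hours.
print(round(mae([20, 40, 10], [24, 30, 12]), 2),
      round(mape([20, 40, 10], [24, 30, 12]), 2))  # 5.33 21.67
```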

📚 Training & Calibration Set

• Size: 11,800 labeled custom objects
• Source: Delivered ABAP portfolios from 42 projects
• Class Distribution: R (28%), I (15%), C (8%), E (32%), F (12%), W (5%)
• Effort Labels: Actual logged hours from SAP Solution Manager
• Validation: 5-fold stratified cross-validation by project
• Active Learning: Weekly retraining with new human-reviewed cases
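Splitting by project means all objects from one delivery land in the same fold, so each fold is scored on portfolios the model has not seen. A minimal stdlib sketch of project-grouped fold assignment (`assign_folds` is a hypothetical helper): it balances fold sizes greedily but does not also balance class proportions, which the stratified setup above additionally requires; scikit-learn's `StratifiedGroupKFold` handles both constraints.

```python
from collections import Counter


def assign_folds(project_ids: list[str], k: int = 5) -> list[int]:
    """Return a fold index per object so that each project lands in exactly one fold.
    Projects are placed largest-first into whichever fold is currently smallest."""
    sizes = Counter(project_ids)
    fold_sizes = [0] * k
    fold_of: dict[str, int] = {}
    for project, n in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(range(k), key=lambda f: fold_sizes[f])
        fold_of[project] = target
        fold_sizes[target] += n
    return [fold_of[p] for p in project_ids]


# Made-up portfolio: 4 projects, 13 objects, 2 folds.
ids = ["P1"] * 5 + ["P2"] * 3 + ["P3"] * 3 + ["P4"] * 2
print(Counter(assign_folds(ids, k=2)))  # Counter({0: 7, 1: 6})
```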

🎬 End-to-End Example

Scenario: Ambiguous Object Classification

• Input: Object "Z_VENDOR_UPDATE" with description "Update vendor master data via BAPI"
• Rules Layer: No unambiguous match (doesn't start with Z_CDS_, not a transaction, etc.)
• Feature Engineering: Extracts "BAPI" keyword, "update" verb, MM module context
• XGBoost: Predicts Interface (I) with 0.78 confidence, Enhancement (E) with 0.15 confidence
• Effort Regressor: Predicts 42.3 hours (Interface multiplier applied)
• Confidence Check: 0.78 > 0.65 threshold → No human review needed

Result: Correctly classified as Interface. Fed to Tool 04 with appropriate effort multiplier.