Trained on 52 Product Domains, and the Earlier 51 All Regressed: A Dual-Replay Field Report on Catastrophic Forgetting
In a production conversational AI system fine-tuned sequentially across 52 product domains, each new domain pulled NLU F1 on the earlier domains down by 1-2 points, for a cumulative BWT (backward transfer) of -7.2. I designed Dual-Replay, which combines 9M adapter parameters (0.3% of the base model) with a 20% dual-stream replay, and brought BWT back to -4.7 (35% less forgetting) while keeping p99 latency under 100 ms. After five minutes of reading, you will be able to tell a real improvement from spurious dashboard movement in your next PEFT/CL proposal review; after thirty, you will leave with five production-specific forgetting failure modes and five questions to put to any vendor.
Oct 13, 2025·30 min read
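To make the headline numbers checkable, here is a minimal sketch of how a BWT figure and the "35% less forgetting" claim can be computed. It assumes the standard backward-transfer definition (the average, over all earlier domains, of the final score minus the score measured right after that domain was trained); the summary above does not state which variant was used, so treat this as an illustration and not as the exact evaluation pipeline. The function names and the three-domain toy matrix are hypothetical.

```python
from typing import Sequence


def backward_transfer(scores: Sequence[Sequence[float]]) -> float:
    """Average backward transfer over all domains except the last.

    scores[t][i] is the metric (e.g. NLU F1) on domain i, measured after
    training through domain t, for i <= t. Only the diagonal and the last
    row are used. Assumes the standard definition:
        BWT = mean_i(scores[T-1][i] - scores[i][i]),  i = 0 .. T-2
    """
    num_domains = len(scores)
    if num_domains < 2:
        raise ValueError("BWT needs at least two sequential domains")
    final_row = scores[num_domains - 1]
    deltas = [final_row[i] - scores[i][i] for i in range(num_domains - 1)]
    return sum(deltas) / len(deltas)


def relative_forgetting_reduction(bwt_baseline: float, bwt_method: float) -> float:
    """Fraction of the baseline's forgetting that the method removes.

    Both arguments are expected to be negative (net forgetting).
    """
    if bwt_baseline >= 0:
        raise ValueError("baseline shows no forgetting to reduce")
    return (bwt_method - bwt_baseline) / abs(bwt_baseline)


# Hypothetical toy run with three domains (not data from the article).
toy_scores = [
    [90.0],              # after domain 0
    [88.0, 85.0],        # after domain 1
    [87.0, 84.0, 80.0],  # after domain 2
]
toy_bwt = backward_transfer(toy_scores)  # ((87 - 90) + (84 - 85)) / 2 = -2.0

# Headline figures from the summary: BWT -7.2 (baseline) vs -4.7 (Dual-Replay).
reduction = relative_forgetting_reduction(-7.2, -4.7)

# "9M adapter parameters (0.3% of base)" implies a base of roughly 3B parameters,
# assuming the percentage is a ratio of parameter counts.
implied_base_params = 9e6 / 0.003

if __name__ == "__main__":
    print(f"toy BWT: {toy_bwt:.1f}")
    print(f"forgetting reduction: {reduction:.1%}")
    print(f"implied base size: {implied_base_params / 1e9:.1f}B parameters")
```

The reduction works out to 2.5 / 7.2, or about 34.7%, which rounds to the 35% quoted in the summary. When reviewing a proposal, it is worth asking whether a reported BWT is averaged over domains (as assumed here) or summed, since the two differ by a factor of the domain count.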