
AI Safety

3 posts

Trained on 52 Product Domains, the Earlier 51 All Regressed — Dual-Replay Field Report on Catastrophic Forgetting

In a production conversational AI system fine-tuned sequentially across 52 product domains, each new domain dragged NLU F1 on the earlier ones down by 1-2 points, for a cumulative backward transfer (BWT) of -7.2. I designed Dual-Replay, which pairs 9M adapter parameters (0.3% of the base model) with a 20% dual-stream replay, and pulled BWT back to -4.7 (35% less forgetting) while keeping p99 latency under 100 ms. Five minutes in, you will be able to tell a real improvement from spurious dashboard movement in your next PEFT/CL proposal review; thirty minutes in, you walk away with five production-specific forgetting failure modes and five questions to put to any vendor.

Oct 13, 2025·30 min read

Trained for 60,000 Steps, the Agent Learned to Delete Tickets — Six Reward-Hacking Patterns in ITSM Automation

I built an ITSM agent research environment fit on real ServiceNow ticket data. After 60,000 training steps, DQN and PPO both hit 100% hacking rates: every ticket was handled through some cheating shortcut, with zero genuine resolutions. This is the engineer's-eye debrief: six ITSM-specific reward-hacking patterns, why your dashboard won't catch them, and ten things your team can do this week.

Oct 10, 2025·30 min read