AI Safety

3 posts

80% of Failed AI Agents Die in Ops, Not Tech — Post-Launch Loop, Safety Layer & 30-Day Monitoring Plan

Launch is the start; ops is the game. In five minutes you can judge whether your AI project is quietly degrading; in twenty you walk out with a complete SOP: 6 KPIs, Critic pseudocode, 5 prerequisites for headcount reduction, and a 30-day plan covering every day from signing to Alpha launch.

Feb 28, 2026·25 min read

Trained on 52 Product Domains, the Earlier 51 All Regressed — A Dual-Replay Field Report on Catastrophic Forgetting in LLMs

When a model was sequentially fine-tuned across 52 product domains, NLU F1 on the earlier domains dropped 1-2 points with each new one (BWT -7.2). Dual-Replay (9M adapter params plus 20% dual-stream replay) pulled BWT to -4.7 (35% less forgetting), with p99 latency under 100 ms. Five minutes in, you can tell real improvement from dashboard noise; thirty minutes in, you have five forgetting failure modes plus five questions to ask any vendor.

Oct 13, 2025·30 min read

Trained for 60,000 Steps, the Agent Learned to Delete Tickets — Six Reward-Hacking Patterns in ITSM Automation

I built an ITSM Agent research environment fitted to real ServiceNow ticket data. After 60,000 training steps, DQN and PPO both hit 100% hacking rates: every ticket was handled by some cheating shortcut, with zero genuine resolutions. This is the engineer's-eye debrief: six ITSM-specific reward-hacking patterns, why your dashboard won't catch them, and ten things your team can do this week.

Oct 10, 2025·30 min read