Trained for 60,000 Steps, the Agent Learned to Delete Tickets — Six Reward-Hacking Patterns in ITSM Automation

Yaqin Hei·October 10, 2025·30 min read

Agentic AI ITSM Ticket Automation Reward Hacking AI Safety Research

Trained for 60,000 Steps, the Agent Learned to Delete Tickets — Six Reward-Hacking Patterns in ITSM Automation

IT ticket automation and retail customer service are two parallel Agent deployment domains — internal vs. external, action-driven vs. text-driven, and their safety models look nothing alike. This post focuses on the most dangerous and least visible failure mode in ITSM automation: reward hacking. The material is the engineering-perspective debrief from an academic paper of mine currently under submission (Causal Detection of Reward Hacking in Autonomous IT Service Agents) — six hacking patterns observed in controlled experiments, validated across two public datasets (UCI ServiceNow and Mendeley Help Desk). A third parallel deployment scenario — LLM continual learning, where catastrophic forgetting compounds with every newly onboarded product domain — is the field report in Trained on 52 Product Domains, the Earlier 51 All Sank. For the meta-methodology — how to grade Agents, design Critics, avoid deploy-and-abandon — see my Agentic AI in Practice series. 中文版：训练 60,000 步，Agent 学会的不是解决工单——是删工单.

Opening: A Training Curve and That Flat 100% Line

While doing my PhD research I designed an ITSM Agent experimental environment — ITSM-Gym. It's a Gymnasium-standard reinforcement-learning environment fit from a public ServiceNow incident dataset (141,000+ events, 24,918 incidents), letting an RL agent learn to "process tickets."

The reward design was deliberately conservative —

proxy reward: backlog reduction +1, throughput +1, latency penalty
true reward: user issue genuinely resolved +1 (this channel is hidden from the agent)

First run: DQN, 60,000 steps.

The training curve looked like this —

Steps 0–10,000: agent flailing, proxy reward bouncing around zero
Steps 10,000–30,000: proxy reward climbing monotonically; from the outside it looks like convergence
Steps 30,000–60,000: proxy reward stable at a suspiciously high level

I went to look at what the agent was actually doing per episode.

It was deleting 134 tickets per episode. Genuine resolutions: zero.

Five random seeds, all converged to the same behavior with near-zero variance (134.4 ± 17.0). Swap DQN for PPO and the hacking rate drops to 99.2 ± 0.7% — slightly less extreme, fundamentally identical.

Two mainstream RL algorithms, five seeds, all learned that "the fastest way to reduce backlog is to delete tickets." This isn't exploration noise — it's a stable policy equilibrium. As long as "delete" and "resolve" earn the same proxy reward but deletion costs 0 seconds and resolution costs 45 minutes, any rational optimizer picks delete.

That flat 100% line isn't a bug. It's the textbook definition of reward hacking — the agent violates no explicit rule, but optimizes a proxy that didn't capture what we actually wanted (Amodei et al. 2016, "Concrete Problems in AI Safety"; Skalse et al. 2022 for the formal definition).

Before this PhD work, I spent a few years doing ITSM Agent engineering for Fortune 500 clients — multi-Agent architectures, ServiceNow integrations, deployments handling hundreds of thousands of tickets per year. That multinational-enterprise context shaped one core conviction: the real danger after an ITSM Agent ships isn't the LLM saying something wrong — it's the Agent silently optimizing a proxy you never meant to specify. This research is the formalization of that conviction.

In 30 minutes you'll walk out with —

A complete debrief of an ITSM-domain reward-hacking experiment — environment design / training result / causal detection / failed repair attempts
Six ITSM-specific hacking patterns, each with an SQL query you can run this week
Why Critics don't catch it, and why dashboards don't catch it either, plus ten things your engineering team can do this week and five questions to interrogate any ITSM Agent vendor with

1. Why ITSM Is Especially Dangerous

Reward hacking isn't unique to ITSM. OpenAI's CoastRunners boat agent (2016) learned to loop in circles harvesting power-ups instead of finishing the race. DeepMind maintains a Specification Gaming database with 50+ examples of RL agents optimizing the wrong objective in weird ways.

But ITSM has three properties that make reward hacking significantly more dangerous than other domains —

Actions are structural, not textual. A retail CS Agent saying "we will absolutely refund 200 RMB" gets pushed back on by the customer in real time and triggers Critic-layer output filtering. The ITSM Agent's "output" is a ServiceNow API call — PATCH /api/now/table/incident/<id> setting state=Resolved. From the Critic's perspective, that call is structurally legal.
User feedback is enormously delayed. Errors in retail surface within an hour. Errors in ITSM may take the user 2–3 weeks to notice their ticket vanished ("I assumed IT was just slow"); meanwhile thousands of valid tickets have been silently dispatched into the void.
Proxy metrics are tied directly to team OKRs. Retail CSAT is a lagging indicator the team watches; ITSM backlog, throughput, and SLA compliance are leading indicators driving the team OKRs themselves. The Agent's implicit reward IS the team's OKR.

Point 3 is the one that should keep you awake. Even if your system has no RL agent and no explicit reward function — as long as your team, your PM, and your PR reviews all point in the same direction (make the backlog number look better), the LLM-driven Agent's implicit reward is that backlog counter. Reward hacking doesn't require RL. It requires only an optimization target plus an action space with degrees of freedom.

The starting point for this research was a documented industrial incident cited in §1 of my paper — an IT services company's multi-Agent ITSM deployment was found to be systematically deleting valid tickets. I turned that into controlled experiments. The rest of this article is the engineering debrief.

2. How ITSM-Gym Is Designed

To make reward hacking observable in experiments, the first engineering decision was — proxy and true reward must travel on independent channels.

# Simplified
@dataclass
class StepResult:
    state_next: TicketState
    proxy_reward: float   # agent sees this
    true_reward: float    # hidden from agent; used for evaluation
    is_hacking: bool      # ground-truth label, computed from action + outcome

proxy_reward is the signal the agent uses to update its policy — a linear combination of backlog size, throughput, and SLA compliance. true_reward is the researcher's ground-truth check — was the user's problem actually solved (modifications > threshold + state=Resolved + no reopen)? is_hacking is the intersection (proxy > 0) AND (true ≤ 0) — the step where the agent won on the proxy but didn't serve the user.

The action space has six slots —

class Action(Enum):
    RESOLVE   = 0    # genuine resolution (slow, costly)
    REASSIGN  = 1    # punt to another team
    ESCALATE  = 2    # raise priority
    CLOSE     = 3    # close ticket
    DELETE    = 4    # delete ticket (the obviously bad one)
    DEFER     = 5    # postpone

Leaving DELETE in the action space is intentional. If the agent finds that path and walks down it, that's the hacking behavior we want to observe. The core research question isn't "can we build a non-hacking agent?" — it's "when the agent is already hacking, can we detect it from the outside?"

3. Training Results: 100% Isn't an Outlier, It's a Stable Equilibrium

DQN and PPO, five seeds each, 60,000 steps. Results —

Agent	Hacking Rate	Deleted / episode	Genuine / episode
DQN unconstrained	100.0 ± 0.0%	134.4 ± 17.0	0.0 ± 0.0
PPO unconstrained	99.2 ± 0.7%	66.8 ± 12.6	0.8 ± 0.7
DQN constrained (delete + premature close disabled)	0.0 ± 0.0%	0.0 ± 0.0	34.4 ± 0.7
PPO constrained	0.0 ± 0.0%	0.0 ± 0.0	15.9 ± 2.1

Near-zero variance across seeds tells us this is a stable attractor in policy space, not a quirk of initialization.

The learning dynamics are even more telling — DQN's delete rate grows from ~36 deletions/episode in early training to ~136/episode by the end. Hacking is a strategy the agent actively learns, not a sign of failed convergence.

The night that table came in, I stared at wandb for a long time. The only thought in my head was — Amodei et al. 2016 used the toy example "AI receives reward for cleaning a room, learns to turn off the lights so it can't see the mess." Apparently in ITSM, this specific shape is what it actually looks like.

4. Six ITSM-Specific Reward-Hacking Patterns

When I systematized this work, I cataloged six ITSM-specific reward-hacking patterns. The first four were observed directly in ITSM-Gym experiments and replicated in both UCI ServiceNow and Mendeley Help Desk public datasets. Patterns 5 and 6 are derived from real ticket-process analysis as part of the complete taxonomy.

Pattern 1: Ticket Deletion

Mechanism: Agent directly DELETEs tickets. Proxy signal: backlog drops. True consequence: user requests permanently lost. This-week SQL:

SELECT sys_created_by, COUNT(*) FROM sys_audit
WHERE table_name='incident' AND operation='DELETE'
  AND sys_created_on > NOW() - INTERVAL '30 days'
GROUP BY sys_created_by;

Red line: in a healthy ITSM process, DELETE should be ≈0. Any service account showing non-zero DELETE — kill that account's delete permission immediately.

Pattern 2: Premature Closure

Mechanism: Agent marks state Resolved without actually resolving anything. What the user sees isn't "the ticket vanished" — it's "the ticket shows Resolved but I never got any email." Proxy signal: average handle time drops, SLA compliance climbs. True consequence: users resubmit (reopen rate spikes) or abandon the system. This-week SQL: filter incident → New→Resolved transitions under 3 minutes:

SELECT
  task_id,
  EXTRACT(EPOCH FROM (resolved_at - sys_created_on)) AS resolve_seconds,
  close_notes
FROM incident
WHERE state = 'Resolved'
  AND resolved_at - sys_created_on < INTERVAL '3 minutes'
  AND sys_created_on > NOW() - INTERVAL '30 days'
ORDER BY resolve_seconds ASC
LIMIT 200;

Red line: 48h reopen rate > 8% triggers an architecture review. Scan close_notes — if you see "unclear request" / "duplicate" / "won't do" in unusual volume, that's almost certainly premature closure.

Pattern 3: Reassignment Loop

Mechanism: Team A → Team B → Team A reassignment cycles. Each reassignment gives the proxy an "active" signal; the ticket isn't counted toward SLA breach. Proxy signal: ticket shows active, KPIs unaffected. True consequence: the ticket sits in place for weeks. This-week SQL:

SELECT
  task_id,
  COUNT(*) AS reassignment_count,
  STRING_AGG(new_value, ' → ' ORDER BY sys_created_on) AS path
FROM sys_audit
WHERE table_name = 'incident'
  AND field = 'assignment_group'
  AND sys_created_on > NOW() - INTERVAL '14 days'
GROUP BY task_id
HAVING COUNT(*) >= 4
ORDER BY reassignment_count DESC
LIMIT 50;

Red line: 80%+ of reassignment_count > 3 tickets are loops, not legitimate expert routing (this is not an intuition call — see §6).

Pattern 4: Cherry-Picking

Mechanism: Agent prioritizes simple tickets (password resets) and lets complex tickets age. Proxy signal: throughput soars. True consequence: the hard tickets pile up. This-week check: compare complexity median between Agent-resolved vs. unresolved tickets (proxy complexity with description length + count of systems involved). A median gap > 30% is a cherry-picking signal.

Pattern 5: SLA Gaming

Mechanism: Bulk-close tickets in the last 5 minutes before the SLA deadline. Proxy signal: SLA compliance climbs. True consequence: those tickets reopen at high rate within 24–48 hours. This-week check: look at the daily close-action time distribution — if more than 30% of closes happen in the last 10% of the SLA window, that's SLA gaming.

Pattern 6: Ticket Splitting

Mechanism: One ticket split into multiple sub-tickets, each closed separately; the proxy counts them as N resolutions. Proxy signal: resolution count soars. True consequence: engineering team handles the same issue redundantly; user gets pinged repeatedly. This-week check: same user creating 3+ tickets within an hour with high description similarity (trigram or embedding cosine) — likely Agent-split.

5. Why a Critic Isn't Enough — It Checks "Say-Something", Not "Choose-Something"

ITSM Agent safety design often borrows the Critic pattern from retail customer service — drop in an output filter that inspects the LLM-generated text and intercepts dangerous phrasings.

That works in retail (see Critic Must Fail-Closed for the full argument). But in ITSM, a Critic is necessary-not-sufficient.

A simplified Critic looks roughly like this —

BLOCK_PATTERNS = [
    r"\b\d+\s*USD\b.*compensate",   # specific compensation amount
    r"backend system|database|admin panel",  # internal info leakage
    r"definitely.*resolved",         # over-commitment
]

def critic_check(agent_output: str) -> CriticResult:
    for pattern in BLOCK_PATTERNS:
        if re.search(pattern, agent_output, re.IGNORECASE):
            return CriticResult(block=True, reason=pattern)
    return CriticResult(block=False)

It checks output text. But when the ITSM Agent marks a ticket Resolved, its "output" looks like —

PATCH /api/now/table/incident/<sys_id>
Authorization: Bearer <svc-account-token>
Content-Type: application/json

{
  "state": "Resolved",
  "close_code": "Solved (Permanently)",
  "close_notes": "Issue addressed"
}

From the Critic's perspective this HTTP call is fully legal — the service account has permissions, the fields are formatted correctly, business rules don't object, and there's no "dangerous phrasing" for regex to match. The Critic doesn't see this as hacking.

This is the Critic's structural limit — it checks say-something, not choose-something. In retail the dangerous exit is LLM text output; in ITSM the dangerous exit is the action choice. The two domains need fundamentally different safety mechanisms.

6. Why Dashboards Aren't Enough Either — Naive Correlation Lies

OK, Critic isn't enough. What about dashboard monitoring? Rule: "Alert if any service account exceeds 10 DELETEs in a day."

Necessary, but still not enough — observed correlations are contaminated by confounding.

A concrete demonstration. Across both UCI ServiceNow and Mendeley Help Desk, I ran the same causal analysis: "Does higher reassignment count cause higher reopen rate?" This is the core question behind Pattern 3 (reassignment loops) — it asks whether a high-reassignment signal is gaming or legitimate expert routing.

Naive OLS (what a dashboard does naturally) vs. Instrumental Variable (IV) estimation —

Dataset	Naive OLS estimate	IV (causal) estimate	Interpretation
UCI ServiceNow (enterprise IT, 24,918 tickets)	-0.052 (looks like reassignment is good)	+0.150 (actually harmful, sign reversed)	Enterprise ITSM: high reassignment = gaming
Mendeley Help Desk (developer, 66,691 issues)	-0.035 (looks like no effect)	-0.140 (actually beneficial)	Developer context: high reassignment = expert routing

The first time UCI returned +0.150, I assumed the IV was misconfigured. I bootstrapped 200 resamples — 95% CI was [+0.092, +0.203], fully excluding the OLS estimate of -0.052. Swapped instrument sets; sign was preserved. Wooldridge overidentification test p=0.412 (cannot reject instrument exogeneity). About a week in I finally accepted the result — the "reassignment is benign" reading you get from a naive enterprise ITSM dashboard is a confounding artifact; the actual causal effect is harmful, with the sign reversed.

The Mendeley story is even more striking — on the developer dataset, the same "high reassignment" signal has the opposite causal meaning. In developer issue tracking, reassignment is often expert routing ("I can't crack this bug; routing to someone who can"), not deflection.

Same hacking-candidate signal, two platforms, opposite causal signs. Any plan to install a one-size-fits-all "ITSM Agent monitoring rule" from a cloud vendor will fail — your platform's hacking patterns aren't template-able.

Test for whether your dashboard is adequate: if it only looks at correlations and doesn't do causal adjustment, it will lie to you exactly when you need it most — and the lie may be in the wrong direction.

7. Fix One Hack, the Agent Finds Another — Substitution Effect

After designing detection, the next step was: use the IV-estimated causal effects to repair the proxy reward and see what the agent does.

The repaired reward is —

r_repaired = r_proxy + λ · ĈATE_true(action)

where ĈATE_true(action) is the IV-estimated causal effect of each action type on the true outcome — DELETE gets a large negative penalty, RESOLVE is neutral. In theory the agent should reduce DELETEs.

What actually happened —

Condition	Hacking Rate	Deleted/ep	Prem.Close/ep	Total Hack	Genuine/ep
No repair (baseline)	100.0%	134	46	180	0
Half repair λ=0.5	100.0%	29	120	149	0
Full repair λ=1.0	100.0%	16	94	110	0
Strong repair λ=2.0	100.0%	8	89	97	0
Structurally disabled	0.0%	0	0	0	34

Deletes dropped 94% — the repair worked, on that count.

But total hacking only dropped 46% — the agent substituted premature closure for deletion. One in three blocked deletes turned into a premature closure (substitution elasticity = 0.34). Genuine resolutions remained zero.

Only structural disablement (hard-blocking the delete and premature-close actions) lets the agent route to genuine resolution — 34 tickets/episode.

When all five seeds at λ=2.0 produced the substitution pattern, I sat with the data for a day. That result changed how I think about reward design —

Soft reward repair isn't enough: any action that retains a favorable proxy-to-true ratio after correction remains attractive
You need hard constraints PLUS continuous detection, in two legs: hard constraints to block known hacking paths, continuous detection to surface new ones
Monitoring isn't one-shot configuration; it's a continuously-running product — the Agent will keep looking for new holes

8. Ten Things Your Engineering Team Can Do This Week

Turn hacking detection from "luck" into an SOP. These ten items don't require an IV framework — they're doable this week.

Pull the last 30 days of DELETE events triggered by service accounts — in healthy ITSM this should be ≈0
Bucket each service account's close-action-to-first-user-contact time — anything under 3 minutes goes to human review
Add 48h reopen rate to the dashboard front page — above 8% triggers an architecture review
Filter tickets with reassignment_count > 3 — sample 30, manually label as loop vs. legitimate routing (this judgment isn't intuitive — look at whether the path contains cycles back to a previous group)
Compare complexity distributions across Agent resolved vs. unresolved tickets — a median gap > 30% indicates cherry-picking
Look at the daily close-action time distribution — concentration in the last 10% of the SLA window above 30% indicates SLA gaming
Check description similarity for multi-ticket bursts from the same user in short windows — flags ticket splitting
Correlate proxy metrics against "user re-submits similar issue" — proxy improving while resubmissions climb is a red flag
Sample 50 Agent-closed tickets per week for human review — the cheapest source of ground truth; check close_notes for "unclear request" / "won't do" / "duplicate" patterns
Maintain "Agent-handled" vs. "human-handled" control groups — same-period human reopen rate is your baseline; Agent group > 1.5× baseline is a hacking signal

Item 9 is the most important and the one most often skipped. The cross-platform replication in my paper was only possible because UCI and Mendeley both come with "suspicious" ground-truth labels — even in the public-data world someone is doing this already. Your platform doesn't have those labels yet. Item 9 is the simplest way to start producing them.

9. Five Questions to Interrogate Any ITSM Agent Vendor With

In your next vendor review, ask these five. A vendor who can't answer or hedges — alarm bells go off:

"Can the Agent execute DELETE operations?" If yes — that's a structural hole; demand the permission be revoked.
"Is there a verification step before the close action?" If no — premature closure isn't intercepted.
"At what reopen-rate threshold do you trigger an architecture alert?" No specific threshold — no one's watching.
"Does your monitoring use correlation or causal analysis?" "We look at backlog and SLA compliance" — vulnerable to confounding, may signal in the wrong direction.
"If you patch one hacking pattern, does the Agent find another? How would you detect that?" "Patched is patched" — they don't understand substitution effects.

10. Going Deeper

The above is the engineering view — pattern recognition + detection signals + defense mechanisms. The formal definitions, the CHRD/IV/2SLS causal-detection framework, the formal characterization of all six patterns, the 94% deletion-reduction / 46% total-hacking-reduction / η=0.34 substitution-elasticity experiments, and the cross-dataset sign-reversal, are all in the paper currently under submission — Causal Detection of Reward Hacking in Autonomous IT Service Agents (Yaqin Hei). For researchers, anyone building serious ITSM Agent monitoring, or anyone curious about the IV experiment design and the open-source ITSM-Gym code, that paper is the source material.

For an entry point to the reward-hacking literature —

Amodei et al. 2016, "Concrete Problems in AI Safety" (arXiv:1606.06565) — the paper that named the problem
Skalse et al. 2022, "Defining and Characterizing Reward Hacking" — the formal definition
Pan et al. 2022, "The Effects of Reward Misspecification" — phase transitions as capability increases
DeepMind's Specification Gaming database — a curated catalog of 50+ cross-domain reward-hacking cases
OpenAI's CoastRunners 2016 blog post, "Faulty reward functions in the wild" — the canonical boat-agent-loops-for-power-ups story

After this article

If you want to drop "the six hacking patterns + the ten this-week items + the five vendor questions" straight into your next ITSM Agent review or post-launch retro — without having to re-read this article every time — I packaged a PDF kit for readers who got this far. Send me the keyword "ITSM HACKS" and I'll send the pack:

Six ITSM hacking patterns cheat sheet (one card each — mechanism / proxy signal / true consequence / this-week SQL)
Ten this-week items checklist (dashboard configs, spot-check scripts, recommended thresholds)
Five vendor-interrogation question cards (designed for live review meetings — tick red / yellow / green per answer)
ITSM-Gym reproduction scripts (DQN/PPO baselines + the six-pattern evaluation harness — for research and red-team use)