Retail Agentic AI Handbook (1): Which 4 of Your 28 'Smart-X' Projects to Start With

Yaqin Hei · 20 min read

This is the English edition of Part 1 in the Retail Enterprise Agentic AI Handbook, the series that walks through taking one customer-service Agent from zero to one. The meta-level methodology (how to decide what should be built as an Agent at all) is in a separate series, starting with the L0-L3 Grading Framework. Chinese edition: 零售企业 Agentic AI 落地手册(一):你的 28 场景清单里,应该先做哪 4 个 (Retail Enterprise Agentic AI Deployment Handbook (1): Which 4 of the 28 Scenarios on Your List to Do First).

Opening: Your Boss Just Handed You 28 "Smart-X" Projects and Wants Them All This Year

A Monday morning all-hands. Your boss drops a list on the table —

Smart customer service, smart replenishment, smart slow-mover alerts, smart scheduling, smart store ops, smart membership, smart content, smart recommendation, smart pricing, smart stock-taking, smart hiring, smart competitor watch, smart compliance… 28 candidate projects, every name prefixed with "smart" or "Agent." Budget is approved, headcount is waiting for orders, KPIs are waiting for definitions. You don't know which are real, which are vapor, which return ROI in six months, and which will make your boss ask "how is this different from the old chatbot?"

This isn't anxiety, it's a mismatch — you've been asked to plan the entire portfolio, but you don't have a ruler for "which one first."

Over the past year I've helped a few retail groups cut this list from 28 down to 4. What got cut wasn't budget, it was distraction: 19 of the scenarios are rule-based automation or pure computer vision that don't belong in the "AI strategy" lane at all (the grading logic for that is in the L0-L3 framework); 5 are worth doing but need to wait for data plumbing; the remaining 4 are P0, and doing them right forces the data middle-layer to come alive.

This article gives you two things —

  1. A 28-scenario map, each scenario tagged P0/P1/P2/P3 with a reason
  2. The argument for starting with customer service, plus 5 decisions you must land in Week One

Five minutes in, you can judge at your next strategy meeting which projects are real P0s and which are vendor noise. Twenty minutes in, you can hand your boss "4 P0s + 5 Week-One decisions" as a written plan.

1. The Map — 6 Business Domains, 4-5 Scenarios Each

Lay the full picture on the table first: retail Agentic AI isn't just two lanes ("smart customer service" and "smart recommendation"). It's 6 business domains, 28 specific scenarios.

+------------------------------------------------------------------+
|                   Customer Journey                                |
|   Sales-Assistant Copilot  After-Sales CS  Membership            |
|   Personalized Marketing   In-Store Experience                   |
+------------------------------------------------------------------+
|                   Supply Chain                                    |
|   Smart Replenishment   Slow-Mover Alerts   New-SKU Placement    |
|   Logistics Tracking    Supplier Collaboration                   |
+------------------------------------------------------------------+
|                   Store Ops                                       |
|   Scheduling   KPI Coaching   Display Compliance                 |
|   Stock-Taking   Equipment Maintenance                           |
+------------------------------------------------------------------+
|                   People                                          |
|   Employee Training   Recruiting Filter                          |
|   Performance Feedback   Employee Care                           |
+------------------------------------------------------------------+
|                   Marketing                                       |
|   Campaign Planning   Content Gen   Competitor Watch             |
|   Data Analysis   Brand Compliance                               |
+------------------------------------------------------------------+
|                   Finance & Risk                                  |
|   Dynamic Pricing   Return-Fraud Detection                       |
|   Financial Forecasting   Compliance Review                      |
+------------------------------------------------------------------+
                      Shared Infrastructure
       Data Hub (Activated) / Tag Factory (Live) / API Gateway / AI Orchestration

Here are the full 28 scenarios across six tables. One reading rule — P0 start immediately, P1 wait for data infrastructure, P2 wait for data assets to mature, P3 push past 2027:

Customer Journey (5)

Scenario | Core Value | Priority
After-Sales CS Agent | AI handles 70-80% of conversations, ~60% headcount reduction | P0
Online Sales-Assistant Copilot | Assistant productivity +30%, new-hire ramp -30% | P0
In-Store Sales-Assistant Copilot | Scan-to-card product knowledge, in-store conversion lift | P1
Membership Agent | Auto-trigger birthday/points-expiring, member repurchase +10-15% | P1
Personalized Recommendation Agent | Real-time mini-app homepage personalization, CTR +20-30% | P2

Supply Chain (5)

Scenario | Core Value | Priority
Smart Replenishment Agent | Stockout rate -20%, AI generates replenishment suggestions (human review) | P0
Slow-Mover Alert Agent | Auto-flag high inventory-age SKUs, turnover +10% | P1
Logistics Anomaly Agent | Proactive in-transit delay alerts, logistics complaints -30% | P1
New-SKU Placement Agent | Store-profile-based allocation, first-week sell-through +15% | P2
Supplier Collaboration Agent | Automated brand sales reports, coordinator productivity +50% | P2

Store Ops (5)

Scenario | Core Value | Priority
Scheduling Agent | Traffic-history-driven suggestions, advisory only (not auto-write) | P1
Store KPI Coaching Agent | Daily operational-health digest for store managers | P2
Display Compliance Agent | Photo upload + AI compliance check, audit headcount -60% | P2
Stock-Taking Agent | RFID + AI variance detection, stock-take efficiency +70% | P2
Equipment Maintenance Agent | Predictive maintenance for POS etc., failure rate -20% | P3

People (4)

Scenario | Core Value | Priority
Employee Training Agent | AI role-play as customer, training for sales/CS new-hires | P0
Recruiting Filter Agent | Resume scoring + interview scheduling, hiring cycle -40% | P2
Performance Feedback Agent | Personalized review reports replacing template monthlies | P3
Employee Care Agent | Attrition-risk early warning, compliant sentiment monitoring | P3

Marketing (5)

Scenario | Core Value | Priority
Content Generation Agent | Product copy / social posts, content throughput 10× | P1
Campaign Planning Agent | Historical-ROI-driven campaign drafts, planner productivity +40% | P2
Competitor Watch Agent | Auto-track competitor pricing/new SKUs, market response +50% | P2
Data Analysis Agent | NL questions, auto-generated reports (Text-to-SQL) | P2
Brand Compliance Agent | Marketing-asset brand-rule checking, avoid violation fines | P2

Finance & Risk (4)

Scenario | Core Value | Priority
Return-Fraud Detection Agent | Detect anomalous returns (serial returns / new-for-old), fraud loss -30-50% | P1
Price Optimization Agent | Best price within brand-authorized range, gross margin +2-3% | P2
Financial Forecasting Agent | Rolling monthly/quarterly revenue forecast, budgeting cycle -30% | P2
Compliance Review Agent | Brand-authorization contract review, avoid licensing violations | P3

The 28 isn't there for you to do them all. It's there so you know you didn't miss anything. The next section compresses the whole table down to 4 P0s.
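If you would rather keep the map in a script than on a slide, here is a minimal sketch that encodes the priorities from the six tables above and tallies them. The structure and the names `SCENARIOS`, `count_by_priority`, and `scenarios_at` are illustrative assumptions, not part of any tool mentioned in this series.

```python
from collections import Counter

# Priorities transcribed from the six tables above: domain -> {scenario: priority}.
SCENARIOS = {
    "Customer Journey": {
        "After-Sales CS": "P0", "Online Sales-Assistant Copilot": "P0",
        "In-Store Sales-Assistant Copilot": "P1", "Membership": "P1",
        "Personalized Recommendation": "P2",
    },
    "Supply Chain": {
        "Smart Replenishment": "P0", "Slow-Mover Alert": "P1",
        "Logistics Anomaly": "P1", "New-SKU Placement": "P2",
        "Supplier Collaboration": "P2",
    },
    "Store Ops": {
        "Scheduling": "P1", "Store KPI Coaching": "P2",
        "Display Compliance": "P2", "Stock-Taking": "P2",
        "Equipment Maintenance": "P3",
    },
    "People": {
        "Employee Training": "P0", "Recruiting Filter": "P2",
        "Performance Feedback": "P3", "Employee Care": "P3",
    },
    "Marketing": {
        "Content Generation": "P1", "Campaign Planning": "P2",
        "Competitor Watch": "P2", "Data Analysis": "P2",
        "Brand Compliance": "P2",
    },
    "Finance & Risk": {
        "Return-Fraud Detection": "P1", "Price Optimization": "P2",
        "Financial Forecasting": "P2", "Compliance Review": "P3",
    },
}


def count_by_priority(scenarios):
    """Tally how many scenarios fall into each priority tier."""
    return Counter(p for items in scenarios.values() for p in items.values())


def scenarios_at(scenarios, priority):
    """List (domain, scenario) pairs for one priority tier."""
    return [(domain, name)
            for domain, items in scenarios.items()
            for name, p in items.items() if p == priority]


counts = count_by_priority(SCENARIOS)
print(dict(sorted(counts.items())))  # {'P0': 4, 'P1': 7, 'P2': 13, 'P3': 4}
print(scenarios_at(SCENARIOS, "P0"))
```

Keeping the list as data makes it easy to re-tally when a scenario is re-graded, and the total doubles as a check that nothing fell off the map.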

2. The "Business Value × Tech Complexity" Matrix Has Fooled 90% of Project Managers

I've seen too many AI portfolio plans use the same matrix — vertical axis business value, horizontal axis technical complexity, four quadrants. The problem with that picture isn't that it's wrong, it's that it's too abstract — it doesn't answer "why these 4 ahead of the other 24."

Real P0 filtering answers three questions —

  1. Can ROI be computed? Business value has to convert to a specific dollar/headcount-savings number — not vague phrases like "improves experience"
  2. How reusable is the infrastructure? The data/APIs/platform this Agent forces you to build out — will it become the foundation for the next 5 Agents?
  3. Can it be rolled back? If it underperforms in production, what's the cost of pulling it?

Run the 28 through these 3 questions —

                +--------------------------------------+
       High     |  P0 (Start Now)    P1 (Next Wave)    |
                |  * After-Sales CS  * Logistics       |
   Business     |  * Sales Copilot   * Return Fraud    |
   Value        |  * Employee Train  * Slow-Mover      |
                |  * Replenishment   * Content Gen     |
                +--------------------------------------+
                |  P2 (Third Wave)   P3 (Future)       |
       Low      |  * KPI Coaching    * Equipment Maint |
                |  * Recommendation  * Perf Feedback   |
                |  * Campaign Plan   * Employee Care   |
                |  * Recruiting      * Compliance Rev  |
                +--------------------------------------+
                     Low                     High
                       Tech Complexity / Dependencies

The 4 P0s share the same profile — ROI computable, infrastructure reused, rollback possible, validated within 6 months. The other 24 either don't pencil out, or are waiting on infrastructure, or can't be rolled back if they go wrong.

One-sentence P0 test: If this project fails, can your boss still say six months from now, "good thing we didn't bet the farm on it"? If the answer is no, it's not P0.
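The three-question filter can be written down as a tiny checklist. This is a sketch under stated assumptions: the `Candidate` fields and the yes/no answers for the two example projects are illustrative, and in a real review each answer should come from your own numbers.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    roi_computable: bool      # Q1: converts to a money or headcount number
    reusable_infra: bool      # Q2: forces infrastructure later Agents will reuse
    rollback_possible: bool   # Q3: can be pulled cheaply if it underperforms


def is_p0(c: Candidate) -> bool:
    """A candidate is P0 only if it passes all three questions."""
    return c.roi_computable and c.reusable_infra and c.rollback_possible


def failed_questions(c: Candidate) -> list:
    """Name the questions a candidate fails, for the meeting notes."""
    checks = {
        "ROI computable": c.roi_computable,
        "infrastructure reusable": c.reusable_infra,
        "rollback possible": c.rollback_possible,
    }
    return [question for question, passed in checks.items() if not passed]


# Hypothetical answers, for illustration only.
candidates = [
    Candidate("After-Sales CS", True, True, True),
    Candidate("Personalized Recommendation", False, True, True),
]

for c in candidates:
    verdict = "P0" if is_p0(c) else "not P0: " + ", ".join(failed_questions(c))
    print(f"{c.name}: {verdict}")
```

The filter is deliberately strict: one "no" is enough to move a project out of the first wave.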

3. Start With Customer Service — Not Because It's Easiest, Because It Forces the Whole Foundation

People push back — "why not start with something sexier, like personalization or dynamic pricing?"

Put the verdict on the table first: a customer-service Agent looks like one project, but it's actually the foundation-installation plan for your entire Agentic AI stack.

Three reasons, each tied to a concrete observable —

Reason 1: ROI is computable, not fuzzy

Customer service is one of the few retail areas where you can multiply monthly salary × headcount to compute AI replacement value directly:

  • CS headcount is a quantifiable cost line (monthly salary × people × 12)
  • AI handling 70% of conversations ≈ 60% headcount reduction
  • A retailer with thousands of daily tickets saves tens of millions of RMB per year in headcount
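The arithmetic behind these bullets fits in one function. A minimal sketch: the inputs below (300 reps at 8,000 RMB/month) are hypothetical placeholders rather than figures from any real retailer, and the 60% reduction is the estimate used in this article.

```python
def annual_cs_savings(monthly_salary_rmb: float, headcount: int,
                      headcount_reduction: float) -> float:
    """Annual saving = monthly salary x people x 12 x share of headcount removed."""
    if not 0.0 <= headcount_reduction <= 1.0:
        raise ValueError("headcount_reduction must be between 0 and 1")
    annual_cost = monthly_salary_rmb * headcount * 12
    return annual_cost * headcount_reduction


# Hypothetical inputs: 300 reps, 8,000 RMB/month, 60% headcount reduction.
saving = annual_cs_savings(8_000, 300, 0.60)
print(f"{saving:,.0f} RMB/year")  # 17,280,000 RMB/year
```

Every input is a number finance already tracks, which is what makes the ROI computable in one step.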

Compare that with "personalization +20% CTR" — from click to checkout there are 6-8 conversion steps, each can absorb blame, and when your boss asks "how much profit did that 20% actually produce," no one has an answer.

Detection signal: If an Agent project's ROI calculation needs more than 3 inference steps (X → Y → Z → revenue), it probably isn't P0.

Reason 2: Infrastructure reuse is highest

The infrastructure that customer-service Agent construction forces you to build is exactly the foundation every later Agent needs —

Infrastructure built during CS Agent construction:
+-------------------------------------------+
| Data Hub APIs (orders / logistics / SKU)  |
| Tag Factory (customer profile / tier)     |
| 3-layer knowledge base                    |
| API gateway (unified call layer)          |
| AI orchestration workflows                |
+-------------------------------------------+
           |
           v
Directly supports the other 27 Agent scenarios

You look like you're building one customer-service Agent. You're actually forcing the data hub and tag factory to come alive. Build the foundation first, and downstream P1-P2 Agent delivery cycles shrink by 50%+.

Reason 3: Risk is the most controllable — and other candidates can't match this

CS Agent has 3 safety nets the others don't —

  • Human handoff: Anything AI is uncertain about can route to a human at any moment, so no irreversible damage
  • Gradual rollout: Start at 10% traffic and ramp
  • Measurable outcomes: First-call resolution, CSAT, escalation rate — every metric has a clear formula
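The first two safety nets, gradual rollout and human handoff, can be sketched as a routing function. This is one possible approach, not a description of any specific platform: the hash-bucket scheme, the `handoff_threshold` of 0.8, and the function names are assumptions chosen for illustration.

```python
import hashlib


def in_rollout(customer_id: str, rollout_percent: int) -> bool:
    """Deterministically place a customer in the AI cohort.

    Hashing the ID keeps each customer in the same bucket, so raising the
    percentage only adds customers and never flips anyone back.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    return bucket < rollout_percent


def route(customer_id: str, ai_confidence: float,
          rollout_percent: int = 10, handoff_threshold: float = 0.8) -> str:
    """Return 'ai' or 'human' for one conversation."""
    if not in_rollout(customer_id, rollout_percent):
        return "human"   # outside the rollout cohort
    if ai_confidence < handoff_threshold:
        return "human"   # AI is uncertain: hand off
    return "ai"


# Roughly 10% of a hypothetical customer base lands in the AI cohort.
share = sum(in_rollout(f"cust-{i}", 10) for i in range(10_000)) / 10_000
print(f"AI cohort share: {share:.1%}")
print(route("cust-42", ai_confidence=0.95, rollout_percent=100))  # ai
print(route("cust-42", ai_confidence=0.50, rollout_percent=100))  # human
```

Ramping is then a one-line config change: raise `rollout_percent`, watch the metrics, and set it back to 0 if you need to pull the Agent.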

A replenishment Agent gone wrong → inventory pile-up or stockout, incident within 24 hours. A pricing Agent gone wrong → brand authorization dispute, incident within a week. Of these three, customer service has the lowest error cost.

One project solves three problems at once — cost reduction, infrastructure build-out, team capability validation. That's why you start with customer service.

4. Six Things to Verify Before You Start: Skip Any and You Pay in Three Months

Before you touch the keyboard, do a diagnostic of the existing systems. Skip any one of these and you'll be back patching three months later.

System/Module | What to verify | Why it matters for Agent | Action
WeCom CS | In use? API access open? | The Agent's entry point; API integration is mandatory | Confirm API access, check rate limits
CS ticket platform | Is the existing bot still running? KPIs tracked? | Old bot is a negative example, the typical "deployed and abandoned" case; old conversation logs are reusable | Don't depend on the old bot; reuse its conversation logs
Ticketing system | Independent vendor? Integrated with CS platform? | Critical path for complaint escalation and human handoff | Map ticketing APIs, define "escalate to human" triggers
CS knowledge base | What's the coverage of existing docs? | Agent fuel. Without a knowledge base, the Agent has only the model's parametric knowledge, and accuracy has no floor | Top priority (Part 2 covers this)
Historical conversation data | Which system? Volume? | The most valuable raw asset: used for knowledge extraction, training data, benchmark design | Apply for export access immediately, assess volume + quality
Human CS baseline | First-call resolution / handle time / CSAT baseline? | Without a baseline you can't tell if AI is better or worse than the human team | Build the baseline in parallel with the knowledge base (see below)

The Step Most Teams Skip — Establishing the Human Baseline

One thing most retail organizations don't do — establish the human-CS baseline before AI goes live, and freeze it.

Most retail CS systems don't have systematic evaluation data: no first-call resolution rate, no average handle time, no resolution rate by question type. Once AI ships, the conversation distribution changes and you can never recover a "pure human" baseline, so the question "is AI better than humans" becomes unanswerable forever.

Three months in your boss asks "did that 2M RMB investment have positive ROI" — you can only point at proxy metrics like "per-rep throughput +X%" and there's no real control experiment.

How to establish the baseline (doable in one week) —

  1. Pull the last 3 months of data from the CS platform
  2. Random-sample 500 conversations, manually label: question type, handle time, first-call resolution, CSAT, escalation
  3. Compute core human-CS metrics as your baseline
  4. Output the TOP 50 high-frequency question list at the same time — this is your knowledge-base build priority
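The four steps above can be sketched in a few lines once the sample is labeled. The record fields, the 1-5 CSAT scale, and the function names here are assumptions for illustration; the tiny four-row sample stands in for the 500 manually labeled conversations.

```python
import random
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass
class LabeledConversation:
    question_type: str
    handle_minutes: float
    resolved_first_contact: bool
    csat: int          # assumed 1-5 scale
    escalated: bool


def draw_sample(conversation_ids, k=500, seed=42):
    """Step 2: pick k conversations at random (seeded so the draw is repeatable)."""
    ids = list(conversation_ids)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def human_baseline(sample):
    """Step 3: core human-CS metrics, computed once and then frozen."""
    n = len(sample)
    if n == 0:
        raise ValueError("sample is empty")
    return {
        "n": n,
        "first_call_resolution": sum(c.resolved_first_contact for c in sample) / n,
        "avg_handle_minutes": mean(c.handle_minutes for c in sample),
        "avg_csat": mean(c.csat for c in sample),
        "escalation_rate": sum(c.escalated for c in sample) / n,
    }


def top_questions(sample, k=50):
    """Step 4: most frequent question types, i.e. the knowledge-base build order."""
    return Counter(c.question_type for c in sample).most_common(k)


# Four made-up rows standing in for the labeled sample.
labeled = [
    LabeledConversation("return policy", 4.0, True, 5, False),
    LabeledConversation("return policy", 6.0, True, 4, False),
    LabeledConversation("logistics", 10.0, False, 2, True),
    LabeledConversation("sizing", 4.0, True, 5, False),
]
baseline = human_baseline(labeled)
print(baseline)
print(top_questions(labeled, k=2))  # [('return policy', 2), ('logistics', 1)]
```

Store the resulting dictionary alongside the date range and sample size; that record is the control you compare against after the Agent goes live.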

Detection signal: if the team proposes "let's ship AI first and build the baseline later" — full stop. This is the most common booby trap from delivery vendors; six months later ROI is unprovable and you'll have to invest again in regression testing.

5. Five Decisions Week One Must Land — Three About People, Two About Money

If you've decided to launch the CS Agent project, Week One's critical work isn't technical — it's organizational decisions. Technical work can start Week Two and still be on time. But these 5 decisions, if not made in Week One, become project blockers three months later —

Decision 1: Position headcount reduction — target or outcome?

Whether headcount reduction is a goal or a natural outcome doesn't affect the number, it affects the speed of knowledge transfer.

If your messaging is "AI replaces humans" — the customer-service reps who know your customers best will leave proactively before AI ships, and your knowledge base loses its most valuable input source. I've seen a retailer lose 6 senior CS reps to competitors within 3 months, taking know-how that was worth more than the entire AI project investment.

Recommended framing: "AI augments efficiency"; headcount is an evaluated outcome, not a project goal. Lock this messaging in Week One — internal, to vendors, to brand partners.

Decision 2: AI's internal positioning — give the CS team a story where they aren't replaced

  • For the CS team: position AI as a productivity tool — AI handles simple/repetitive, humans focus on complex/high-value
  • For brand partners: pre-communicate brand-image risk management
  • For customers: decide on labeling "AI reply" — transparency vs. seamless

This isn't PR spin, it's how the Agent actually works. A CS Agent that genuinely hits 70% first-call resolution still routes the remaining 30% of tickets, the complex ones, to humans, and you need those humans to be motivated.

Decision 3: A small exploratory budget — don't approve 2M RMB upfront

The real first-stage cost (1-2 months) of an AI project —

  • AI orchestration platform (cloud server): 1,500-2,000 RMB/month
  • LLM inference (validation-stage low volume): 500-1,000 RMB/month
  • Vector retrieval / Rerank: 500-700 RMB/month

Total: single-digit thousands RMB/month.
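As a quick check on that total, here is the sum of the three ranges above. The dictionary keys and the two-month horizon are illustrative names and assumptions; the RMB figures are the ones listed.

```python
# Monthly first-stage cost ranges in RMB (low, high), copied from the list above.
FIRST_STAGE_COSTS = {
    "orchestration_platform": (1_500, 2_000),
    "llm_inference": (500, 1_000),
    "vector_retrieval_rerank": (500, 700),
}


def monthly_total(costs):
    """Sum the low ends and the high ends of each cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = monthly_total(FIRST_STAGE_COSTS)
print(f"{low:,}-{high:,} RMB/month")  # 2,500-3,700 RMB/month

months = 2  # assumed length of the exploratory stage
print(f"{low * months:,}-{high * months:,} RMB for {months} months")  # 5,000-7,400 RMB for 2 months
```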

The purpose of this budget isn't "ship a product," it's "validate feasibility + build infrastructure."

Detection signal: If a vendor quotes you 2M RMB for a CS Agent project — ask "how much do you need for the first 8 weeks?" If they can't answer, or it's still 300k+ RMB, alarm bells.

Decision 4: Knowledge-base owner — must be someone who knows the business, not engineering

The biggest bottleneck for knowledge-base construction isn't technology, it's content quality — you need a senior CS rep who knows the business to own content (Part 2 covers this in depth).

The Week-One decision isn't about technical architecture, it's about this person's incentives:

  • Does this role get reassigned from "CS rep" to "knowledge-base operations"?
  • How do level/comp align?
  • If they do well, can they get promoted to management in 3 months?

Without that incentive layer you won't get a senior CS rep's full know-how — they'll hold back enough to preserve their irreplaceability.

Decision 5: IT resources — approval cycles longer than the build itself

The Agent needs to integrate with orders, logistics, ticketing — all owned by IT. API access approval + sandbox environment + interface validation — this path is often slower than the engineering build itself.

I saw one project where engineering finished in 6 weeks, then waited 11 weeks for IT approval before go-live.

Start the IT conversation Week One — don't wait until the knowledge base is done. Get the interface doc list, sandbox request, and key API rate-limit confirmation — these three must land in Week One.

Week One's 5 decisions, grouped: 3 about people (headcount strategy, internal messaging, knowledge-base owner) and 2 about money and resources (budget, IT support). Technical decisions aren't on the Week-One agenda.

6. The 28-Scenario Dependency Map — You're Not Building an Agent, You're Building Infrastructure

One last map for the engineering team. The 28 Agents don't launch in parallel — there are strict dependencies between them:

[Data Foundation] (foundation for every Agent)
  Data hub activated (read-only APIs)
    | unlocks
    +-- Orders/logistics API --> After-Sales CS Agent (logistics queries)
    +-- Inventory API --> Replenishment, New-SKU Placement
    +-- Sales data API --> Slow-Mover Alerts, Forecasting, KPI Coaching

  Tag factory dynamic (read-only --> real-time write-back)
    | unlocks
    +-- Read-only --> Sales Copilot (personalization), Membership Agent
    +-- Dynamic write-back --> Personalization Agent, precision marketing

[Knowledge Base] (foundation for CS-related Agents)
  3-layer knowledge base (200 --> 800 --> 2000+ Q&A)
    | unlocks
    +-- Layer 1 --> Auto-handling of return/exchange policy queries
    +-- Layer 2 --> Auto-handling of product queries, in-store Sales Copilot
    +-- Layer 3 --> Complaint handling, employee training scenarios

[Vision] (foundation for image-driven Agents)
  Multimodal model integration
    | unlocks
    +-- Quality-complaint image analysis (After-Sales CS reinforcement)
    +-- Display Compliance Agent
    +-- Brand Compliance Agent
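For the engineering team, the same map can be kept as a lookup table and queried. A minimal sketch, assuming a flat "capability unlocks scenarios" reading of the diagram above; the capability keys and function names are hypothetical labels, not identifiers from any real system.

```python
# Foundation capability -> what it unlocks, transcribed from the map above.
UNLOCKS = {
    "orders_logistics_api": ["After-Sales CS (logistics queries)"],
    "inventory_api": ["Replenishment", "New-SKU Placement"],
    "sales_data_api": ["Slow-Mover Alerts", "Forecasting", "KPI Coaching"],
    "tag_factory_read_only": ["Sales Copilot (personalization)", "Membership"],
    "tag_factory_write_back": ["Personalization", "Precision marketing"],
    "kb_layer_1": ["Return/exchange policy queries"],
    "kb_layer_2": ["Product queries", "In-Store Sales Copilot"],
    "kb_layer_3": ["Complaint handling", "Employee training scenarios"],
    "multimodal_model": [
        "Quality-complaint image analysis",
        "Display Compliance",
        "Brand Compliance",
    ],
}


def unlocked_by(live_capabilities):
    """Everything the currently live foundations unlock, sorted for stable output."""
    unknown = set(live_capabilities) - UNLOCKS.keys()
    if unknown:
        raise KeyError(f"unknown capabilities: {sorted(unknown)}")
    return sorted({item for cap in live_capabilities for item in UNLOCKS[cap]})


def waiting_on(scenario):
    """Which foundation capabilities unlock a given scenario."""
    return sorted(cap for cap, items in UNLOCKS.items() if scenario in items)


print(unlocked_by({"orders_logistics_api", "kb_layer_1"}))
print(waiting_on("Display Compliance"))  # ['multimodal_model']
```

Querying the table before each planning cycle shows which scenarios are actually startable with the foundations you have live today, and which are still waiting.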

The 4 P0 Agents (after-sales CS, sales copilot, replenishment, employee training) — their value splits in two —

  • Half is in the Agent itself (cost reduction + experience uplift)
  • Half is in forcing the data hub and tag factory to come alive

Once that infrastructure is activated, P1-P2 Agent delivery cycles shrink by 50%+. When you negotiate with a vendor over "building a CS Agent," you're actually paying the up-front cost for the entire retail AI stack.

Where this leaves you

If you want to use the "28-scenario grading + Week-One 5 decisions" directly in your next strategy meeting — without re-reading this article every time — I packaged a PDF kit for readers who got this far. Send me the keyword "RETAIL 28" and I'll send the pack:

  1. 28-scenario priority grading table (one-page A3 print — see your whole portfolio P0/P1/P2/P3 in 30 seconds)
  2. Week-One 5-decision checklist (card version — decision, risk, owner, output — drop it in the team chat and everyone gets it)
  3. CS Agent vs. other candidate Agents — 6-dimension comparison sheet (built for vendor-proposal reviews)

(Channels in the footer — X or email both work.)

Next: How to Build the Knowledge Base, Pick the Model, and Where the Money Actually Goes

Part 2 tackles the three most painful technical-architecture questions —

  • 3-layer knowledge base: how do you build it, and why does "dumping docs into a vector database" cost you 3 months?
  • Domestic vs. overseas models — in CS scenarios, which 4 specific capabilities actually differ?
  • The four cost buckets — inference, headcount, infrastructure, integration — how much each, and which one is most underestimated?
