Retail Agentic AI Handbook (1): Which 4 of Your 28 'Smart-X' Projects to Start With

This is the English edition of Part 1 in the Retail Enterprise Agentic AI Handbook, the series that walks through taking one customer-service Agent from zero to one. The meta-level methodology (how to decide what should be built as an Agent at all) is in a separate series, starting with the L0-L3 Grading Framework. Chinese edition: 零售企业 Agentic AI 落地手册(一):你的 28 场景清单里,应该先做哪 4 个.
Opening: Your Boss Just Handed You 28 "Smart-X" Projects and Wants Them All This Year
A Monday morning all-hands. Your boss drops a list on the table —
Smart customer service, smart replenishment, smart slow-mover alerts, smart scheduling, smart store ops, smart membership, smart content, smart recommendation, smart pricing, smart stock-taking, smart hiring, smart competitor watch, smart compliance… 28 candidate projects, every name prefixed with "smart" or "Agent." Budget is approved, headcount is waiting for orders, KPIs are waiting for definitions. You don't know which are real, which are vapor, which return ROI in six months, and which will make your boss ask "how is this different from the old chatbot?"
This isn't anxiety, it's a mismatch — you've been asked to plan the entire portfolio, but you don't have a ruler for "which one first."
Over the past year I've helped a few retail groups cut this list from 28 down to 4. What got cut wasn't budget, it was distraction. Many of the scenarios are rule-based automation or pure computer vision that don't belong in the "AI strategy" lane at all (the grading logic for that is in the L0-L3 framework); others are worth doing but need to wait for data plumbing; only 4 are P0, and doing them right forces the data middle-layer to come alive.
This article gives you two things —
- A 28-scenario map, each scenario tagged P0/P1/P2/P3 with a reason
- The argument for starting with customer service, plus 5 decisions you must land in Week One
Five minutes in, you can judge at your next strategy meeting which projects are real P0s and which are vendor noise. Twenty minutes in, you can hand your boss "4 P0s + 5 Week-One decisions" as a written plan.
1. The Map — 6 Business Domains, 4-5 Scenarios Each
Lay the full picture on the table first: retail Agentic AI isn't just two lanes ("smart customer service" and "smart recommendation"). It's 6 business domains, 28 specific scenarios.
+------------------------------------------------------------------+
| Customer Journey |
| Sales-Assistant Copilot After-Sales CS Membership |
| Personalized Marketing In-Store Experience |
+------------------------------------------------------------------+
| Supply Chain |
| Smart Replenishment Slow-Mover Alerts New-SKU Placement |
| Logistics Tracking Supplier Collaboration |
+------------------------------------------------------------------+
| Store Ops |
| Scheduling KPI Coaching Display Compliance |
| Stock-Taking Equipment Maintenance |
+------------------------------------------------------------------+
| People |
| Employee Training Recruiting Filter |
| Performance Feedback Employee Care |
+------------------------------------------------------------------+
| Marketing |
| Campaign Planning Content Gen Competitor Watch |
| Data Analysis Brand Compliance |
+------------------------------------------------------------------+
| Finance & Risk |
| Dynamic Pricing Return-Fraud Detection |
| Financial Forecasting Compliance Review |
+------------------------------------------------------------------+
Shared Infrastructure
Data Hub (Activated) / Tag Factory (Live) / API Gateway / AI Orchestration
Here are the full 28 scenarios across six tables. One reading rule — P0 start immediately, P1 wait for data infrastructure, P2 wait for data assets to mature, P3 push past 2027:
Customer Journey (5)
| Scenario | Core Value | Priority |
|---|---|---|
| After-Sales CS Agent | AI handles 70-80% of conversations, ~60% headcount reduction | P0 |
| Online Sales-Assistant Copilot | Assistant productivity +30%, new-hire ramp -30% | P0 |
| In-Store Sales-Assistant Copilot | Scan-to-card product knowledge, in-store conversion lift | P1 |
| Membership Agent | Auto-trigger birthday/points-expiring, member repurchase +10-15% | P1 |
| Personalized Recommendation Agent | Real-time mini-app homepage personalization, CTR +20-30% | P2 |
Supply Chain (5)
| Scenario | Core Value | Priority |
|---|---|---|
| Smart Replenishment Agent | Stockout rate -20%, AI generates replenishment suggestions (human review) | P0 |
| Slow-Mover Alert Agent | Auto-flag high inventory-age SKUs, turnover +10% | P1 |
| Logistics Anomaly Agent | Proactive in-transit delay alerts, logistics complaints -30% | P1 |
| New-SKU Placement Agent | Store-profile-based allocation, first-week sell-through +15% | P2 |
| Supplier Collaboration Agent | Automated brand sales reports, coordinator productivity +50% | P2 |
Store Ops (5)
| Scenario | Core Value | Priority |
|---|---|---|
| Scheduling Agent | Traffic-history-driven suggestions, advisory only (not auto-write) | P1 |
| Store KPI Coaching Agent | Daily operational-health digest for store managers | P2 |
| Display Compliance Agent | Photo upload + AI compliance check, audit headcount -60% | P2 |
| Stock-Taking Agent | RFID + AI variance detection, stock-take efficiency +70% | P2 |
| Equipment Maintenance Agent | Predictive maintenance for POS etc., failure rate -20% | P3 |
People (4)
| Scenario | Core Value | Priority |
|---|---|---|
| Employee Training Agent | AI role-play as customer, training for sales/CS new-hires | P0 |
| Recruiting Filter Agent | Resume scoring + interview scheduling, hiring cycle -40% | P2 |
| Performance Feedback Agent | Personalized review reports replacing template monthlies | P3 |
| Employee Care Agent | Attrition-risk early warning, compliant sentiment monitoring | P3 |
Marketing (5)
| Scenario | Core Value | Priority |
|---|---|---|
| Content Generation Agent | Product copy / social posts, content throughput 10× | P1 |
| Campaign Planning Agent | Historical-ROI-driven campaign drafts, planner productivity +40% | P2 |
| Competitor Watch Agent | Auto-track competitor pricing/new SKUs, market response +50% | P2 |
| Data Analysis Agent | NL questions, auto-generated reports (Text-to-SQL) | P2 |
| Brand Compliance Agent | Marketing-asset brand-rule checking, avoid violation fines | P2 |
Finance & Risk (4)
| Scenario | Core Value | Priority |
|---|---|---|
| Return-Fraud Detection Agent | Detect anomalous returns (serial returns / new-for-old), fraud loss -30-50% | P1 |
| Price Optimization Agent | Best price within brand-authorized range, gross margin +2-3% | P2 |
| Financial Forecasting Agent | Rolling monthly/quarterly revenue forecast, budgeting cycle -30% | P2 |
| Compliance Review Agent | Brand-authorization contract review, avoid licensing violations | P3 |
The 28 isn't there for you to do them all. It's there so you know you didn't miss anything. The next section compresses the whole table down to 4 P0s.
2. The "Business Value × Tech Complexity" Matrix Has Fooled 90% of Project Managers
I've seen too many AI portfolio plans use the same matrix — vertical axis business value, horizontal axis technical complexity, four quadrants. The problem with that picture isn't that it's wrong, it's that it's too abstract — it doesn't answer "why these 4 ahead of the other 24."
Real P0 filtering answers three questions —
- Can ROI be computed? Business value has to convert to a specific dollar/headcount-savings number — not vague phrases like "improves experience"
- How reusable is the infrastructure? The data/APIs/platform this Agent forces you to build out — will it become the foundation for the next 5 Agents?
- Can it be rolled back? If it underperforms in production, what's the cost of pulling it?
Run the 28 through these 3 questions —
+--------------------------------------+
High | P0 (Start Now) P1 (Next Wave) |
| * After-Sales CS * Logistics |
Business | * Sales Copilot * Return Fraud |
Value | * Employee Train * Slow-Mover |
| * Replenishment * Content Gen |
+--------------------------------------+
| P2 (Third Wave) P3 (Future) |
Low | * KPI Coaching * Equipment Maint |
| * Recommendation * Perf Feedback |
| * Campaign Plan * Employee Care |
| * Stock-Taking * Compliance Review |
+--------------------------------------+
Low High
Tech Complexity / Dependencies
The 4 P0s share the same profile — ROI computable, infrastructure reused, rollback possible, validated within 6 months. The other 24 either don't pencil out, or are waiting on infrastructure, or can't be rolled back if they go wrong.
One-sentence P0 test: If this project flops, can your boss still say six months from now "good thing we didn't bet the farm on it"? If the answer is no, it's not P0.
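The three filter questions can be written down as an explicit check, which makes it harder for a project to slip into P0 on enthusiasm alone. This is a minimal sketch, not a scoring model: the `Candidate` fields, the `is_p0` function, and the scores given to the three example scenarios are all illustrative assumptions. The 3-step cutoff for "ROI is computable" borrows the detection signal from Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    roi_inference_steps: int   # steps between the Agent's output and money saved or earned
    reusable_infra: bool       # does it force data/API/platform work that later Agents reuse?
    rollback_is_cheap: bool    # can it be pulled from production without lasting damage?


def is_p0(c: Candidate, max_roi_steps: int = 3) -> bool:
    """Return True only if the candidate passes all three filter questions."""
    roi_computable = c.roi_inference_steps <= max_roi_steps
    return roi_computable and c.reusable_infra and c.rollback_is_cheap


# Illustrative scores only; replace them with your own assessment.
candidates = [
    Candidate("After-Sales CS Agent", 1, True, True),
    Candidate("Personalized Recommendation Agent", 6, True, True),
    Candidate("Price Optimization Agent", 2, True, False),
]

for c in candidates:
    print(f"{c.name}: {'P0' if is_p0(c) else 'not P0'}")
```

The check is deliberately all-or-nothing: a scenario that fails any one question drops out of P0, no matter how strong it looks on the other two.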
3. Start With Customer Service — Not Because It's Easiest, Because It Forces the Whole Foundation
People push back — "why not start with something sexier, like personalization or dynamic pricing?"
Put the verdict on the table first: a customer-service Agent looks like one project, but it's actually the foundation-installation plan for your entire Agentic AI stack.
Three reasons, each tied to a concrete observable —
Reason 1: ROI is computable, not fuzzy
Customer service is one of the few retail areas where you can multiply monthly-salary × headcount to compute AI replacement value directly —
- CS headcount is a quantifiable cost line (monthly salary × people × 12)
- AI handling 70% of conversations ≈ 60% headcount reduction
- A retailer with thousands of daily tickets saves tens of millions of RMB per year in headcount
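The arithmetic behind that claim is short enough to put in front of finance. The sketch below uses placeholder inputs (8,000 RMB monthly salary, 300 reps) that are assumptions for illustration, not figures from any retailer; the function names are hypothetical too.

```python
def annual_cs_payroll(monthly_salary_rmb: int, headcount: int) -> int:
    """Annual CS cost line: monthly salary x people x 12."""
    return monthly_salary_rmb * headcount * 12


def annual_saving(monthly_salary_rmb: int, headcount: int, reduction_pct: int) -> int:
    """Payroll saved per year for a given headcount reduction (whole percent)."""
    return annual_cs_payroll(monthly_salary_rmb, headcount) * reduction_pct // 100


# Placeholder inputs, not data from a real retailer.
salary, reps = 8_000, 300
print(annual_cs_payroll(salary, reps))    # 28800000
print(annual_saving(salary, reps, 60))    # 17280000
```

One multiplication and one percentage: that is the whole inference chain, which is what makes the number defensible in a budget review.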
Compare that with "personalization +20% CTR": from click to checkout there are 6-8 conversion steps, any of which can absorb the blame, and when your boss asks "how much profit did that 20% actually produce," no one has an answer.
Detection signal: If an Agent project's ROI calculation needs more than 3 inference steps (X → Y → Z → revenue), it probably isn't P0.
Reason 2: Infrastructure reuse is highest
The infrastructure that customer-service Agent construction forces you to build is exactly the foundation every later Agent needs —
Infrastructure built during CS Agent construction:
+-------------------------------------------+
| Data Hub APIs (orders / logistics / SKU) |
| Tag Factory (customer profile / tier) |
| 3-layer knowledge base |
| API gateway (unified call layer) |
| AI orchestration workflows |
+-------------------------------------------+
|
v
Directly supports the next 25 Agent scenarios
It looks like you're building one customer-service Agent. You're actually forcing the data hub and tag factory to come alive. Build the foundation first, and downstream P1-P2 Agent delivery cycles shrink by 50%+.
Reason 3: Risk is the most controllable — and other candidates can't match this
CS Agent has 3 safety nets the others don't —
- Human handoff: Anything AI is uncertain about can route to a human at any moment, so no irreversible damage
- Gradual rollout: Start at 10% traffic and ramp
- Measurable outcomes: First-call resolution, CSAT, escalation rate — every metric has a clear formula
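The first two safety nets fit in a few lines of routing logic. This is a sketch under stated assumptions: it presumes the Agent exposes some confidence score per reply (`ai_confidence`), and the 0.8 threshold and the function names are hypothetical. Only the 10% starting traffic comes from the list above.

```python
import hashlib


def in_rollout(customer_id: str, rollout_pct: int) -> bool:
    """Deterministically place a customer in the AI cohort for a given percentage."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100   # stable bucket in 0..99
    return bucket < rollout_pct


def route(customer_id: str, ai_confidence: float, rollout_pct: int = 10,
          min_confidence: float = 0.8) -> str:
    """Return 'ai' or 'human' for one conversation."""
    if not in_rollout(customer_id, rollout_pct):
        return "human"    # outside the rollout cohort
    if ai_confidence < min_confidence:
        return "human"    # human handoff when the Agent is unsure
    return "ai"


print(route("customer-001", ai_confidence=0.95))
```

Hashing the customer ID instead of drawing a random number keeps each customer on the same side of the split across sessions. Raising `rollout_pct` only adds customers to the AI cohort and never removes anyone already in it.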
A replenishment Agent gone wrong → inventory pile-up or stockout, incident within 24 hours. A pricing Agent gone wrong → brand authorization dispute, incident within a week. Of these three candidates, customer service has the lowest error cost.
One project solves three problems at once — cost reduction, infrastructure build-out, team capability validation. That's why you start with customer service.
4. Six Things to Check Before You Start: Skip Any and You Pay in Three Months
Before you touch the keyboard, do a diagnostic of the existing systems. Skip any one of these and you'll be back patching three months later.
| System/Module | What to verify | Why it matters for Agent | Action |
|---|---|---|---|
| WeCom CS | In use? API access open? | The Agent's entry point; API integration is mandatory | Confirm API access, check rate limits |
| CS ticket platform | Is the existing bot still running? KPIs tracked? | Old bot is a negative example — typical "deployed and abandoned"; old conversation logs are reusable | Don't depend on the old bot; reuse its conversation logs |
| Ticketing system | Independent vendor? Integrated with CS platform? | Critical path for complaint escalation, human handoff | Map ticketing APIs, define "escalate to human" triggers |
| CS knowledge base | What's the coverage of existing docs? | The Agent's fuel. Without a knowledge base the Agent has only the model's parametric knowledge, and accuracy has no floor | Top priority (Part 2 covers this) |
| Historical conversation data | Which system? Volume? | The most valuable raw asset — for knowledge extraction, training data, benchmark design | Apply for export access immediately, assess volume + quality |
| Human CS baseline | First-call resolution / handle time / CSAT baseline? | Without a baseline you can't tell if AI is better or worse than the human team | Build the baseline in parallel with the knowledge base (see below) |
The Step Most Teams Skip — Establishing the Human Baseline
One thing most retail organizations don't do — establish the human-CS baseline before AI goes live, and freeze it.
Most retail CS systems don't have systematic evaluation data: no first-call resolution rate, no average handle time, no resolution rate by question type. Once AI ships, the conversation distribution changes and you can never recover a "pure human" baseline, so the question "is AI better than humans" becomes unanswerable forever.
Three months in your boss asks "did that 2M RMB investment have positive ROI" — you can only point at proxy metrics like "per-rep throughput +X%" and there's no real control experiment.
How to establish the baseline (doable in one week) —
- Pull the last 3 months of data from the CS platform
- Random-sample 500 conversations, manually label: question type, handle time, first-call resolution, CSAT, escalation
- Compute core human-CS metrics as your baseline
- Output the TOP 50 high-frequency question list at the same time — this is your knowledge-base build priority
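The four steps above can be sketched with nothing but the standard library. The record layout (`LabeledConversation`) and the function names are assumptions for illustration; the real fields depend on what your CS platform exports and how your labelers record their judgments.

```python
import random
from collections import Counter
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class LabeledConversation:
    question_type: str
    handle_minutes: float
    resolved_first_contact: bool
    csat: int                 # e.g. a 1-5 survey score
    escalated: bool


def draw_sample(conversation_ids: list[str], size: int = 500, seed: int = 42) -> list[str]:
    """Pick a reproducible random sample of conversation IDs to label by hand."""
    rng = random.Random(seed)
    return rng.sample(conversation_ids, min(size, len(conversation_ids)))


def baseline(labeled: list[LabeledConversation], top_n: int = 50) -> dict:
    """Compute the human-CS baseline from hand-labeled conversations."""
    if not labeled:
        raise ValueError("need at least one labeled conversation")
    n = len(labeled)
    return {
        "first_call_resolution": sum(c.resolved_first_contact for c in labeled) / n,
        "avg_handle_minutes": mean(c.handle_minutes for c in labeled),
        "avg_csat": mean(c.csat for c in labeled),
        "escalation_rate": sum(c.escalated for c in labeled) / n,
        "top_question_types": Counter(c.question_type for c in labeled).most_common(top_n),
    }
```

Fixing the seed matters more than it looks: when someone later disputes the baseline, you can reproduce exactly which 500 conversations were sampled. Store the output with a date and don't recompute it after AI goes live.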
Detection signal: if the team proposes "let's ship AI first and build the baseline later" — full stop. This is the most common booby trap from delivery vendors; six months later ROI is unprovable and you'll have to invest again in regression testing.
5. Five Decisions Week One Must Land — Three About People, Two About Money
If you've decided to launch the CS Agent project, Week One's critical work isn't technical — it's organizational decisions. Technical work can start Week Two and still be on time. But these 5 decisions, if not made in Week One, become project blockers three months later —
Decision 1: Position headcount reduction — target or outcome?
Whether headcount reduction is a goal or a natural outcome doesn't affect the number, it affects the speed of knowledge transfer.
If your messaging is "AI replaces humans" — the customer-service reps who know your customers best will leave proactively before AI ships, and your knowledge base loses its most valuable input source. I've seen a retailer lose 6 senior CS reps to competitors within 3 months, taking know-how that was worth more than the entire AI project investment.
Recommended framing: "AI augments efficiency"; headcount is an evaluated outcome, not a project goal. Lock this messaging in Week One — internal, to vendors, to brand partners.
Decision 2: AI's internal positioning — give the CS team a story where they aren't replaced
- For the CS team: position AI as a productivity tool — AI handles simple/repetitive, humans focus on complex/high-value
- For brand partners: pre-communicate brand-image risk management
- For customers: decide on labeling "AI reply" — transparency vs. seamless
This isn't PR spin, it's how the Agent actually works. A CS Agent that genuinely hits 65% first-call resolution still routes the remaining 35% of tickets, the complex ones, to humans, and you need the humans handling that 35% to be motivated.
Decision 3: A small exploratory budget — don't approve 2M RMB upfront
The real first-stage cost (1-2 months) of an AI project —
- AI orchestration platform (cloud server): 1,500-2,000 RMB/month
- LLM inference (validation-stage low volume): 500-1,000 RMB/month
- Vector retrieval / Rerank: 500-700 RMB/month
Total: single-digit thousands RMB/month.
The purpose of this budget isn't "ship a product," it's "validate feasibility + build infrastructure."
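To make the total concrete, here is the sum of the three line items above, worked as ranges. The figures are copied from the list; the variable names are just for this sketch.

```python
# (low, high) monthly cost in RMB, copied from the three line items above.
line_items = {
    "AI orchestration platform (cloud server)": (1_500, 2_000),
    "LLM inference (validation-stage volume)": (500, 1_000),
    "Vector retrieval / rerank": (500, 700),
}

monthly_low = sum(low for low, _ in line_items.values())
monthly_high = sum(high for _, high in line_items.values())
print(f"Monthly: {monthly_low:,}-{monthly_high:,} RMB")              # Monthly: 2,500-3,700 RMB

# The first stage runs 1-2 months, so bound the total at both ends.
print(f"First stage: {monthly_low * 1:,}-{monthly_high * 2:,} RMB")  # First stage: 2,500-7,400 RMB
```

Even at the high end of two months, the exploratory stage stays under 10,000 RMB in platform and inference costs, which is the scale to hold a vendor's first-8-weeks quote against.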
Detection signal: If a vendor quotes you 2M RMB for a CS Agent project, ask "how much do you need for the first 8 weeks?" If they can't answer, or the answer is still 300k+ RMB, treat it as a warning sign.
Decision 4: Knowledge-base owner — must be someone who knows the business, not engineering
The biggest bottleneck for knowledge-base construction isn't technology, it's content quality — you need a senior CS rep who knows the business to own content (Part 2 covers this in depth).
The Week-One decision isn't about technical architecture, it's about this person's incentives —
- Does this role get reassigned from "CS rep" to "knowledge-base operations"?
- How do level/comp align?
- If they do well, can they get promoted to management in 3 months?
Without that incentive layer you won't get a senior CS rep's full know-how — they'll hold back enough to preserve their irreplaceability.
Decision 5: IT resources — approval cycles longer than the build itself
The Agent needs to integrate with orders, logistics, ticketing — all owned by IT. API access approval + sandbox environment + interface validation — this path is often slower than the engineering build itself.
I saw one project where engineering finished in 6 weeks, then waited 11 weeks for IT approval before go-live.
Start the IT conversation Week One — don't wait until the knowledge base is done. Get the interface doc list, sandbox request, and key API rate-limit confirmation — these three must land in Week One.
Week One's 5 decisions, grouped: 3 about people (headcount strategy, internal messaging, knowledge-base owner), 2 about money/resources (budget, IT support). Technical decisions aren't on the Week-One agenda.
6. The 28-Scenario Dependency Map — You're Not Building an Agent, You're Building Infrastructure
One last map for the engineering team. The 28 Agents don't launch in parallel — there are strict dependencies between them:
[Data Foundation] (foundation for every Agent)
Data hub activated (read-only APIs)
| unlocks
+-- Orders/logistics API --> After-Sales CS Agent (logistics queries)
+-- Inventory API --> Replenishment, New-SKU Placement
+-- Sales data API --> Slow-Mover Alerts, Forecasting, KPI Coaching
Tag factory dynamic (read-only --> real-time write-back)
| unlocks
+-- Read-only --> Sales Copilot (personalization), Membership Agent
+-- Dynamic write-back --> Personalization Agent, precision marketing
[Knowledge Base] (foundation for CS-related Agents)
3-layer knowledge base (200 --> 800 --> 2000+ Q&A)
| unlocks
+-- Layer 1 --> Auto-handling of return/exchange policy queries
+-- Layer 2 --> Auto-handling of product queries, in-store Sales Copilot
+-- Layer 3 --> Complaint handling, employee training scenarios
[Vision] (foundation for image-driven Agents)
Multimodal model integration
| unlocks
+-- Quality-complaint image analysis (After-Sales CS reinforcement)
+-- Display Compliance Agent
+-- Brand Compliance Agent
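A dependency map like this is a directed graph, so a valid build order falls out of a topological sort. The sketch below encodes a partial, illustrative subset of the edges above using Python's standard `graphlib` module. The node names are shorthand, and the exact edge list is an assumption you should replace with your own system inventory.

```python
from graphlib import TopologicalSorter

# Each key depends on every item in its set. Partial, illustrative subset
# of the dependency map; not a complete inventory of the 28 scenarios.
depends_on = {
    "After-Sales CS Agent": {"Orders/logistics API", "Knowledge base layer 1"},
    "Replenishment Agent": {"Inventory API"},
    "Slow-Mover Alert Agent": {"Sales data API"},
    "Sales Copilot": {"Tag factory (read-only)"},
    "Personalized Recommendation Agent": {"Tag factory (write-back)"},
    "Display Compliance Agent": {"Multimodal model integration"},
    "Orders/logistics API": {"Data hub (read-only APIs)"},
    "Inventory API": {"Data hub (read-only APIs)"},
    "Sales data API": {"Data hub (read-only APIs)"},
    "Tag factory (write-back)": {"Tag factory (read-only)"},
}

# static_order() yields every node after all of its prerequisites.
build_order = list(TopologicalSorter(depends_on).static_order())
for step, node in enumerate(build_order, start=1):
    print(f"{step:2d}. {node}")
```

If someone adds a circular dependency (Agent A needs data that only Agent B writes, and vice versa), `TopologicalSorter` raises `CycleError` instead of returning an order. That is a useful early warning in a portfolio review.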
The 4 P0 Agents (after-sales CS, sales copilot, replenishment, employee training) — their value splits in two —
- Half is in the Agent itself (cost reduction + experience uplift)
- Half is in forcing the data hub and tag factory to come alive
Once that infrastructure is activated, P1-P2 Agent delivery cycles shrink by 50%+. When you negotiate with a vendor over "building a CS Agent," you're actually paying the up-front cost for the entire retail AI stack.
Where this leaves you
If you want to use the "28-scenario grading + Week-One 5 decisions" directly in your next strategy meeting — without re-reading this article every time — I packaged a PDF kit for readers who got this far. Send me the keyword "RETAIL 28" and I'll send the pack:
- 28-scenario priority grading table (one-page A3 print — see your whole portfolio P0/P1/P2/P3 in 30 seconds)
- Week-One 5-decision checklist (card version — decision, risk, owner, output — drop it in the team chat and everyone gets it)
- CS Agent vs. other candidate Agents — 6-dimension comparison sheet (built for vendor-proposal reviews)
(Channels in the footer — X or email both work.)
Next: How to Build the Knowledge Base, Pick the Model, and Where the Money Actually Goes
Part 2 tackles the three most painful technical-architecture questions —
- 3-layer knowledge base: how do you build it, and why does "dumping docs into a vector database" cost you 3 months?
- Domestic vs. overseas models — in CS scenarios, which 4 specific capabilities actually differ?
- The four cost buckets — inference, headcount, infrastructure, integration — how much each, and which one is most underestimated?
Series TOC:
- This article | Part 1: Which 4 of Your 28 'Smart-X' Projects to Start With
- Part 2: Knowledge Base Caps the Ceiling, the Model Is Just a Tool
- Part 3: 80% of Failed Bots Were Ops Failures, Not Tech