AI Chatbot UX Design: Rules That Actually Work in SEA

Brands across Southeast Asia are deploying AI chatbots at pace — on Shopee storefronts, LINE Official Accounts, bank mobile apps, and e-commerce checkout flows. The deployment speed is impressive. The design rigour, less so.

Most teams treat chatbot UX as a sprint task: get it live, iterate later. But Nielsen Norman Group’s latest research on AI chatbot design guidelines makes clear that the foundational decisions — what the bot says it can do, how it signals its limits, whether it understands context — are precisely what determine whether users trust it or abandon it within two exchanges.

The Three Frameworks Worth Reading (and Their Blind Spots)

Microsoft, Google, and IBM have each published structured guidelines for Human-AI Interaction design. UX Design’s Dora Czerna recently mapped where those frameworks converge and where they leave gaps. The consensus is stronger than expected: all three agree that AI systems must make their capabilities transparent, communicate uncertainty clearly, and allow users to correct or override outputs without friction.

What the frameworks underweight is context-sensitivity at scale. A chatbot deployed in a Thai retail app serves users who are simultaneously comparing prices on Lazada, messaging friends on LINE, and switching between Thai and English mid-sentence. The interaction model has to absorb that reality — and the global frameworks were not written with that user in mind.

What Nielsen Norman Group’s Guidelines Actually Say

NN Group’s research, authored by Georgia Kenderova, Maria Rosala, and Tanner Kohler, narrows to ten concrete design guidelines for site-specific AI chatbots. Three are worth pulling forward for teams building in SEA markets.

First: state the bot’s scope at the moment of first interaction, in the opening message rather than buried in an onboarding flow. Users who understand what a chatbot cannot do are significantly less likely to abandon after a failed query. For a telco like AIS or Celcom, that means the chatbot should immediately say whether it handles billing, technical support, or both, and which requests require a human handoff.

Second: surface contextually relevant prompt suggestions based on what the user is looking at. If a user lands on a product page for a 5G router, the chatbot should not offer generic options like “How can I help you today?” It should surface “Check compatibility with my current plan” or “Compare to the mesh router.” NN Group’s research shows this reduces the blank-input problem that kills chatbot engagement on first load.

Third: signal uncertainty explicitly, and do it early. An AI that states its uncertainty plainly is more trusted than one that answers with false precision. This is especially consequential in financial services, healthcare, and legal contexts, where several SEA markets have active regulatory expectations around AI disclosure.
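To make the three guidelines concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not NN Group’s method or any vendor’s API: the `BotScope` type, the `PAGE_PROMPTS` mapping, the page key format, the default prompts, and the 0.7 confidence threshold are all hypothetical, and it assumes your model or retrieval layer exposes some confidence score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class BotScope:
    """Hypothetical description of what the bot does and what it hands off."""
    handles: list[str]
    handoff: list[str]


def opening_message(scope: BotScope) -> str:
    """Guideline 1: state capabilities and limits in the very first message."""
    can = ", ".join(scope.handles)
    cannot = ", ".join(scope.handoff)
    return (f"I can help with: {can}. "
            f"For {cannot}, I will connect you to a human agent.")


# Guideline 2: prompts keyed by what the user is looking at.
# The keys and the default prompts are made up for illustration.
PAGE_PROMPTS = {
    "product:5g-router": [
        "Check compatibility with my current plan",
        "Compare to the mesh router",
    ],
}
DEFAULT_PROMPTS = ["Check my bill", "Report a network problem"]


def suggest_prompts(page_key: str) -> list[str]:
    """Return page-specific prompts, falling back to scoped defaults."""
    return PAGE_PROMPTS.get(page_key, DEFAULT_PROMPTS)


def render_answer(text: str, confidence: float, threshold: float = 0.7) -> str:
    """Guideline 3: put the uncertainty notice before a low-confidence answer."""
    if confidence >= threshold:
        return text
    return ("I am not fully sure about this, so please verify before acting: "
            + text + " Would you like to speak to an agent?")
```

Calling `opening_message(BotScope(["billing"], ["technical support"]))` produces a first message that names both the scope and the handoff path. The threshold is a product decision to tune against real drop-off data rather than a fixed rule.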

The Research Feedback Loop Problem

Here is where I want to connect the design conversation to a more uncomfortable operational truth. Teams building and iterating these chatbot experiences often rely on user panels for qualitative feedback. NN Group’s Lola Famulegun published research this week on exactly why user panels fail — and the deterioration patterns are predictable: panel fatigue sets in, participants over-learn the product’s logic, and responses start reflecting what users think the team wants to hear rather than genuine behaviour.

For AI chatbot design specifically, this is a compounding problem. A user panel that has been running for three months with the same participants will systematically underreport confusion — because those participants are no longer confused. They have adapted to the bot’s quirks in ways new users never will. The result is a research signal that flatters the product while masking its real friction points.

The fix is not to abandon qualitative research. It is to rotate panel participants more aggressively and to pair panel data with behavioural signals: session drop-off points, query reformulation rates, handoff-to-human request frequency. Those signals do not get fatigued. They tell you exactly where the design is failing users who have not had the chance to learn its workarounds.
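As a rough sketch of how those three behavioural signals could be computed from chat logs, the Python below assumes a hypothetical log format: each session is a list of event dicts with a `type` of `user_query` or `handoff_request`, and user queries carry a `text` field. The reformulation check is a simple string-similarity heuristic with an assumed 0.6 threshold, not an established metric definition.

```python
from collections import Counter
from difflib import SequenceMatcher


def is_reformulation(prev: str, curr: str, threshold: float = 0.6) -> bool:
    """Heuristic: two consecutive queries that look alike suggest a retry."""
    ratio = SequenceMatcher(None, prev.lower(), curr.lower()).ratio()
    return ratio >= threshold


def summarize(sessions: list[list[dict]]) -> dict:
    """Compute drop-off points, reformulation rate, and handoff rate."""
    drop_off = Counter()   # user turns completed when the session ended
    pairs = 0              # consecutive query pairs seen
    reformulated = 0       # pairs flagged as a retry of the same question
    handoffs = 0           # sessions with at least one handoff request

    for events in sessions:
        queries = [e["text"] for e in events if e["type"] == "user_query"]
        drop_off[len(queries)] += 1
        for prev, curr in zip(queries, queries[1:]):
            pairs += 1
            reformulated += is_reformulation(prev, curr)
        handoffs += any(e["type"] == "handoff_request" for e in events)

    return {
        "drop_off_by_turn": dict(drop_off),
        "reformulation_rate": reformulated / pairs if pairs else 0.0,
        "handoff_rate": handoffs / len(sessions) if sessions else 0.0,
    }


# Made-up sample data, for illustration only.
sample_sessions = [
    [
        {"type": "user_query", "text": "check my bill"},
        {"type": "user_query", "text": "check my bill please"},
        {"type": "handoff_request"},
    ],
    [
        {"type": "user_query", "text": "router price"},
    ],
]
report = summarize(sample_sessions)
```

A spike in `drop_off_by_turn` at turn one or two points at the opening message and first suggestions; a high reformulation rate points at intents the bot is misreading. Both are worth comparing against what the panel reports.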

Designing for the SEA Stack

Mobile-first is not a design preference in Southeast Asia; it is the architectural reality. A meaningful proportion of chatbot interactions in markets like Indonesia and the Philippines happen on mid-range Android devices, on 4G connections that vary throughout the day, inside apps that are already memory-constrained. This changes several design decisions that the global frameworks treat as secondary.

Typing-heavy chat interfaces perform worse on smaller screens with gesture keyboards. Tap-to-select prompt suggestions consistently outperform open text input on mobile, particularly for first-time users or users navigating in a second language. Brands like GrabFood have iterated on this pattern extensively in their in-app support flows, relying on structured quick-reply buttons rather than freeform chat for the majority of service queries.

Multilingual interface design adds another layer. A chatbot that handles Thai, Bahasa Indonesia, and English in the same session needs explicit switching logic, or it will default to one language and alienate users who code-switch naturally. Pairing automatic language detection with an explicit user override reduces the friction of being misread by the system.
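One way to structure detection plus override is sketched below in Python. The detector is deliberately a toy and an assumption of this example: it treats any character in the Thai Unicode block (U+0E00 to U+0E7F) as Thai and uses a tiny, hypothetical hint-word list for Bahasa Indonesia. A production system would swap in a proper language-identification model; the point here is only that the user’s explicit choice always wins over the guess.

```python
from typing import Optional

SUPPORTED = {"th", "id", "en"}

# Hypothetical hint words for the toy detector; far from exhaustive.
ID_HINTS = {"saya", "tidak", "yang", "dan", "berapa", "terima", "kasih"}


def detect_language(text: str) -> str:
    """Toy auto-detect: Thai script first, then Indonesian hints, else English."""
    if any("\u0e00" <= ch <= "\u0e7f" for ch in text):
        return "th"
    if set(text.lower().split()) & ID_HINTS:
        return "id"
    return "en"


class LanguageState:
    """Per-session language choice where an explicit override beats detection."""

    def __init__(self) -> None:
        self.override: Optional[str] = None

    def set_override(self, lang: str) -> None:
        """Called when the user taps a language button in the chat UI."""
        if lang not in SUPPORTED:
            raise ValueError(f"unsupported language: {lang}")
        self.override = lang

    def clear_override(self) -> None:
        """Return to auto-detection for the rest of the session."""
        self.override = None

    def reply_language(self, text: str) -> str:
        """Language to answer in for this message."""
        return self.override or detect_language(text)


state = LanguageState()
mixed = "สวัสดี check my bill"          # Thai greeting plus English request
auto_choice = state.reply_language(mixed)    # detection picks Thai
state.set_override("en")
forced_choice = state.reply_language(mixed)  # the user's choice wins
```

Keeping the override in session state, rather than re-detecting on every message, means a user who code-switches mid-sentence is not bounced between languages after they have stated a preference.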

Toward Better Defaults

The gap between what the global AI design frameworks recommend and what most deployed chatbots actually do is not a knowledge problem — the NN Group guidelines are public and specific. It is an incentive problem. Shipping speed, feature count, and stakeholder pressure to demonstrate AI capability all push against the patient, iterative work of getting the foundational UX right.

But foundational UX compounds. A chatbot that correctly scopes its capabilities, surfaces contextual prompts, and signals uncertainty honestly will generate cleaner interaction data — which makes the next design iteration faster and more accurate. The brands that invest in this layer now are building a structural advantage that is genuinely hard to copy quickly.

The question worth sitting with: when your team last reviewed the chatbot’s opening message, was the review driven by real drop-off data — or by what the internal user panel said felt fine?

At grzzly, we work with digital teams across Southeast Asia on exactly this intersection: translating behavioural data from AI tools into UX decisions that reduce friction and improve conversion. If your chatbot is live but you are not sure it is working as hard as it should be, that is a conversation worth having.
