AI Context Windows and the First-Party Data You're Wasting

The quality of your AI outputs is a direct function of your first-party data architecture — fix the foundation before scaling the model.

An AI agent standing at a doorway, only able to see what light comes through — representing limited context windows in marketing data systems
Illustrated by Mikael Venne

AI agents are only as smart as the context you feed them. Here's how first-party data architecture determines whether your AI actually works.

Marcel Duchamp didn’t change the urinal. He changed the context around it — and suddenly it was art. Your AI is doing something structurally identical with your customer data. Every day.

Nick Albertini at Tealium recently drew a line from Duchamp’s Fountain to the AI context window problem, and it’s one of the sharper analogies in recent martech writing. The argument: the same data, placed in different contexts, produces entirely different outputs. Which means that what you feed your AI — when, how, and with what surrounding signal — matters more than the model itself. For brands in Southeast Asia sitting on growing pools of first-party data but seeing underwhelming AI outputs, this reframes the question entirely. The problem probably isn’t your model. It’s your context architecture.

Your AI Is Only as Smart as What You Tell It

Context windows — the information an AI model can “see” at once when generating a response or making a decision — are finite. That much is widely understood. What’s less discussed is the quality problem: most brands are filling those windows with the wrong data, in the wrong sequence, without consent provenance attached.

Albertini’s piece makes the point that context isn’t just volume — it’s relevance, recency, and relational signal. A customer’s purchase three years ago on Lazada tells your AI very little about their intent today. But their browse behaviour from last Tuesday, combined with a loyalty tier change and a recent in-app survey response? That’s a context window worth using.

For first-party data programmes, this is the activation test. Collecting data is table stakes. Structuring it so that AI agents can draw on the right signal at the right moment — that’s the actual work. Brands that treat their CDP as a passive archive will get passive AI outputs. Brands that engineer real-time, consent-linked, behavioural context will get something closer to intelligence.
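
One way to make "relevance, recency, and relational signal" concrete is a small scoring function that decides which events earn a place in a limited context window. What follows is a minimal sketch, not a reference implementation: the Event fields, the per-type weights, and the 30-day half-life are all illustrative assumptions, not anything taken from Albertini's piece or from a particular CDP.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical weights: how much each event type says about current intent.
TYPE_WEIGHTS = {
    "browse": 1.0,
    "purchase": 1.2,
    "loyalty_tier_change": 1.5,
    "survey_response": 2.0,
}
HALF_LIFE_DAYS = 30.0  # assumed: an event loses half its weight every 30 days


@dataclass(frozen=True)
class Event:
    kind: str
    occurred_at: datetime
    detail: str


def score(event: Event, now: datetime) -> float:
    """Weight by event type, then decay exponentially with age."""
    age_days = max((now - event.occurred_at).total_seconds() / 86400.0, 0.0)
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return TYPE_WEIGHTS.get(event.kind, 0.5) * decay


def build_context(events: list[Event], now: datetime, max_events: int = 3) -> list[Event]:
    """Return the highest-scoring events, capped at the context budget."""
    ranked = sorted(events, key=lambda e: score(e, now), reverse=True)
    return ranked[:max_events]


now = datetime(2026, 1, 15)
events = [
    Event("purchase", now - timedelta(days=3 * 365), "marketplace order"),
    Event("browse", now - timedelta(days=2), "viewed product page"),
    Event("loyalty_tier_change", now - timedelta(days=10), "moved to gold"),
    Event("survey_response", now - timedelta(days=5), "in-app survey"),
]
context = build_context(events, now)
```

With these made-up numbers, the three-year-old purchase decays to almost nothing and drops out, while the survey response, tier change, and recent browse fill the window. The exact weights matter less than the habit of scoring before you stuff the prompt.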

Agent Experience Is Now a Data Design Problem

Lior Gavish at Monte Carlo Data published a candid account of what happens when you build for AI agents as primary users rather than humans — and the failures are instructive. The team found that agents break in ways humans don’t: they follow instructions too literally, they hallucinate when context is ambiguous, and they have no tolerance for data quality gaps that a human analyst would quietly paper over.

This is a significant shift for data teams. When humans are the end-users of a dashboard or a segmentation model, they bring judgment to fill the gaps. When an AI agent is the end-user, orchestrating a personalisation decision, triggering a campaign, or routing a support query, it cannot fill those gaps gracefully. Bad data architecture becomes a customer experience failure almost immediately.

For Southeast Asian brands running AI-assisted personalisation on platforms like Shopee or LINE, this is urgent. Multi-language interfaces, fragmented identity graphs across mobile and desktop, and platform-specific behavioural signals all create data quality gaps, and agents will act on those gaps literally instead of working around them. The fix isn’t more AI; it’s cleaner, better-labelled, consent-verified first-party data pipelines feeding into those agents.


Rare Signals Are Worth More Than You Think

Separately, Marco Tallarico’s work on using transformer models to predict extremely rare events, in his case solar flares, offers a useful provocation and a counterintuitive lesson for marketing data teams. The ML challenge with rare solar flares is that standard models trained on abundant data are structurally poor at detecting low-frequency, high-impact signals. The solution involves architectural choices that specifically weight and preserve rare event data rather than letting it drown in the noise.

Marketing data has an analogous problem. High-intent micro-moments — a user who visits a pricing page three times in 48 hours, a loyalty member who re-engages after 14 months of silence, a customer who completes a preference survey unprompted — are rare relative to the full behavioural dataset. Standard segmentation and AI models, trained mostly on what the majority of customers do most of the time, will under-index on these signals.

The implication for first-party data programme design: build explicit capture and flagging mechanisms for rare but high-value behavioural signals. Don’t let them wash out in aggregation. In practical terms, this means event-level data retention with behavioural tagging, not just session-level summaries — and it means ensuring those signals are surfaced, not suppressed, when they enter an AI agent’s context window.
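
To show what explicit flagging might look like at the event level, here is a minimal sketch that tags the three example signals above. The event shape (dicts with "type", "ts", and an optional "prompted" key), the tag names, and the 14-months-as-420-days approximation are all hypothetical; a real pipeline would read from your own event schema.

```python
from datetime import datetime, timedelta

# Illustrative thresholds based on the examples in the text; tune for your data.
PRICING_VISITS = 3
PRICING_WINDOW = timedelta(hours=48)
DORMANCY_GAP = timedelta(days=14 * 30)  # roughly 14 months


def flag_rare_signals(events: list[dict]) -> list[str]:
    """Scan one customer's event-level history and return rare-signal tags.

    Each event is a dict with hypothetical keys "type" and "ts" (a datetime).
    """
    events = sorted(events, key=lambda e: e["ts"])
    flags = []

    # Repeated pricing-page visits inside a sliding 48-hour window.
    pricing = [e["ts"] for e in events if e["type"] == "pricing_page_view"]
    for i in range(len(pricing) - PRICING_VISITS + 1):
        if pricing[i + PRICING_VISITS - 1] - pricing[i] <= PRICING_WINDOW:
            flags.append("repeat_pricing_interest")
            break

    # Re-engagement after a long silence between consecutive events.
    for prev, cur in zip(events, events[1:]):
        if cur["ts"] - prev["ts"] >= DORMANCY_GAP:
            flags.append("reengaged_after_dormancy")
            break

    # Preference survey completed without a prompt.
    if any(e["type"] == "survey_completed" and not e.get("prompted", True)
           for e in events):
        flags.append("unprompted_survey")

    return flags


history = [
    {"type": "purchase", "ts": datetime(2024, 9, 1, 10, 0)},
    {"type": "pricing_page_view", "ts": datetime(2026, 1, 10, 9, 0)},
    {"type": "pricing_page_view", "ts": datetime(2026, 1, 11, 14, 0)},
    {"type": "pricing_page_view", "ts": datetime(2026, 1, 11, 20, 0)},
]
flags = flag_rare_signals(history)
```

Note that none of this works on session-level summaries: the sliding window and the dormancy gap both need individual timestamps, which is the practical argument for event-level retention.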

Building Context Architecture That Scales

Pulling these threads together, a coherent design principle emerges: the brands that will get the most from AI in 2026 and beyond are those that treat first-party data not as a reporting asset but as a context engineering asset.

Concretely, this means three things.

First, consent architecture that travels with the data, so AI agents don’t inadvertently use data in ways that violate the preferences of the people it belongs to. In markets like Thailand and Indonesia, where PDPA and similar regulations are increasingly enforced, this isn’t optional.

Second, data freshness standards: stale data in a context window is worse than no data, because it actively misleads. Real-time or near-real-time event streaming into your customer data platform is the infrastructure investment that makes AI trustworthy.

Third, signal tiering: a deliberate taxonomy that tells your AI agents which data points are high-confidence behavioural signals versus low-confidence inferences, so the rarest and most valuable signals aren’t buried.
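
The three requirements can be sketched as a single gate that every signal passes through before it reaches an agent. This is an assumption-heavy illustration, not a compliance implementation: the Signal record, the purpose strings, the seven-day freshness limit, and the two-tier taxonomy are all hypothetical, and real consent handling depends on your legal basis and your CDP.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Tier(Enum):
    OBSERVED = "high-confidence behavioural signal"
    INFERRED = "low-confidence inference"


@dataclass(frozen=True)
class Signal:
    name: str
    value: str
    observed_at: datetime
    tier: Tier
    consented_purposes: frozenset  # consent travels with the record itself


MAX_AGE = timedelta(days=7)  # assumed freshness standard


def context_for_agent(signals, purpose: str, now: datetime) -> list[str]:
    """Keep only consented, fresh signals and label each with its tier."""
    lines = []
    for s in signals:
        if purpose not in s.consented_purposes:
            continue  # consent does not cover this use
        if now - s.observed_at > MAX_AGE:
            continue  # stale data misleads, so drop it
        lines.append(f"[{s.tier.name}] {s.name}={s.value}")
    # Stable sort: observed signals first so inferences do not bury them.
    return sorted(lines, key=lambda line: not line.startswith("[OBSERVED]"))


now = datetime(2026, 1, 15)
personalisation = frozenset({"personalisation"})
signals = [
    Signal("predicted_churn_risk", "medium", now - timedelta(days=1),
           Tier.INFERRED, personalisation),
    Signal("pricing_page_views_48h", "3", now - timedelta(days=1),
           Tier.OBSERVED, personalisation),
    Signal("email_open_rate", "0.4", now - timedelta(days=2),
           Tier.OBSERVED, frozenset({"analytics"})),
    Signal("last_purchase_category", "skincare", now - timedelta(days=90),
           Tier.OBSERVED, personalisation),
]
context = context_for_agent(signals, "personalisation", now)
```

Here the analytics-only signal is excluded for lack of consent, the 90-day-old purchase is excluded as stale, and the observed pricing signal is listed ahead of the inferred churn score. Keeping the consented purposes on the record, not in a separate lookup, is the design choice that lets consent travel with the data.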

Duchamp’s insight was that context transforms meaning. The same is true of your customer data. The question isn’t whether you have enough of it. It’s whether you’ve built the architecture to put the right slice of it in front of your AI at exactly the right moment.

As AI agents take on more autonomous roles in campaign orchestration and personalisation, the brands that win won’t necessarily have the biggest models or the most data — they’ll have the most intelligently designed context. What does your current data architecture actually tell your AI about each customer, and is it enough to act on?


At grzzly, we work with marketing and data teams across Southeast Asia to design first-party data programmes that are built for exactly this kind of activation: consent-compliant, AI-ready, and structured to surface the signals that matter rather than just accumulate volume. If you’re re-evaluating your data architecture ahead of an AI investment, we’d like to be part of that conversation.

Written by Lavender Grizzly

Turning privacy constraints into competitive advantage. Builds first-party data programmes that are compliant by design, valuable by intent, and trusted by the people whose data they hold.
