How We Built a Query Engine That Tracks Brands Across 8 AI Models
Search is shifting from links to language. Here's how we built Monitor's query engine to track brand presence across AI responses at scale.

The technical problem
Building a rank tracker for traditional search is well understood. You call a search API, parse the results, store positions, repeat. The infrastructure is mature and the signals are consistent.
Building one for AI search is a different problem, and one we spent the first four months of Monitor's development solving.
Why AI tracking is hard
Non-determinism
AI language models are not deterministic. Ask the same question twice and you may get two different answers. The model samples from a probability distribution, which means results vary even with identical inputs. This creates a fundamental measurement challenge: if you query once and get result X, is that signal or noise?
Our solution: we run each query multiple times per engine per day and aggregate across runs. We use statistical sampling to identify stable signals (brands that consistently appear) versus noisy ones (single-run appearances that don't hold up). The visibility score you see in Monitor is a weighted aggregate, not a single-query snapshot.
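To make the idea concrete, here is a minimal Python sketch of run-level aggregation. It is an illustration under assumptions, not Monitor's production code: the function names, the 0.5 stability threshold, and the per-engine weights are all hypothetical.

```python
from collections import Counter


def appearance_rates(runs: list[set[str]]) -> dict[str, float]:
    """Fraction of runs in which each brand appeared at least once."""
    if not runs:
        return {}
    counts = Counter(brand for run in runs for brand in run)
    return {brand: n / len(runs) for brand, n in counts.items()}


def split_stable_noisy(
    rates: dict[str, float], threshold: float = 0.5
) -> tuple[set[str], set[str]]:
    """Brands at or above the (assumed) threshold count as stable signals."""
    stable = {brand for brand, rate in rates.items() if rate >= threshold}
    return stable, set(rates) - stable


def visibility_score(
    rate_by_engine: dict[str, float], weights: dict[str, float]
) -> float:
    """Weighted mean of one brand's appearance rate across engines.

    The weighting scheme is an assumption; the real score may weight
    by something other than engine.
    """
    total = sum(weights[engine] for engine in rate_by_engine)
    if total == 0:
        return 0.0
    return sum(r * weights[e] for e, r in rate_by_engine.items()) / total


# Five runs of the same query against one engine (made-up data).
runs = [
    {"Notion", "Linear"},
    {"Notion"},
    {"Notion", "Asana"},
    {"Notion", "Linear"},
    {"Notion", "Linear"},
]
rates = appearance_rates(runs)
stable, noisy = split_stable_noisy(rates)
```

In this made-up sample, a brand that shows up in only one of five runs falls below the threshold and is treated as noise, while brands that appear in most runs are kept as stable signals.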
Rate limits and cost
Each AI engine has its own API, its own rate limits, and its own pricing structure. At scale, the call volume becomes significant: tracking 1,000+ queries across 8 engines means at least 8,000 calls per pass, before multiplying by repeated runs. Early on, our infrastructure bill was growing faster than our query coverage.
We solved this with a tiered crawl strategy: high-priority queries (your top 20 by category importance) run multiple times daily. Mid-tier queries run once daily. Long-tail queries run every 3 days. This lets us cover a broad query universe without proportional cost growth.
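A tiered schedule like this can be sketched in a few lines of Python. The tier names, the 6-hour interval standing in for "multiple times daily", and the helper functions are assumptions made for illustration, not the actual scheduler.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Tier(Enum):
    HIGH = "high"
    MID = "mid"
    LONG_TAIL = "long_tail"


# Time between crawls per tier. The 6-hour value for HIGH is an assumed
# stand-in for "multiple times daily"; the other two follow the text.
CRAWL_INTERVAL = {
    Tier.HIGH: timedelta(hours=6),
    Tier.MID: timedelta(days=1),
    Tier.LONG_TAIL: timedelta(days=3),
}


def is_due(tier: Tier, last_run: Optional[datetime], now: datetime) -> bool:
    """A query is due if it has never run or its tier interval has elapsed."""
    if last_run is None:
        return True
    return now - last_run >= CRAWL_INTERVAL[tier]


def daily_calls(
    query_counts: dict[Tier, int], engines: int, runs_per_crawl: int
) -> float:
    """Average API calls per day for a given split of queries across tiers."""
    day = timedelta(days=1)
    return sum(
        count * (day / CRAWL_INTERVAL[tier]) * engines * runs_per_crawl
        for tier, count in query_counts.items()
    )


# Hypothetical split of 1,000 queries across the three tiers.
estimate = daily_calls(
    {Tier.HIGH: 20, Tier.MID: 180, Tier.LONG_TAIL: 800},
    engines=8,
    runs_per_crawl=3,
)
```

With this hypothetical split and 3 runs per crawl, the estimate comes to roughly 12,640 calls a day, compared with 96,000 if all 1,000 queries ran at the high-priority cadence. The exact numbers depend entirely on the assumed intervals and tier sizes.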
Extracting structured data from natural language
AI engines don't return structured JSON with brand rankings. They return prose. "I'd recommend looking at Notion, Linear, or Asana for this use case" is the kind of output we need to parse into structured rank data.
We built a secondary extraction layer that takes raw AI responses and identifies brand mentions, assigns position scores based on order and emphasis, and normalizes across different response formats (bulleted lists, numbered rankings, inline prose recommendations). This layer runs on every response and is continuously refined based on edge cases.
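As a rough illustration of the extraction step, the Python sketch below ranks brands by where they first appear in a response. It assumes a known brand list and scores by order only (1/rank); emphasis detection and format-specific handling are left out, and the function name and output fields are hypothetical.

```python
import re


def extract_mentions(response: str, known_brands: list[str]) -> list[dict]:
    """Rank known brands by the position of their first mention.

    Simplifications (assumptions for this sketch): case-insensitive
    whole-word matching against a fixed brand list, and a score of
    1/rank based purely on order of appearance.
    """
    found = []
    for brand in known_brands:
        match = re.search(
            rf"\b{re.escape(brand)}\b", response, flags=re.IGNORECASE
        )
        if match:
            found.append((match.start(), brand))
    found.sort()  # earliest mention first
    return [
        {"brand": brand, "position": rank, "score": 1.0 / rank}
        for rank, (_, brand) in enumerate(found, start=1)
    ]


prose = "I'd recommend looking at Notion, Linear, or Asana for this use case"
numbered = "1. Linear\n2. Notion"
brands = ["Asana", "Linear", "Notion", "Trello"]

prose_ranks = extract_mentions(prose, brands)
numbered_ranks = extract_mentions(numbered, brands)
```

Because ranking depends only on character offsets, the same function handles inline prose and numbered lists. A real extraction layer would also need to handle brand aliases, misspellings, and names that are ordinary words.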
Engine diversity
ChatGPT, Claude, Perplexity, Gemini, Copilot, and the other engines we track all have different API structures, different response formats, different context window behaviors, and different update frequencies. Some support system prompts, some don't. Some have web search baked in, some don't. Building a unified tracking layer across all of them required engine-specific adapters with a common output schema.
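The adapter pattern described here might look something like the following Python sketch. The schema fields, adapter classes, and raw payload shapes are all hypothetical, chosen only to show two different response formats being normalized into one record. No real engine API is called.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class EngineResponse:
    """Common output schema (field names are assumptions)."""
    engine: str
    query: str
    text: str
    used_web_search: bool


class EngineAdapter(Protocol):
    name: str

    def normalize(self, query: str, raw: dict) -> EngineResponse: ...


class ChatStyleAdapter:
    """Hypothetical payload: {"choices": [{"message": {"content": str}}]}."""
    name = "chat_style"

    def normalize(self, query: str, raw: dict) -> EngineResponse:
        text = raw["choices"][0]["message"]["content"]
        return EngineResponse(self.name, query, text, used_web_search=False)


class SearchStyleAdapter:
    """Hypothetical payload: {"answer": str, "citations": [str, ...]}."""
    name = "search_style"

    def normalize(self, query: str, raw: dict) -> EngineResponse:
        # Treat the presence of citations as a sign web search was used.
        searched = bool(raw.get("citations"))
        return EngineResponse(self.name, query, raw["answer"], searched)


adapters: list[EngineAdapter] = [ChatStyleAdapter(), SearchStyleAdapter()]
raw_payloads = {
    "chat_style": {"choices": [{"message": {"content": "Try Notion."}}]},
    "search_style": {"answer": "Try Linear.", "citations": ["example.com"]},
}
normalized = [
    adapter.normalize("best project tool", raw_payloads[adapter.name])
    for adapter in adapters
]
```

Everything downstream (extraction, aggregation, scoring) can then work against `EngineResponse` alone, so adding an engine means writing one new adapter rather than touching the rest of the pipeline.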
What we're building next
The current engine handles brand mention extraction well. The next layer we're building is intent classification: understanding not just whether your brand appeared, but in what context. Were you recommended as the best option, or mentioned as an alternative? Were you praised for a specific feature, or listed neutrally? That contextual layer will unlock a new class of optimization recommendations that go beyond "appear more" to "appear better."
We'll share more on that in a future post.




