
Intent Recognition: Your First Line of Defense


Part of the Prompting from Code series, this article explores how intent classification reduces ambiguity and improves reliability in agentic systems.

The Complexity Creep #

Your agent started with three tools and clear instructions. Then you added support for another workflow. Then edge case handling. Then five more tools because users kept asking. Six months later, you’re passing 30 tool definitions to every request and the model’s responses have become less reliable.

And the responses are not just slower, they are inconsistent. The model picks the wrong tool. It misinterprets ambiguous requests. It makes creative decisions when you needed precision. So you add more instructions to the prompt: “IMPORTANT: Only use tool X when…” but the inconsistency persists, just manifesting differently.

Eventually you reach a point where you can no longer tell whether the latest instruction made things better or worse, and each fix introduces two new problems.

When you prompt from code, 80% reliability isn’t acceptable. There’s no human in the loop to catch mistakes. Each request needs to execute correctly because there’s no immediate opportunity for clarification or correction. The problem isn’t your prompt or tuning, it’s that you’re expecting the model to navigate too many possibilities at once.

The problem is architectural.


The Real Cost of Flexibility #

More options create more noise. When you give a model access to everything, you’re not maximizing capability, you’re maximizing ambiguity. Every additional tool is another decision point, another potential wrong turn.

Developers optimize for flexibility. It feels like good architecture. One agent that can handle anything means fewer moving parts, simpler code, easier maintenance. But that flexibility has costs:

  • Inconsistent tool selection: With 30 tools available, the model sometimes picks tool #17 when it should have picked #4. Not on every request, but often enough to make the results noticeably unreliable.
  • Unpredictable outputs: The same input produces different results because the model’s decision tree is too complex. Small prompt variations or model updates cascade into large behavioral changes.
  • Prompt bloat: You compensate by adding more instructions: “When the user says X, use tool Y.” Your system prompt grows to thousands of tokens and you’re essentially encoding a routing table in natural language.

The instinct is to reach for ever bigger and “better” models, hoping they’ll fix everything. But each model switch comes with risks. Prompts are often tailored (overfitted) to the current model. What worked reliably with GPT-4 might behave differently with Claude Sonnet or GPT-5. You’re not fixing the architecture, you’re adding patches on top of a fragile system.

The tension: You need the model to be flexible enough to handle real user variability, but constrained enough to be predictable. Finding that balance is the core challenge of building reliable agentic systems.


The Pattern: Classify First, Execute Sharp #

Instead of giving one agent all the tools, route before execution.

  1. Fast classification: Use a cheap, fast model (e.g., Claude Haiku, GPT-4o-mini) to determine user intent
  2. Code-based routing: Based on classification, route to a specialized agent in code
  3. Focused execution: Each specialized agent sees only the tools and context relevant to its domain

Example flow:

User: "I need help with my recent order"
Classifier (Haiku): "order_support" 
Code routes to: OrderSupportAgent
Agent sees: [get_order_status, cancel_order, modify_shipping, initiate_return, contact_support]
  (not: create_account, generate_report, update_billing, search_products, etc.)

Why this works:

Quality: The specialized agent isn’t distracted by irrelevant tools. Less ambiguity means more consistent behavior. When you understand how models behave under different conditions, you can design systems that work with that behavior rather than fighting it.

Cost & Latency: Haiku, GPT-4o-mini, and similar models cost pennies per request and respond within a few hundred milliseconds. The expensive model then runs with only the minimal required context, a targeted prompt, and a small set of tools. The result: a shorter time between request and response, better output, and a lower cost per request (a back-of-envelope sketch follows after these points).

Debuggability: When something breaks, you know which agent failed. Logs are cleaner. Error patterns are clearer.
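To make the cost point concrete, here is a back-of-envelope sketch in Python using the rough numbers from the comparison diagram further down (a 5000-token monolithic prompt vs. a short classifier call plus an ~800-token specialized prompt). The per-token prices and the classifier's token count are illustrative placeholders, not actual vendor pricing:

# Back-of-envelope input-token cost per request. Prices are placeholders --
# substitute your provider's actual rates.
EXPENSIVE_PER_1K = 0.010   # hypothetical $/1K input tokens, capable model
CHEAP_PER_1K = 0.0005      # hypothetical $/1K input tokens, fast classifier

MONOLITHIC_TOKENS = 5000   # 30 tool definitions + bloated system prompt
CLASSIFIER_TOKENS = 200    # assumed: short classification prompt
SPECIALIZED_TOKENS = 800   # targeted prompt + 5 tool definitions

monolithic = MONOLITHIC_TOKENS / 1000 * EXPENSIVE_PER_1K
classified = (CLASSIFIER_TOKENS / 1000 * CHEAP_PER_1K
              + SPECIALIZED_TOKENS / 1000 * EXPENSIVE_PER_1K)

print(f"monolithic:  ${monolithic:.4f} per request")
print(f"classified:  ${classified:.4f} per request")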

This architectural shift has a direct parallel in data science.


The Data Science Lesson: Dimensionality Reduction #

In data science, you narrow the problem space before training. You drop irrelevant features, constrain the search space, and use domain knowledge to reduce dimensionality. A model searching a 100-dimensional space where only 10 dimensions matter will perform worse than one focused on those 10. More noise, more overfitting, slower convergence. Grid search across unnecessary dimensions is expensive and yields marginal gains at best.

Intent classification does the same thing for LLM requests. Instead of giving the model 30 tools and hoping it figures out which 4-5 are relevant, you classify upfront and route to an agent that only sees those 4-5. This doesn’t limit the model’s capabilities, it eliminates noise from the decision space.

Preprocessing beats optimization. In data science, thoughtful feature engineering and domain knowledge often matter more than hours of hyperparameter tuning. In LLM systems, smart routing and architecture matter more than endless prompt refinement for a specific model.

The principle: Narrow the problem space through deliberate design, not model inference. Handle routing in code based on a simple classification step, not through unreliable prompt instructions (“Use tool X for scenario Y…”).


Implementation: Fast Classification and Routing #

The architecture in a diagram:

graph TD
    A[User Request] --> B[Fast Classifier<br/>Haiku/GPT-4o-mini]
    B -->|order_support| C[Order Support Agent]
    B -->|account_mgmt| D[Account Agent]
    B -->|product_inquiry| E[Product Agent]
    B -->|low confidence| F[Clarify Agent]
    C --> C1[Tools: get_order_status<br/>cancel_order<br/>modify_shipping<br/>initiate_return<br/>contact_support]
    D --> D1[Tools: update_profile<br/>change_billing<br/>view_settings<br/>close_account]
    E --> E1[Tools: search_products<br/>get_product_details<br/>check_availability<br/>compare_products<br/>get_recommendations]
    F --> F1[Tools: ask_clarifying_question]
    style B fill:#1e3a8a,stroke:#60a5fa,stroke-width:2px,color:#fff
    style C fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff
    style D fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff
    style E fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff
    style F fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff

Classification logic (pseudocode):

def classify_intent(user_message: str) -> tuple[str, float]:
    categories = [
        "order_support: Questions about orders, shipping, returns",
        "account_management: Settings, billing, profile",
        "product_inquiry: Product questions, recommendations"
    ]
    # Category names ("order_support", ...) pulled out for validation below
    valid_categories = {c.split(":")[0] for c in categories}
    
    prompt = f"""Classify this message: {user_message}
                 Categories: {categories}
                 Respond with only the category name and confidence (0-1)."""
    
    # fast_model: client for the cheap classifier (Claude Haiku, GPT-4o-mini, ...)
    response = fast_model.complete(prompt, max_tokens=50)
    intent, confidence = parse_classification(response)
    
    # Low confidence or invalid classification routes to clarify
    if confidence < 0.7 or intent not in valid_categories:
        return "clarify", confidence
    
    return intent, confidence
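The parse_classification helper above is left abstract; here is a minimal sketch, assuming the fast model answers roughly as instructed (e.g. "order_support 0.85"):

def parse_classification(response: str) -> tuple[str, float]:
    # Assumes the classifier follows the instruction and answers like
    # "order_support 0.85". Anything unparseable becomes a zero-confidence
    # result, so the caller falls back to the clarify route.
    parts = response.strip().split()
    try:
        intent = parts[0].lower()
        confidence = float(parts[-1])
    except (IndexError, ValueError):
        return "unknown", 0.0
    return intent, confidence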

Routing logic (pseudocode):

def handle_request(user_message: str):
    intent, confidence = classify_intent(user_message)
    
    match intent:
        case "order_support":
            return order_agent.run(user_message)
        case "account_management":
            return account_agent.run(user_message)
        case "product_inquiry":
            return product_agent.run(user_message)
        case "clarify":
            return clarify_agent.run(user_message)

Handling multi-intent requests: When users combine multiple intents (“I want to change my order and update my billing info”), you have options. For ambiguous cases, route to the primary intent or use the clarify agent to ask for priority. For high-confidence multi-intent requests, the classifier can return multiple intents and an orchestrator handles them in parallel or sequentially, then combines the results. The principle stays the same: break things up, use specialized agents and build the flow to your needs.
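A possible sketch of the multi-intent variant: classify_intents is a hypothetical classifier that returns several (intent, confidence) pairs, and AGENTS maps intent names to the specialized agents configured just below.

AGENTS = {
    "order_support": order_agent,
    "account_management": account_agent,
    "product_inquiry": product_agent,
}

def handle_multi_intent(user_message: str) -> str:
    # classify_intents: hypothetical variant of classify_intent that returns
    # e.g. [("order_support", 0.9), ("account_management", 0.8)]
    intents = classify_intents(user_message)
    confident = [intent for intent, conf in intents if conf >= 0.7 and intent in AGENTS]

    if not confident:
        return clarify_agent.run(user_message)

    # Run the matching specialized agents one after another and combine the
    # results; swap in parallel execution if the intents are independent.
    results = [AGENTS[intent].run(user_message) for intent in confident]
    return "\n\n".join(results)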

Specialized agent configuration (pseudocode):

order_agent = Agent(
    model="gpt-4",
    system_prompt="You handle order management. Be concise and action-oriented.",
    tools=[
        get_order_status,
        cancel_order,
        modify_shipping,
        initiate_return,
        contact_support
    ],
    context=customer_data + order_history
)

account_agent = Agent(
    model="gpt-4",
    system_prompt="You handle account settings, billing, and profile management.",
    tools=[
        update_profile,
        change_billing,
        view_settings,
        close_account
    ],
    context=customer_data + account_settings
)

product_agent = Agent(
    ...  # configured analogously: product tools + product catalog context
)

clarify_agent = Agent(
    model="gpt-4",
    system_prompt="Ask clarifying questions to understand user intent.",
    tools=[ask_clarifying_question],
    context=None  # No context needed for clarification
)
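The Agent class itself is pseudocode here; a minimal sketch of what it could wrap, where the dataclass shape and the llm_call stand-in are assumptions rather than a specific framework:

from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Agent:
    # One model, one narrow system prompt, and only the tools and context
    # the agent's domain actually needs.
    model: str
    system_prompt: str
    tools: list[Callable] = field(default_factory=list)
    context: Any = None

    def run(self, user_message: str) -> str:
        messages = [{"role": "system", "content": self.system_prompt}]
        if self.context is not None:
            messages.append({"role": "system", "content": f"Context: {self.context}"})
        messages.append({"role": "user", "content": user_message})
        # llm_call() stands in for whatever client and tool-calling loop you use.
        return llm_call(model=self.model, messages=messages, tools=self.tools)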

Comparison: Monolithic vs Classified:

graph LR
    subgraph before["Before: Monolithic Agent"]
        A1[User Request] --> B1[Single Agent<br/>30 tools<br/>5000 token prompt<br/>~7s]
        B1 --> C1[Response]
    end
    subgraph after["After: Intent Classification"]
        A2[User Request] --> B2[Classifier<br/>~200ms]
        B2 --> C2[Specialized Agent<br/>5 tools<br/>800 token prompt<br/>~2s]
        C2 --> D2[Response]
    end
    before ~~~ after
    style B1 fill:#7f1d1d,stroke:#ef4444,stroke-width:2px,color:#fff
    style B2 fill:#14532d,stroke:#4ade80,stroke-width:2px,color:#fff
    style C2 fill:#14532d,stroke:#4ade80,stroke-width:2px,color:#fff

The key insight: Each specialized agent operates in a narrower problem space with only the tools and context it needs.


When to Use This Architecture #

Use intent classification when:

  • You have multiple distinct workflows (support, sales, technical, administrative)
  • Tool count is growing beyond a handful of specialized tools
  • You’re seeing inconsistent tool selection and flakiness
  • Request latency or cost is becoming a problem
  • Different intents need different contexts (customer data vs. product catalog vs. internal docs) and different actions (chat with customer vs. perform internal actions against the database)

Skip it when:

  • You have fewer than 5 tools with clear, non-overlapping purposes
  • All requests need similar context anyway
  • The agent has a clear, single purpose

If you find yourself writing prompt instructions that describe when to use which tools or what to look out for in the user message, you probably need routing in code instead.


Tradeoffs and Failure Modes #

The downside: you maintain N+1 agents instead of one, the classifier can route incorrectly (which calls for fallback strategies, sketched below), and every new feature may require updates to both the classification logic and the specialized agents. For simple use cases this is unnecessary complexity, but the threshold where the benefits (reliability, quality, cost, debuggability) outweigh the overhead is likely much lower than you might expect.
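One possible fallback, sketched under the same assumptions as the routing code above (MisrouteError is a hypothetical signal, not a real library exception; AGENTS is the mapping from the multi-intent sketch):

class MisrouteError(Exception):
    # Hypothetical: raised by a specialized agent when none of its tools fit
    # the request -- a strong hint that the classifier routed incorrectly.
    pass

def handle_request_with_fallback(user_message: str) -> str:
    intent, _confidence = classify_intent(user_message)
    agent = AGENTS.get(intent, clarify_agent)
    try:
        return agent.run(user_message)
    except MisrouteError:
        # Don't let a misroute turn into a wrong answer: ask instead of guessing.
        return clarify_agent.run(user_message)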


Conclusion: Architecture Over Prompts #

The first reflex when LLM outputs become inconsistent is to fix the prompt: add just one more instruction, a few more examples of what to look out for, another “CRITICAL:” block. But more often than not, the problem is architectural.

Intent classification is about control and reliability. When you prompt from code, you want to get as close to 100% reliability as possible. There’s no human to catch errors or ask for clarification. You’re deliberately choosing what the model sees, rather than hoping it ignores irrelevant options. You’re routing with code you can test and debug, not with natural language that models interpret creatively.

Understanding model behavior is key to building reliable systems. Models perform better with narrower decision spaces, so design accordingly. Prompts are often overfitted to specific models, so break things up and build abstraction layers that survive model migrations.

The principle extends beyond routing. It’s about understanding which problems belong in code (routing, validation, fallbacks) and which belong with the model (understanding nuance, handling variability within a domain, finding the right tone, etc.).

As your agentic systems grow, the question isn’t “how do I make this one prompt do more?” It’s “which decisions should the model make, and which should my code make?”