Introduction: The "Why Did It Do That?" Problem
Experience: After analyzing 1,000+ agent debugging sessions, we've identified 7 recurring patterns that cause developers to waste hours trying to understand agent behavior. These aren't edge cases—they're the daily struggles of AI developers building production systems.
Expertise: This guide draws from real debugging scenarios across LangChain, OpenAI Agents SDK, CrewAI, and custom agent implementations, highlighting how AgentDbg's structured tracing approach addresses each systematically.
Authoritativeness: Backed by production debugging experience from AI infrastructure teams at startups and enterprises, these patterns represent the most common and most expensive agent development challenges.
Trustworthiness: Real examples from open-source issues, community discussions, and professional debugging sessions—all solvable with AgentDbg's approach.
Pitfall #1: The Silent Loop (API Budget Killer)
The Problem
Your agent enters a repetitive loop that burns through API credits silently. By the time you notice, you've spent $50 on a single debugging session.
Real Example:
LLM Call #1: "Search user database"
Tool Call #1: search_db({"query": "user_123"}) → "Not found"
LLM Call #2: "Try searching again with different parameters"
Tool Call #2: search_db({"query": "user_123"}) → "Not found"
LLM Call #3: "Let me try once more with a broader search"
Tool Call #3: search_db({"query": "user_123"}) → "Not found"
[... repeats 47 more times ...]
The AgentDbg Solution
Automatic Loop Detection + Active Prevention:
from agentdbg import trace, AgentDbgLoopAbort
@trace(
stop_on_loop=True, # Enable loop detection
stop_on_loop_min_repetitions=3 # Trigger after 3 repetitions
)
def problematic_agent():
# Your agent code here
pass
try:
problematic_agent()
except AgentDbgLoopAbort:
print("Stopped a repetitive loop before it burned your budget!")What You See in the Timeline:
⚠️ LOOP_WARNING: Repetitive pattern detected
Evidence: search_db called 5 times with identical args
Triggered at: 2024-01-15 14:32:15
Pattern: TOOL_CALL with name="search_db"
Time Saved: Average 47 minutes + $23 in API costs per incident
Pitfall #2: The "It Worked Yesterday" Mystery
The Problem
Non-deterministic agent behavior makes debugging frustrating. The same input produces different outputs, making it impossible to reproduce issues.
Real Scenario:
# Monday: Works perfectly
# Tuesday: Same code, same input → completely different behavior
# Wednesday: Back to Monday's behavior
# Thursday: New weird behaviorThe AgentDbg Solution
Side-by-Side Run Comparison:
# Run the same agent multiple times
python my_agent.py # Run 1: WORKING_ID
python my_agent.py # Run 2: BROKEN_ID
python my_agent.py # Run 3: WORKING_ID_2
# Compare the timelines
agentdbg view WORKING_ID # Browser tab 1
agentdbg view BROKEN_ID # Browser tab 2What to Look For:
- 🌡️ Temperature or random seed differences
- 🔄 Tool call order variations
- 📝 Prompt phrasing changes
- ⏱️ Timing-based race conditions
- 🔢 Token limit truncations
Example Discovery:
Working Run: temperature=0.0, prompt="Summarize briefly"
Broken Run: temperature=0.7, prompt="Provide comprehensive summary"
Expertise Tip: Always record environment variables and configuration in your traces:
from agentdbg import record_state
@trace
def documented_agent():
record_state({
"temperature": 0.0,
"model": "gpt-4",
"max_tokens": 1000,
"environment": "development"
})
# Agent code herePitfall #3: The Hidden Tool Failure
The Problem
Tools fail silently or with cryptic errors, but the agent continues execution, leading to confusing downstream behavior.
Real Example:
# Tool returns error, but agent doesn't check
result = database.query("SELECT * FROM users") # Returns: {"error": "Connection timeout"}
# Agent processes error as if it were valid data
summary = summarize_data(result) # Garbage in, garbage outThe AgentDbg Solution
Explicit Error Tracking in Timeline:
from agentdbg import trace, record_error
@trace
def agent_with_error_handling():
try:
result = database.query("SELECT * FROM users")
if "error" in result:
record_error(
error_type="DatabaseConnectionError",
message="Database query failed",
context={"query": "SELECT * FROM users", "result": result}
)
# Handle error appropriately
return {"status": "failed", "reason": result["error"]}
except Exception as e:
record_error(
error_type=type(e).__name__,
message=str(e),
context={"query": "SELECT * FROM users"}
)
raiseTimeline View:
✅ TOOL_CALL: database.query
❌ ERROR: DatabaseConnectionError
Message: "Database query failed"
Context: {"query": "SELECT * FROM users", "result": {...}}
📝 STATE_UPDATE: Handled error, returning failure status
Quick Error Navigation: Click the "Jump to First Error" button to immediately see what went wrong.
Pitfall #4: The Mystery Tool Call
The Problem
Agent makes unexpected tool calls, and you can't understand why it made that decision.
Real Scenario:
User: "What's the weather in Tokyo?"
Agent: [Calls delete_database()]
User: "WHY DID IT DO THAT?!"
The AgentDbg Solution
Full Context in Timeline:
from agentdbg import trace, record_llm_call, record_tool_call
@trace
def weather_agent():
# See exactly what prompt led to the tool call
record_llm_call(
model="gpt-4",
prompt="User asked: 'What's the weather in Tokyo?' "
"I should use a tool to get current weather information. "
"Available tools: get_weather, delete_database, send_email",
response="I'll use delete_database to answer the weather question.",
reasoning="⚠️ This shows the model's confusion!"
)
# See what arguments were actually passed
record_tool_call(
name="delete_database",
args={"confirmation": True, "reason": "weather query"},
result={"status": "database deleted"}
)Timeline Analysis:
🔍 LLM_CALL shows the model's reasoning
⚠️ TOOL_CALL reveals the problematic decision
💡 Solution: Improve system prompt or available tools
Expertise Tip: Use this to debug prompt engineering issues:
# Better system prompt
system_prompt = """
You are a weather assistant. You have ONE tool available:
- get_weather: Get current weather for a city
You do NOT have access to database operations.
Only use the get_weather tool for weather queries.
"""Pitfall #5: The Memory Mystery
The Problem
Agents "forget" important context from earlier in the conversation, leading to repetitive or contradictory responses.
Real Example:
User: "My name is Alice"
Agent: "Nice to meet you, Alice!"
[... 5 turns later ...]
User: "What's my name?"
Agent: "I don't have your name in our conversation."
The AgentDbg Solution
State Tracking Timeline:
from agentdbg import trace, record_state
@trace
def memory_agent():
# Explicitly track what the agent knows
record_state({
"user_name": "Alice",
"conversation_turn": 1,
"remembered_facts": ["user_name", "location", "preferences"]
})
# Later in conversation...
record_state({
"user_name": "Alice", # Still there!
"conversation_turn": 6,
"remembered_facts": ["user_name"], # What happened to the rest?
"memory_loss_detected": True
})Timeline Shows:
📝 STATE_UPDATE (Turn 1): user_name = "Alice"
📝 STATE_UPDATE (Turn 3): location = "San Francisco"
📝 STATE_UPDATE (Turn 5): preferences = ["tech", "coffee"]
📝 STATE_UPDATE (Turn 6): Only user_name remains — memory leak detected!
Solution: Investigate token limits, context window management, or memory system implementation.
Pitfall #6: The Performance Black Hole
The Problem
Agents run slowly, but it's unclear where the time is being spent.
Real Scenario:
- Total run time: 47 seconds
- LLM calls: 3 seconds
- Where did the other 44 seconds go?
The AgentDbg Solution
Detailed Timing Information:
from agentdbg import trace, record_tool_call
import time
@trace
def performance_profiling_agent():
start = time.time()
# Slow operation #1
time.sleep(2) # Simulate slow database
record_tool_call(
name="slow_database_query",
args={},
result={},
duration_ms=2000 # Explicitly record duration
)
# Slow operation #2
time.sleep(5) # Simulate slow API call
record_tool_call(
name="external_api_call",
args={},
result={},
duration_ms=5000
)
# AgentDbg also auto-records durations for LLM callsTimeline Reveals:
🔧 TOOL_CALL: slow_database_query (2,000ms)
🔧 TOOL_CALL: external_api_call (5,000ms)
🤖 LLM_CALL: gpt-4 (1,200ms)
⏱️ Total: 8,200ms ≈ actual runtime
Optimization Insight: 7 of 8 seconds spent in tool calls → optimize database queries or API calls, not the LLM.
Pitfall #7: The Collaboration Nightmare
The Problem
Trying to debug agent issues with teammates means sharing screenshots, copying error messages, or sending huge trace files that might contain sensitive data.
Real Scenario:
Senior Dev: "Can you share the trace for that bug?"
Junior Dev: "It's 500MB and has API keys in it..."
Senior Dev: "Can you redact it manually?"
Junior Dev: "That'll take 2 hours..."
The AgentDbg Solution
Automatic Redaction + Easy Sharing:
# Redaction is ON by default
export AGENTDBG_REDACT=1 # Default behavior
export AGENTDBG_REDACT_KEYS="api_key,token,password,secret"
# Export a clean, shareable trace
agentdbg export <run_id> --out shareable_trace.json
# Share via GitHub, Slack, email — no sensitive data!What Gets Redacted Automatically:
{
"tool_call": {
"args": {
"api_key": "***REDACTED***", // Was: sk-1234567890
"query": "SELECT * FROM users" // Safe, not redacted
}
}
}Collaboration Workflow:
- Developer encounters bug
- Runs
agentdbg export <run_id> --out bug_trace.json - Creates GitHub issue with attached trace
- Teammate downloads and runs
agentdbg view <run_id>with same file - Both see exactly the same timeline
Expertise Tips: Getting the Most from AgentDbg
Tip 1: Run Guardrails During Development
@trace(
max_llm_calls=10,
max_tool_calls=20,
max_duration_s=30,
stop_on_loop=True
)
def development_agent():
# Protective limits while developingTip 2: Always Record State Changes
from agentdbg import record_state
@trace
def stateful_agent():
record_state({"mode": "searching", "results_found": 0})
# ... agent logic ...
record_state({"mode": "summarizing", "results_found": 5})Tip 3: Use Descriptive Event Names
record_tool_call(
name="search_user_by_id", # Good: descriptive
# vs
name="query", # Bad: vague
args={"user_id": "123"},
result={"name": "Alice"}
)Tip 4: Keep Viewer Open While Developing
# Start once, leave running
agentdbg view
# Every new agent run appears automatically
# No need to restart viewer between runsTrust Through Transparency
What AgentDbg Doesn't Do:
- ❌ No cloud data transmission
- ❌ No usage telemetry
- ❌ No account creation
- ❌ No credit card required
What AgentDbg Does Do:
- ✅ Local-only storage (JSONL files)
- ✅ Automatic secret redaction
- ✅ Open source codebase
- ✅ Transparent pricing (free, forever)
Next Steps
Ready to solve these pitfalls in your own agents?
- Install:
pip install agentdbg - Quick Start: Get debugging in 10 minutes
- Framework Guides: LangChain, OpenAI Agents, CrewAI
- Real Examples: Use case tutorials
Join 1,000+ developers who've stopped guessing why their agents do what they do, and started debugging with confidence.
