ADR-060: Agent Persona Token Optimization
Status: Accepted
Date: 2025-11-28
Deciders: Engineering Team
Context
The multi-agent loan processing system was consuming ~84,000 input tokens per loan application across 4 agents:

- Intake Agent: 12,575 tokens
- Credit Agent: 26,998 tokens
- Income Agent: 27,330 tokens
- Risk Agent: 16,964 tokens
Problems:

- High API costs (~$310/day at 1,000 applications)
- Slow response times due to large context windows
- Verbose personas with redundant content
- Example scenarios duplicating output format specifications
Goal: Reduce token consumption by 50%+ while maintaining agent decision quality.
Decision
Apply industry best practices from Anthropic and OpenAI for prompt optimization:
Optimization Techniques
- Remove Redundant Content
  - Eliminate verbose explanations of concepts agents inherently understand
  - Remove multiple restatements of role/responsibilities
  - Consolidate repeated compliance/security sections
- Convert Tables to Compact Rules
  - Before: Multi-column markdown tables with explanations
  - After: Single-line rules (e.g., Income/Loan: ≥4x=Excellent, 3-4x=Good, 2-3x=Fair)
- Use Imperative Commands
  - Before: "You should analyze the application data to identify behavioral patterns"
  - After: "Analyze behavioral patterns"
- Eliminate Example Scenarios
  - Keep only output format specification
  - Remove narrative examples that duplicate the schema
- Remove Meta-Instructions
  - Cut instructions like "Before making any decision, work through this thinking process"
  - These are implicit in the task definition
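The "Use Imperative Commands" technique can be illustrated with a rough sketch. The ~4 characters/token heuristic below is only an approximation (real counts depend on the model's tokenizer), and the example sentences are illustrative, not the actual persona text:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return max(1, len(text) // 4)

# Illustrative before/after for the imperative-command rewrite.
before = (
    "You should analyze the application data to identify behavioral "
    "patterns that may indicate elevated credit risk."
)
after = "Analyze behavioral patterns for elevated credit risk."

saved = estimate_tokens(before) - estimate_tokens(after)
print(f"before ~{estimate_tokens(before)} tokens, "
      f"after ~{estimate_tokens(after)} tokens, saved ~{saved}")
```

Applied across hundreds of sentences in four persona files, savings of this size compound into the reductions reported below.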
Rejected Alternatives
- Shared Context File: Would require code changes to load shared context; decided to keep inline for simplicity
- Tool Examples in Prompts: Skipped since tool selection is straightforward with clear naming
- Aggressive 20K Target: Would risk degrading agent decision quality
Results
Token Reduction
| Agent | Before | After | Reduction |
|---|---|---|---|
| Intake | 12,575 | 4,400 | 65% |
| Credit | 26,998 | 8,792 | 67% |
| Income | 27,330 | 10,245 | 63% |
| Risk | 16,964 | 17,525 | +3%* |
| Total | 83,867 | 40,962 | 51% |
*Risk agent slight increase due to accumulated context from upstream agents.
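The totals and per-agent percentages in the table can be cross-checked with a few lines of arithmetic:

```python
# Cross-check the token-reduction table: per-agent change and totals.
before = {"Intake": 12_575, "Credit": 26_998, "Income": 27_330, "Risk": 16_964}
after = {"Intake": 4_400, "Credit": 8_792, "Income": 10_245, "Risk": 17_525}

for agent, b in before.items():
    change = (b - after[agent]) / b
    print(f"{agent}: {change:.0%} reduction")  # Risk prints -3%, i.e. a 3% increase

total_before, total_after = sum(before.values()), sum(after.values())
print(f"Total: {total_before} -> {total_after} "
      f"({(total_before - total_after) / total_before:.0%} reduction)")
```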
Character Reduction
| Agent | Before | After | Reduction |
|---|---|---|---|
| intake-agent | 12,194 | 2,255 | 82% |
| income-agent | 12,854 | 2,441 | 81% |
| credit-agent | 15,484 | 2,783 | 82% |
| risk-agent | 21,962 | 3,426 | 84% |
| Total | 62,494 | 10,905 | 83% |
Cost Impact (1,000 applications/day)
| Metric | Before | After | Savings |
|---|---|---|---|
| Daily Cost | ~$310 | ~$202 | $108/day |
| Monthly Cost | ~$9,300 | ~$6,060 | $3,240/month |
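The cost figures above follow directly from the daily totals. Note that the ~35% cost reduction is smaller than the 51% input-token reduction because output tokens and per-request overhead are unaffected by persona trimming:

```python
# Reproduce the cost-impact figures from the table above
# (monthly figures assume 30 days at 1,000 applications/day).
daily_before, daily_after = 310, 202

savings_per_day = daily_before - daily_after      # $108/day
savings_per_month = savings_per_day * 30          # $3,240/month
cost_reduction = savings_per_day / daily_before   # ~0.35

print(f"${savings_per_day}/day, ${savings_per_month:,}/month, "
      f"{cost_reduction:.0%} cost reduction")
```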
Consequences
Positive
- 51% reduction in input tokens per application
- ~35% reduction in API costs
- Faster response times (smaller context = faster processing)
- Cleaner, more maintainable persona files
- Agent decision quality maintained
Negative
- Less verbose documentation in persona files (mitigated by external docs)
- Requires careful testing when making persona changes
Risks & Mitigations
| Risk | Mitigation |
|---|---|
| Quality degradation | Tested loan application flow; verified decisions unchanged |
| Future maintainability | Document optimization techniques in this ADR |
| Over-compression | Set floor at ~2,000 chars per persona to preserve effectiveness |
Implementation
Files Modified
- apps/api/loan_defenders/agents/agent-persona/intake-agent-persona.md
- apps/api/loan_defenders/agents/agent-persona/credit-agent-persona.md
- apps/api/loan_defenders/agents/agent-persona/income-agent-persona.md
- apps/api/loan_defenders/agents/agent-persona/risk-agent-persona.md
Verification Process
- Run a loan application through the UI
- Check token metrics in logs (ENABLE_AGENT_FRAMEWORK_OBSERVABILITY=true)
- Verify agent decisions are reasonable and consistent
- Compare before/after token counts
Future Optimization Opportunities
- Prompt Caching: Anthropic's caching can reduce input costs by 90% for static content
- GPT-4o-mini: Switch to smaller model for simple agents (Intake) - $0.15 vs $2.50 per 1M tokens
- Batch Processing: Non-real-time use cases can use batch API at 50% discount
- Further Compression: Target 25K tokens with more aggressive techniques
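Of these, prompt caching is the most direct follow-up, since the persona text is static across requests. Below is a minimal sketch of a Messages API request body using Anthropic's cache_control block on the system prompt; the model id and PERSONA contents are illustrative placeholders, not this project's actual configuration:

```python
# Sketch: mark the static persona text as cacheable via Anthropic's
# prompt-caching cache_control block, so repeated applications reuse the
# cached prefix. PERSONA and the model id are illustrative placeholders.
PERSONA = "...contents of intake-agent-persona.md..."

request_body = {
    "model": "claude-sonnet-4-20250514",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": PERSONA,
            # The static persona prefix is cached; only the per-application
            # message below is billed at the full input rate after the
            # first request.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [
        {"role": "user", "content": "Evaluate the attached loan application."}
    ],
}
```

Because the persona is identical on every call, this would move the bulk of the 40,962 remaining input tokens into the discounted cache-read tier.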
References
- Anthropic Token Saving Updates
- Portkey: Optimize Token Efficiency
- OpenAI Prompt Engineering Best Practices
- PR #198: Implementation
Status: Accepted and Implemented
Implementation Date: 2025-11-28
Next Review: 2026-02-28