
ADR-060: Agent Persona Token Optimization

Status: Accepted
Date: 2025-11-28
Deciders: Engineering Team


Context

The multi-agent loan processing system was consuming ~84,000 input tokens per loan application across 4 agents:

  • Intake Agent: 12,575 tokens
  • Credit Agent: 26,998 tokens
  • Income Agent: 27,330 tokens
  • Risk Agent: 16,964 tokens

Problems:

  • High API costs (~$310/day at 1,000 applications)
  • Slow response times due to large context windows
  • Verbose personas with redundant content
  • Example scenarios duplicating output format specifications

Goal: Reduce token consumption by 50%+ while maintaining agent decision quality.


Decision

Apply industry best practices from Anthropic and OpenAI for prompt optimization:

Optimization Techniques

  1. Remove Redundant Content
     • Eliminate verbose explanations of concepts agents inherently understand
     • Remove multiple restatements of role/responsibilities
     • Consolidate repeated compliance/security sections

  2. Convert Tables to Compact Rules
     • Before: Multi-column markdown tables with explanations
     • After: Single-line rules (e.g., Income/Loan: ≥4x=Excellent, 3-4x=Good, 2-3x=Fair)

  3. Use Imperative Commands
     • Before: "You should analyze the application data to identify behavioral patterns"
     • After: "Analyze behavioral patterns"

  4. Eliminate Example Scenarios
     • Keep only the output format specification
     • Remove narrative examples that duplicate the schema

  5. Remove Meta-Instructions
     • Cut instructions like "Before making any decision, work through this thinking process"
     • These are implicit in the task definition
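Technique 2 can be illustrated with a minimal sketch. The table and rule text below are invented examples, and whitespace-delimited word count is only a rough proxy for token count (real measurements would use the model's tokenizer, e.g. tiktoken):

```python
# Illustration of technique 2 (tables -> compact rules) with a crude
# size proxy. The table/rule content here is a made-up example.

verbose_table = """
| Income/Loan Ratio | Rating    | Explanation                         |
|-------------------|-----------|-------------------------------------|
| >= 4x             | Excellent | Income comfortably covers the loan. |
| 3-4x              | Good      | Income adequately covers the loan.  |
| 2-3x              | Fair      | Income marginally covers the loan.  |
""".strip()

compact_rule = "Income/Loan: >=4x=Excellent, 3-4x=Good, 2-3x=Fair"

def approx_tokens(text: str) -> int:
    """Crude token proxy: whitespace-delimited words."""
    return len(text.split())

print(approx_tokens(verbose_table), "->", approx_tokens(compact_rule))
```

The same rating thresholds survive the rewrite; only the explanatory scaffolding around them is dropped.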

Rejected Alternatives

  1. Shared Context File: Would require code changes to load shared context; decided to keep inline for simplicity
  2. Tool Examples in Prompts: Skipped since tool selection is straightforward with clear naming
  3. Aggressive 20K Target: Would risk degrading agent decision quality

Results

Token Reduction

| Agent  | Before | After  | Reduction |
|--------|--------|--------|-----------|
| Intake | 12,575 | 4,400  | 65%       |
| Credit | 26,998 | 8,792  | 67%       |
| Income | 27,330 | 10,245 | 63%       |
| Risk   | 16,964 | 17,525 | +3%*      |
| Total  | 83,867 | 40,962 | 51%       |

*The Risk agent's slight increase is due to accumulated context carried over from upstream agents.
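The table arithmetic can be checked directly from the before/after figures:

```python
# Recompute the per-agent change and total reduction from the token table.
before = {"Intake": 12575, "Credit": 26998, "Income": 27330, "Risk": 16964}
after  = {"Intake": 4400,  "Credit": 8792,  "Income": 10245, "Risk": 17525}

for agent in before:
    change = (after[agent] - before[agent]) / before[agent] * 100
    print(f"{agent}: {change:+.1f}%")   # Risk comes out positive (an increase)

total_before = sum(before.values())      # 83,867
total_after = sum(after.values())        # 40,962
reduction = (total_before - total_after) / total_before * 100
print(f"Total: {reduction:.0f}% reduction")  # ~51%
```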

Character Reduction

| Agent        | Before | After  | Reduction |
|--------------|--------|--------|-----------|
| intake-agent | 12,194 | 2,255  | 82%       |
| income-agent | 12,854 | 2,441  | 81%       |
| credit-agent | 15,484 | 2,783  | 82%       |
| risk-agent   | 21,962 | 3,426  | 84%       |
| Total        | 62,494 | 10,905 | 83%       |

Cost Impact (1,000 applications/day)

| Metric       | Before  | After   | Savings      |
|--------------|---------|---------|--------------|
| Daily Cost   | ~$310   | ~$202   | $108/day     |
| Monthly Cost | ~$9,300 | ~$6,060 | $3,240/month |
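The savings figures follow from the daily costs, assuming a 30-day month as used in the table:

```python
# Sanity-check the cost table (assumes 30-day months).
daily_before, daily_after = 310, 202

daily_savings = daily_before - daily_after        # $108/day
monthly_savings = daily_savings * 30              # $3,240/month
pct_savings = daily_savings / daily_before * 100  # ~35% of the daily bill

print(daily_savings, monthly_savings, f"{pct_savings:.0f}%")
```

The ~35% figure here is what the Consequences section below cites as the API cost reduction.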

Consequences

Positive

  • 51% reduction in input tokens per application
  • ~35% reduction in API costs
  • Faster response times (smaller context = faster processing)
  • Cleaner, more maintainable persona files
  • Agent decision quality maintained

Negative

  • Less verbose documentation in persona files (mitigated by external docs)
  • Requires careful testing when making persona changes

Risks & Mitigations

| Risk                   | Mitigation                                                      |
|------------------------|-----------------------------------------------------------------|
| Quality degradation    | Tested loan application flow; verified decisions unchanged      |
| Future maintainability | Document optimization techniques in this ADR                    |
| Over-compression       | Set floor at ~2,000 chars per persona to preserve effectiveness |

Implementation

Files Modified

  • apps/api/loan_defenders/agents/agent-persona/intake-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/credit-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/income-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/risk-agent-persona.md

Verification Process

  1. Run loan application through UI
  2. Check token metrics in logs (ENABLE_AGENT_FRAMEWORK_OBSERVABILITY=true)
  3. Verify agent decisions are reasonable and consistent
  4. Compare before/after token counts
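Step 4 could be scripted along these lines. The log line format and field names below are illustrative assumptions, not the framework's actual observability output:

```python
# Hypothetical sketch: sum per-agent input tokens from observability logs.
# The "agent=... input_tokens=..." line format is an assumption for
# illustration; adapt the pattern to the real log output.
import re

LOG_PATTERN = re.compile(r"agent=(?P<agent>[\w-]+)\s+input_tokens=(?P<tokens>\d+)")

def total_input_tokens(log_text: str) -> dict:
    totals = {}
    for m in LOG_PATTERN.finditer(log_text):
        agent = m.group("agent")
        totals[agent] = totals.get(agent, 0) + int(m.group("tokens"))
    return totals

sample = """\
agent=intake-agent input_tokens=4400
agent=credit-agent input_tokens=8792
"""
print(total_input_tokens(sample))
```

Running this against before/after log captures gives the comparison in step 4 without manual counting.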

Future Optimization Opportunities

  1. Prompt Caching: Anthropic's caching can reduce input costs by 90% for static content
  2. GPT-4o-mini: Switch to smaller model for simple agents (Intake) - $0.15 vs $2.50 per 1M tokens
  3. Batch Processing: Non-real-time use cases can use batch API at 50% discount
  4. Further Compression: Target 25K tokens with more aggressive techniques
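Opportunity 1 could look roughly like the following. Only the request payload is built here (no API call is made), and the model name and persona text are placeholders:

```python
# Sketch of Anthropic prompt caching: mark the static persona text as a
# cacheable system block via cache_control. Persona text and model name
# are placeholders, not the project's actual values.
PERSONA = "You are the Intake Agent. Validate application completeness..."

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": PERSONA,
            # Cached reads of this block are billed at a fraction of the
            # base input price, which is where the ~90% saving on static
            # content comes from.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Only the per-application message varies between requests.
    "messages": [{"role": "user", "content": "Process the next application"}],
}
print(payload["system"][0]["cache_control"])
```

Keeping the persona in a single static system block and confining per-request data to `messages` is what makes the cache hit rate high.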


Status: Accepted and Implemented
Implementation Date: 2025-11-28
Next Review: 2026-02-28