
ADR-060: Agent Persona Token Optimization

Status: Accepted
Date: 2025-11-28
Deciders: Engineering Team


Context

The multi-agent loan processing system was consuming ~84,000 input tokens per loan application across 4 agents:

  • Intake Agent: 12,575 tokens
  • Credit Agent: 26,998 tokens
  • Income Agent: 27,330 tokens
  • Risk Agent: 16,964 tokens

Problems:

  • High API costs (~$310/day at 1,000 applications)
  • Slow response times due to large context windows
  • Verbose personas with redundant content
  • Example scenarios duplicating output format specifications

Goal: Reduce token consumption by 50%+ while maintaining agent decision quality.


Decision

Apply industry best practices from Anthropic and OpenAI for prompt optimization:

Optimization Techniques

  1. Remove Redundant Content
     • Eliminate verbose explanations of concepts agents inherently understand
     • Remove multiple restatements of role/responsibilities
     • Consolidate repeated compliance/security sections

  2. Convert Tables to Compact Rules
     • Before: Multi-column markdown tables with explanations
     • After: Single-line rules (e.g., Income/Loan: ≥4x=Excellent, 3-4x=Good, 2-3x=Fair)

  3. Use Imperative Commands
     • Before: "You should analyze the application data to identify behavioral patterns"
     • After: "Analyze behavioral patterns"

  4. Eliminate Example Scenarios
     • Keep only the output format specification
     • Remove narrative examples that duplicate the schema

  5. Remove Meta-Instructions
     • Cut instructions like "Before making any decision, work through this thinking process"
     • These are implicit in the task definition
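Technique 2 can be illustrated with a minimal sketch. The table and rule text below are invented examples, and whitespace-delimited word count is only a rough proxy for token count (real measurements would use the model's tokenizer, e.g. tiktoken):

```python
# Illustration of technique 2 (tables -> compact rules) with a crude
# size proxy. The table/rule content here is a made-up example.

verbose_table = """
| Income/Loan Ratio | Rating    | Explanation                         |
|-------------------|-----------|-------------------------------------|
| >= 4x             | Excellent | Income comfortably covers the loan. |
| 3-4x              | Good      | Income adequately covers the loan.  |
| 2-3x              | Fair      | Income marginally covers the loan.  |
""".strip()

compact_rule = "Income/Loan: >=4x=Excellent, 3-4x=Good, 2-3x=Fair"

def approx_tokens(text: str) -> int:
    """Crude token proxy: whitespace-delimited words."""
    return len(text.split())

print(approx_tokens(verbose_table), "->", approx_tokens(compact_rule))
```

The same rating thresholds survive the rewrite; only the explanatory scaffolding around them is dropped.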

Rejected Alternatives

  1. Shared Context File: Would require code changes to load shared context; decided to keep inline for simplicity
  2. Tool Examples in Prompts: Skipped since tool selection is straightforward with clear naming
  3. Aggressive 20K Target: Would risk degrading agent decision quality

Results

Token Reduction

| Agent  | Before | After  | Reduction |
|--------|--------|--------|-----------|
| Intake | 12,575 | 4,400  | 65%       |
| Credit | 26,998 | 8,792  | 67%       |
| Income | 27,330 | 10,245 | 63%       |
| Risk   | 16,964 | 17,525 | +3%*      |
| Total  | 83,867 | 40,962 | 51%       |

*The Risk agent's slight increase is due to accumulated context carried over from upstream agents.
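The table arithmetic can be checked directly from the before/after figures:

```python
# Recompute the per-agent change and total reduction from the token table.
before = {"Intake": 12575, "Credit": 26998, "Income": 27330, "Risk": 16964}
after  = {"Intake": 4400,  "Credit": 8792,  "Income": 10245, "Risk": 17525}

for agent in before:
    change = (after[agent] - before[agent]) / before[agent] * 100
    print(f"{agent}: {change:+.1f}%")   # Risk comes out positive (an increase)

total_before = sum(before.values())      # 83,867
total_after = sum(after.values())        # 40,962
reduction = (total_before - total_after) / total_before * 100
print(f"Total: {reduction:.0f}% reduction")  # ~51%
```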

Character Reduction

| Agent        | Before | After  | Reduction |
|--------------|--------|--------|-----------|
| intake-agent | 12,194 | 2,255  | 82%       |
| income-agent | 12,854 | 2,441  | 81%       |
| credit-agent | 15,484 | 2,783  | 82%       |
| risk-agent   | 21,962 | 3,426  | 84%       |
| Total        | 62,494 | 10,905 | 83%       |

Cost Impact (1,000 applications/day)

| Metric       | Before  | After   | Savings      |
|--------------|---------|---------|--------------|
| Daily Cost   | ~$310   | ~$202   | $108/day     |
| Monthly Cost | ~$9,300 | ~$6,060 | $3,240/month |
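The savings figures follow from the daily costs, assuming a 30-day month as used in the table:

```python
# Sanity-check the cost table (assumes 30-day months).
daily_before, daily_after = 310, 202

daily_savings = daily_before - daily_after        # $108/day
monthly_savings = daily_savings * 30              # $3,240/month
pct_savings = daily_savings / daily_before * 100  # ~35% of the daily bill

print(daily_savings, monthly_savings, f"{pct_savings:.0f}%")
```

The ~35% figure here is what the Consequences section below cites as the API cost reduction.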

Consequences

Positive

  • 51% reduction in input tokens per application
  • ~35% reduction in API costs
  • Faster response times (smaller context = faster processing)
  • Cleaner, more maintainable persona files
  • Agent decision quality maintained

Negative

  • Less verbose documentation in persona files (mitigated by external docs)
  • Requires careful testing when making persona changes

Risks & Mitigations

| Risk                   | Mitigation                                                      |
|------------------------|-----------------------------------------------------------------|
| Quality degradation    | Tested loan application flow; verified decisions unchanged      |
| Future maintainability | Document optimization techniques in this ADR                    |
| Over-compression       | Set floor at ~2,000 chars per persona to preserve effectiveness |

Implementation

Files Modified

  • apps/api/loan_defenders/agents/agent-persona/intake-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/credit-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/income-agent-persona.md
  • apps/api/loan_defenders/agents/agent-persona/risk-agent-persona.md

Verification Process

  1. Run loan application through UI
  2. Check token metrics in logs (ENABLE_AGENT_FRAMEWORK_OBSERVABILITY=true)
  3. Verify agent decisions are reasonable and consistent
  4. Compare before/after token counts
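Step 4 could be scripted along these lines. The log line format and field names below are illustrative assumptions, not the framework's actual observability output:

```python
# Hypothetical sketch: sum per-agent input tokens from observability logs.
# The "agent=... input_tokens=..." line format is an assumption for
# illustration; adapt the pattern to the real log output.
import re

LOG_PATTERN = re.compile(r"agent=(?P<agent>[\w-]+)\s+input_tokens=(?P<tokens>\d+)")

def total_input_tokens(log_text: str) -> dict:
    totals = {}
    for m in LOG_PATTERN.finditer(log_text):
        agent = m.group("agent")
        totals[agent] = totals.get(agent, 0) + int(m.group("tokens"))
    return totals

sample = """\
agent=intake-agent input_tokens=4400
agent=credit-agent input_tokens=8792
"""
print(total_input_tokens(sample))
```

Running this against before/after log captures gives the comparison in step 4 without manual counting.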

Future Optimization Opportunities

  1. Prompt Caching: Anthropic's caching can reduce input costs by 90% for static content
  2. GPT-4o-mini: Switch to smaller model for simple agents (Intake) - $0.15 vs $2.50 per 1M tokens
  3. Batch Processing: Non-real-time use cases can use batch API at 50% discount
  4. Further Compression: Target 25K tokens with more aggressive techniques
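Opportunity 1 could look roughly like the following. Only the request payload is built here (no API call is made), and the model name and persona text are placeholders:

```python
# Sketch of Anthropic prompt caching: mark the static persona text as a
# cacheable system block via cache_control. Persona text and model name
# are placeholders, not the project's actual values.
PERSONA = "You are the Intake Agent. Validate application completeness..."

payload = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": PERSONA,
            # Cached reads of this block are billed at a fraction of the
            # base input price, which is where the ~90% saving on static
            # content comes from.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Only the per-application message varies between requests.
    "messages": [{"role": "user", "content": "Process the next application"}],
}
print(payload["system"][0]["cache_control"])
```

Keeping the persona in a single static system block and confining per-request data to `messages` is what makes the cache hit rate high.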


Status: Accepted and Implemented
Implementation Date: 2025-11-28
Next Review: 2026-02-28