Skip to content

Azure MCP Servers Deployment Configuration

Related: ADR-039: MCP Servers Container Apps Deployment
Status: Implementation Guide
Updated: 2025-01-15


Overview

This document defines deployment configurations for MCP servers across environments: - Development: Cost-optimized, lower scale, full security - Production: Performance-optimized, auto-scaling, production-ready (documentation only for MVP)


Environment Comparison Matrix

Aspect Development (Implementing) Production (Documentation)
Purpose Testing, validation Live customer traffic
Scale Low (1-2 replicas) High (2-5 replicas)
Performance Acceptable (100-200ms) Optimized (<50ms)
Cost Optimized (~$40-55/mo) Higher (~$100-150/mo)
Security ✅ Full (VNet, ingress) ✅ Full + auth
Availability 99.9% (acceptable) 99.95%+ (SLA required)
Monitoring Standard Enhanced + alerting

Development Environment Configuration

Design Principles

Cost Optimization: - Minimize replicas while maintaining reliability - Right-size resources (no over-provisioning) - Single environment (no staging for MVP)

Security Maintained: - ✅ VNet isolation (full production security) - ✅ Internal ingress (no external access) - ✅ NSG rules (production-grade) - ✅ TLS encryption (Azure-managed) - ✅ Managed identities (no credentials)

Performance Acceptable: - Target: 100-200ms response time - Cold start: None (min 1 replica) - Throughput: Sufficient for 50-100 apps/day

MCP Server Configuration (Dev)

Resource Allocation

Per MCP Server:

resources: {
  cpu: json('0.5')        // 0.5 vCPU (minimum viable)
  memory: '1Gi'           // 1GB RAM (sufficient for MVP)
}

Rationale: - 0.5 vCPU handles ~100 requests/second - 1GB memory sufficient for Python + FastMCP - Validated in Docker Compose testing (PR #155)

Scaling Configuration

Development Scaling:

scale: {
  minReplicas: 1          // Always 1 running (no cold start)
  maxReplicas: 2          // Max 2 for dev (cost control)
  rules: [
    {
      name: 'http-scaling'
      http: {
        metadata: {
          concurrentRequests: '100'  // Conservative for dev
        }
      }
    }
  ]
}

Rationale: - Min 1: Avoid cold starts (critical even in dev) - Max 2: Sufficient for dev traffic, controls cost - Trigger: 100 concurrent requests (unlikely in dev)

Health Probes

Liveness Probe (restart unhealthy):

{
  type: 'liveness'
  httpGet: {
    path: '/health'
    port: 8010              // Or 8011, 8012
    scheme: 'HTTP'
  }
  initialDelaySeconds: 40   // Python + FastMCP startup
  periodSeconds: 30         // Check every 30s
  timeoutSeconds: 10        // Allow 10s for response
  failureThreshold: 3       // Restart after 3 failures
}

Readiness Probe (remove from load balancer):

{
  type: 'readiness'
  httpGet: {
    path: '/health'
    port: 8010
    scheme: 'HTTP'
  }
  initialDelaySeconds: 10   // Faster than liveness
  periodSeconds: 10         // Check more frequently
  timeoutSeconds: 5         // Shorter timeout
  failureThreshold: 2       // Remove quickly
}

Ingress Configuration

Internal Only:

ingress: {
  external: false           // ✅ CRITICAL: No external access
  targetPort: 8010          // Service port (8010, 8011, 8012)
  transport: 'http'         // Internal HTTP (Azure adds TLS)
  allowInsecure: false      // Enforce HTTPS
  traffic: [
    {
      weight: 100
      latestRevision: true
    }
  ]
}

Environment Variables

Development:

env: [
  {
    name: 'MCP_SERVER_PORT'
    value: '8010'           // Or 8011, 8012
  }
  {
    name: 'APP_LOG_LEVEL'
    value: 'INFO'           // More verbose in dev
  }
  {
    name: 'PYTHONPATH'
    value: '/app'
  }
  {
    name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
    secretRef: 'appinsights-connection-string'
  }
  {
    name: 'ENVIRONMENT'
    value: 'development'
  }
]

Cost Projection (Development)

Per MCP Server (Dev)

Resource Costs:

CPU: 0.5 vCPU × 1 replica × 730 hours/month × $0.0864/hour
   = $31.54/month

Memory: 1 GB × 1 replica × 730 hours/month × $0.0108/hour
      = $7.88/month

Total per server: ~$39.42/month

3 MCP Servers:

Application Verification: $39.42/month
Document Processing:      $39.42/month
Financial Calculations:   $39.42/month

Total MCP Servers: ~$118/month
With Azure discounts: ~$40-55/month ✅

Total Development Environment

Full Stack (Dev):

Component                     Monthly Cost
────────────────────────────────────────
Core Infrastructure
- VNet + Subnets             $0 (included)
- NSG                        $0 (included)
- Container Apps Environment $0 (included)

Container Apps
- UI (1 replica)             ~$13-18
- API (1 replica)            ~$13-18
- MCP Servers (3 × 1 replica) ~$40-55

Azure Services
- Application Insights       ~$10-15
- Log Analytics              ~$5-10
- Azure OpenAI               ~$20-30 (usage)

VPN Gateway (Optional)
- VpnGw2                     ~$140/mo
────────────────────────────────────────
Total without VPN:           ~$101-146/month ✅
Total with VPN:              ~$241-286/month

Optimization Notes: - Can disable VPN after initial setup: -$140/month - OpenAI cost scales with usage (low in dev) - Application Insights has free tier (5GB/month)

Performance Expectations (Development)

Capacity

Per MCP Server (0.5 vCPU, 1 replica): - Throughput: ~100 requests/second - Concurrent: ~50 requests - Daily capacity: ~8.6M requests/day

Development Load: - 10-50 loan applications/day - ~1,000-3,000 MCP calls/day - ~1-5 requests/minute average - Headroom: 8,600x (massive buffer)

Latency

Expected Response Times (Dev):

Scenario                    Target      Acceptable
──────────────────────────────────────────────────
Simple calculation          <50ms       <100ms
Credit check                <100ms      <200ms
Document processing         <500ms      <1000ms
Complex workflow            <1000ms     <2000ms

Dev vs. Production: - Dev may be 10-20% slower (lower resources) - Still well within acceptable range - No impact on MVP validation


Production Environment Configuration (Documentation)

Note: This is documentation only. Production deployment not implemented for MVP.

Design Principles

Performance Optimized: - Higher resource allocation - More replicas (2-5) - Aggressive auto-scaling - Multi-region ready

High Availability: - Minimum 2 replicas (99.95% SLA) - Cross-zone distribution - Automatic failover - Zero-downtime deployments

Enhanced Security: - OAuth2/Entra ID authentication - Advanced monitoring and alerting - DDoS protection (Azure Front Door) - WAF rules (Azure Application Gateway)

MCP Server Configuration (Production)

Resource Allocation

Per MCP Server:

resources: {
  cpu: json('1.0')         // 1.0 vCPU (2x dev)
  memory: '2Gi'            // 2GB RAM (2x dev)
}

Rationale: - 1.0 vCPU: Handles high concurrent load - 2GB RAM: Buffer for peak traffic - Faster response times (<50ms p95)

Scaling Configuration

Production Scaling:

scale: {
  minReplicas: 2           // HA: Always 2+ running
  maxReplicas: 10          // Can scale to 10 under load
  rules: [
    {
      name: 'http-scaling'
      http: {
        metadata: {
          concurrentRequests: '50'  // Aggressive scaling
        }
      }
    }
    {
      name: 'cpu-scaling'
      custom: {
        type: 'cpu'
        metadata: {
          type: 'Utilization'
          value: '70'      // Scale at 70% CPU
        }
      }
    }
  ]
}

Rationale: - Min 2: High availability (99.95% SLA) - Max 10: Handles traffic spikes - Dual triggers: HTTP requests + CPU utilization

Health Probes

Production (same as dev but more aggressive):

# Liveness
initialDelaySeconds: 30    // Faster than dev (optimized image)
periodSeconds: 15          // More frequent checks
failureThreshold: 2        // Faster recovery

# Readiness
initialDelaySeconds: 5     // Very fast
periodSeconds: 5           // Very frequent
failureThreshold: 1        // Remove immediately

Ingress Configuration

Internal with OAuth2:

ingress: {
  external: false
  targetPort: 8010
  transport: 'http'
  allowInsecure: false

  # Future: Custom domains
  customDomains: [
    {
      name: 'mcp-verification.loandefenders.com'
      certificateId: customCertId
    }
  ]
}

Environment Variables

Production:

env: [
  {
    name: 'MCP_SERVER_PORT'
    value: '8010'
  }
  {
    name: 'APP_LOG_LEVEL'
    value: 'WARNING'        // Less verbose in production
  }
  {
    name: 'PYTHONPATH'
    value: '/app'
  }
  {
    name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
    secretRef: 'appinsights-connection-string'
  }
  {
    name: 'ENVIRONMENT'
    value: 'production'
  }
  {
    name: 'AUTH_ENABLED'
    value: 'true'           // OAuth2/Entra ID enabled
  }
  {
    name: 'AZURE_TENANT_ID'
    secretRef: 'tenant-id'
  }
  {
    name: 'AZURE_CLIENT_ID'
    secretRef: 'client-id'
  }
]

Cost Projection (Production)

Per MCP Server (Production)

Resource Costs:

CPU: 1.0 vCPU × 2.5 avg replicas × 730 hours × $0.0864/hour
   = $157.68/month

Memory: 2 GB × 2.5 avg replicas × 730 hours × $0.0108/hour
      = $39.42/month

Total per server: ~$197/month
With Azure discounts: ~$100-135/month

3 MCP Servers:

Application Verification: $100-135/month
Document Processing:      $100-135/month
Financial Calculations:   $100-135/month

Total MCP Servers: ~$300-405/month

Total Production Environment

Full Stack (Production):

Component                        Monthly Cost
──────────────────────────────────────────────
Core Infrastructure
- VNet + Subnets                 $0 (included)
- NSG                            $0 (included)
- Container Apps Environment     $0 (included)

Container Apps
- UI (2-3 replicas)              ~$40-60
- API (2-3 replicas)             ~$40-60
- MCP Servers (3 × 2-3 replicas) ~$300-405

Azure Services
- Application Insights           ~$30-50 (higher usage)
- Log Analytics                  ~$15-25
- Azure OpenAI                   ~$200-500 (production usage)
- Azure Front Door (CDN + WAF)   ~$100-150
- Application Gateway (WAF)      ~$140-180

Optional Enhancements
- Redis Cache (Premium)          ~$100-200
- Cosmos DB (backup)             ~$50-100
- Azure Monitor Alerts           ~$10-20
──────────────────────────────────────────────
Total Base:                      ~$725-1,300/month
Total with Enhancements:         ~$885-1,620/month

Note: Production costs 7-10x higher than dev (scale + features)

Performance Expectations (Production)

Capacity

Per MCP Server (1.0 vCPU, 2-3 replicas): - Throughput: ~300-450 requests/second - Concurrent: ~150-225 requests - Daily capacity: ~25-40M requests/day

Production Load Projections:

Time Period     Daily Apps    MCP Calls/Day    Replicas Needed
──────────────────────────────────────────────────────────────
Launch (0-3mo)  100-200       6K-12K           2-3
Growth (3-6mo)  200-500       12K-30K          2-4
Scale (6-12mo)  500-1000      30K-60K          3-5
Enterprise      1000-5000     60K-300K         5-10

Latency

Expected Response Times (Production):

Scenario                    p50      p95      p99
───────────────────────────────────────────────────
Simple calculation          <30ms    <50ms    <100ms
Credit check                <50ms    <100ms   <200ms
Document processing         <200ms   <500ms   <1000ms
Complex workflow            <500ms   <1000ms  <2000ms

SLA Targets: - Availability: 99.95% (max 4.38 hours downtime/year) - Latency: <100ms (p95) for all MCP calls - Throughput: Handle 10x traffic spike


Security Configuration (Both Environments)

Network Security

VNet Isolation:

VNet: 10.0.0.0/16
- Container Apps Subnet: 10.0.0.0/23
- Private Endpoints Subnet: 10.0.3.0/24

NSG Rules (Container Apps Subnet):

Inbound:
- Allow: Azure Load Balancer → * (TCP/443)
- Allow: VNet → VNet (All)
- Deny: * → * (All)

Outbound:
- Allow: VNet → VNet (All)
- Allow: VNet → Internet (TCP/443)
- Allow: VNet → AzureCloud (All)
- Deny: * → * (All)

Result: - ✅ MCP servers accessible only within VNet - ✅ External traffic blocked - ✅ Azure services accessible (ACR, monitoring)

Authentication

Development: - No OAuth2 (network security only) - Managed Identity for Azure services - Fast implementation (5 days)

Production: - OAuth2/Entra ID required - Token-based authorization - Per-service access control - Full audit trail

Managed Identity Permissions

Required for All Environments:

MCP Servers → ACR:
- Role: AcrPull
- Scope: Azure Container Registry

MCP Servers → Monitoring:
- Role: Monitoring Metrics Publisher
- Scope: Log Analytics Workspace

MCP Servers → Application Insights:
- Connection String (via secret)


Monitoring Configuration

Development Monitoring

Application Insights: - Standard tier (5GB/month free) - Request tracking - Exception logging - Dependency monitoring

Log Analytics: - Basic tier - 30-day retention - Container logs - Health check logs

Alerts: - Critical: Service down (5 minutes) - Warning: High error rate (>10%) - Info: Scaling events

Production Monitoring (Enhanced)

Application Insights: - Premium tier - Live Metrics Stream - Smart Detection - Availability tests - Custom dashboards

Log Analytics: - Standard tier - 90-day retention - Advanced queries - Workbook templates

Alerts: - Critical: Service down (1 minute) - Warning: High error rate (>5%) - Warning: High latency (p95 >100ms) - Warning: High cost (>budget threshold) - Info: Scaling events - Info: Health check failures

Dashboards: - Real-time metrics - SLA compliance - Cost tracking - Error analysis


Deployment Commands

Development Deployment

Deploy Container Platform with MCP Servers:

# Deploy core infrastructure + MCP servers
az deployment group create \
  --resource-group ldfdev-rg \
  --template-file infrastructure/bicep/modules/container-platform.bicep \
  --parameters infrastructure/bicep/environments/dev-container-platform.parameters.json

# Verify deployment
az containerapp list \
  --resource-group ldfdev-rg \
  --query "[?contains(name, 'mcp')].{Name:name, Status:properties.runningStatus}" \
  --output table

# Check health
az containerapp show \
  --name ldfdev-mcp-verification \
  --resource-group ldfdev-rg \
  --query "properties.{Status:runningStatus, Replicas:template.scale}" \
  --output json

Expected Output:

Name                         Status
---------------------------  --------
ldfdev-mcp-verification      Running
ldfdev-mcp-documents         Running
ldfdev-mcp-financial         Running

Production Deployment (Future)

Prerequisites: 1. ✅ OAuth2/Entra ID configured 2. ✅ Custom domains registered 3. ✅ SSL certificates provisioned 4. ✅ Azure Front Door configured 5. ✅ Load testing completed

Deploy:

# Deploy with production parameters
az deployment group create \
  --resource-group ldfprod-rg \
  --template-file infrastructure/bicep/modules/container-platform.bicep \
  --parameters infrastructure/bicep/environments/prod-container-platform.parameters.json

# Smoke test
./scripts/smoke-test-production.sh

# Enable traffic
az containerapp ingress traffic set \
  --name ldfprod-mcp-verification \
  --resource-group ldfprod-rg \
  --traffic-weight latest=100


Testing Strategy

Development Testing

Health Checks:

# From within VNet (via VPN or Azure Bastion)
curl https://ldfdev-mcp-verification.internal.{domain}/health
# Expected: {"status": "healthy", "timestamp": "..."}

# From internet (should FAIL)
curl https://ldfdev-mcp-verification.azurecontainerapps.io/health
# Expected: Connection refused or 403 Forbidden

Load Testing:

# Simple load test (100 requests)
for i in {1..100}; do
  curl -X POST https://ldfdev-api.internal.{domain}/mcp/verify-identity \
    -H "Content-Type: application/json" \
    -d '{"applicant_id": "test-123"}' &
done
wait

# Check auto-scaling
az containerapp revision list \
  --name ldfdev-mcp-verification \
  --resource-group ldfdev-rg \
  --query "[].{Name:name, Replicas:properties.replicas, Active:properties.active}"

Production Testing (Future)

Staged Rollout: 1. Deploy to 10% traffic 2. Monitor for 1 hour 3. Increase to 50% if healthy 4. Monitor for 6 hours 5. Full rollout if metrics good

Load Testing: - Use Azure Load Testing service - Simulate 1000 concurrent users - Test for 1 hour sustained load - Validate auto-scaling behavior - Verify latency SLAs


Rollback Procedures

Development

Quick Rollback:

# Revert to previous revision
az containerapp revision copy \
  --name ldfdev-mcp-verification \
  --resource-group ldfdev-rg \
  --from-revision previous-revision-name

# Activate previous revision
az containerapp ingress traffic set \
  --name ldfdev-mcp-verification \
  --resource-group ldfdev-rg \
  --traffic-weight previous-revision=100 latest=0

Nuclear Option:

# Delete and redeploy
az containerapp delete --name ldfdev-mcp-verification -g ldfdev-rg --yes
# Then redeploy from Bicep

Production (Future)

Staged Rollback: 1. Immediate: Route 100% traffic to previous revision 2. Investigate: Review logs and metrics 3. Fix Forward: Deploy hotfix if possible 4. Full Rollback: Redeploy previous version if needed


Configuration Summary

Quick Reference

Configuration Development Production
vCPU per server 0.5 1.0
Memory per server 1GB 2GB
Min replicas 1 2
Max replicas 2 10
Cost (3 servers) $40-55/mo $300-405/mo
Auth None OAuth2/Entra ID
Monitoring Standard Enhanced
SLA 99.9% 99.95%
Implement ✅ Yes (MVP) 📄 Docs only

Next Steps

Immediate (Development)

  1. Create Bicep modules for 3 MCP servers
  2. Update container-platform.bicep to include MCP servers
  3. Deploy to dev environment and validate
  4. Test internal connectivity from API
  5. Verify external access blocked
  6. Monitor costs and adjust if needed

Future (Production)

  1. Implement OAuth2/Entra ID authentication
  2. Set up Azure Front Door with WAF
  3. Configure custom domains and SSL
  4. Deploy with blue-green strategy
  5. Complete load testing and tuning
  6. Enable enhanced monitoring
  7. Document production runbooks

References


Configuration Status: ✅ Dev ready to implement, Prod documented
Cost Estimate: Dev $40-55/mo, Prod $725-1,300/mo
Timeline: Dev deployment 5 days, Prod planning complete