Azure MCP Servers Deployment Configuration
Related: ADR-039: MCP Servers Container Apps Deployment
Status: Implementation Guide
Updated: 2025-01-15
Overview
This document defines deployment configurations for MCP servers across environments: - Development: Cost-optimized, lower scale, full security - Production: Performance-optimized, auto-scaling, production-ready (documentation only for MVP)
Environment Comparison Matrix
| Aspect | Development (Implementing) | Production (Documentation) |
|---|---|---|
| Purpose | Testing, validation | Live customer traffic |
| Scale | Low (1-2 replicas) | High (2-5 replicas) |
| Performance | Acceptable (100-200ms) | Optimized (<50ms) |
| Cost | Optimized (~$40-55/mo) | Higher (~$100-150/mo) |
| Security | ✅ Full (VNet, ingress) | ✅ Full + auth |
| Availability | 99.9% (acceptable) | 99.95%+ (SLA required) |
| Monitoring | Standard | Enhanced + alerting |
Development Environment Configuration
Design Principles
Cost Optimization: - Minimize replicas while maintaining reliability - Right-size resources (no over-provisioning) - Single environment (no staging for MVP)
Security Maintained: - ✅ VNet isolation (full production security) - ✅ Internal ingress (no external access) - ✅ NSG rules (production-grade) - ✅ TLS encryption (Azure-managed) - ✅ Managed identities (no credentials)
Performance Acceptable: - Target: 100-200ms response time - Cold start: None (min 1 replica) - Throughput: Sufficient for 50-100 apps/day
MCP Server Configuration (Dev)
Resource Allocation
Per MCP Server:
resources: {
cpu: json('0.5') // 0.5 vCPU (minimum viable)
memory: '1Gi' // 1GB RAM (sufficient for MVP)
}
Rationale: - 0.5 vCPU handles ~100 requests/second - 1GB memory sufficient for Python + FastMCP - Validated in Docker Compose testing (PR #155)
Scaling Configuration
Development Scaling:
scale: {
minReplicas: 1 // Always 1 running (no cold start)
maxReplicas: 2 // Max 2 for dev (cost control)
rules: [
{
name: 'http-scaling'
http: {
metadata: {
concurrentRequests: '100' // Conservative for dev
}
}
}
]
}
Rationale: - Min 1: Avoid cold starts (critical even in dev) - Max 2: Sufficient for dev traffic, controls cost - Trigger: 100 concurrent requests (unlikely in dev)
Health Probes
Liveness Probe (restart unhealthy):
{
type: 'liveness'
httpGet: {
path: '/health'
port: 8010 // Or 8011, 8012
scheme: 'HTTP'
}
initialDelaySeconds: 40 // Python + FastMCP startup
periodSeconds: 30 // Check every 30s
timeoutSeconds: 10 // Allow 10s for response
failureThreshold: 3 // Restart after 3 failures
}
Readiness Probe (remove from load balancer):
{
type: 'readiness'
httpGet: {
path: '/health'
port: 8010
scheme: 'HTTP'
}
initialDelaySeconds: 10 // Faster than liveness
periodSeconds: 10 // Check more frequently
timeoutSeconds: 5 // Shorter timeout
failureThreshold: 2 // Remove quickly
}
Ingress Configuration
Internal Only:
ingress: {
external: false // ✅ CRITICAL: No external access
targetPort: 8010 // Service port (8010, 8011, 8012)
transport: 'http' // Internal HTTP (Azure adds TLS)
allowInsecure: false // Enforce HTTPS
traffic: [
{
weight: 100
latestRevision: true
}
]
}
Environment Variables
Development:
env: [
{
name: 'MCP_SERVER_PORT'
value: '8010' // Or 8011, 8012
}
{
name: 'APP_LOG_LEVEL'
value: 'INFO' // More verbose in dev
}
{
name: 'PYTHONPATH'
value: '/app'
}
{
name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
secretRef: 'appinsights-connection-string'
}
{
name: 'ENVIRONMENT'
value: 'development'
}
]
Cost Projection (Development)
Per MCP Server (Dev)
Resource Costs:
CPU: 0.5 vCPU × 1 replica × 730 hours/month × $0.0864/hour
= $31.54/month
Memory: 1 GB × 1 replica × 730 hours/month × $0.0108/hour
= $7.88/month
Total per server: ~$39.42/month
3 MCP Servers:
Application Verification: $39.42/month
Document Processing: $39.42/month
Financial Calculations: $39.42/month
Total MCP Servers: ~$118/month
With Azure discounts: ~$40-55/month ✅
Total Development Environment
Full Stack (Dev):
Component Monthly Cost
────────────────────────────────────────
Core Infrastructure
- VNet + Subnets $0 (included)
- NSG $0 (included)
- Container Apps Environment $0 (included)
Container Apps
- UI (1 replica) ~$13-18
- API (1 replica) ~$13-18
- MCP Servers (3 × 1 replica) ~$40-55
Azure Services
- Application Insights ~$10-15
- Log Analytics ~$5-10
- Azure OpenAI ~$20-30 (usage)
VPN Gateway (Optional)
- VpnGw2 ~$140/mo
────────────────────────────────────────
Total without VPN: ~$101-146/month ✅
Total with VPN: ~$241-286/month
Optimization Notes: - Can disable VPN after initial setup: -$140/month - OpenAI cost scales with usage (low in dev) - Application Insights has free tier (5GB/month)
Performance Expectations (Development)
Capacity
Per MCP Server (0.5 vCPU, 1 replica): - Throughput: ~100 requests/second - Concurrent: ~50 requests - Daily capacity: ~8.6M requests/day
Development Load: - 10-50 loan applications/day - ~1,000-3,000 MCP calls/day - ~1-5 requests/minute average - Headroom: 8,600x (massive buffer)
Latency
Expected Response Times (Dev):
Scenario Target Acceptable
──────────────────────────────────────────────────
Simple calculation <50ms <100ms
Credit check <100ms <200ms
Document processing <500ms <1000ms
Complex workflow <1000ms <2000ms
Dev vs. Production: - Dev may be 10-20% slower (lower resources) - Still well within acceptable range - No impact on MVP validation
Production Environment Configuration (Documentation)
Note: This is documentation only. Production deployment not implemented for MVP.
Design Principles
Performance Optimized: - Higher resource allocation - More replicas (2-5) - Aggressive auto-scaling - Multi-region ready
High Availability: - Minimum 2 replicas (99.95% SLA) - Cross-zone distribution - Automatic failover - Zero-downtime deployments
Enhanced Security: - OAuth2/Entra ID authentication - Advanced monitoring and alerting - DDoS protection (Azure Front Door) - WAF rules (Azure Application Gateway)
MCP Server Configuration (Production)
Resource Allocation
Per MCP Server:
Rationale: - 1.0 vCPU: Handles high concurrent load - 2GB RAM: Buffer for peak traffic - Faster response times (<50ms p95)
Scaling Configuration
Production Scaling:
scale: {
minReplicas: 2 // HA: Always 2+ running
maxReplicas: 10 // Can scale to 10 under load
rules: [
{
name: 'http-scaling'
http: {
metadata: {
concurrentRequests: '50' // Aggressive scaling
}
}
}
{
name: 'cpu-scaling'
custom: {
type: 'cpu'
metadata: {
type: 'Utilization'
value: '70' // Scale at 70% CPU
}
}
}
]
}
Rationale: - Min 2: High availability (99.95% SLA) - Max 10: Handles traffic spikes - Dual triggers: HTTP requests + CPU utilization
Health Probes
Production (same as dev but more aggressive):
# Liveness
initialDelaySeconds: 30 // Faster than dev (optimized image)
periodSeconds: 15 // More frequent checks
failureThreshold: 2 // Faster recovery
# Readiness
initialDelaySeconds: 5 // Very fast
periodSeconds: 5 // Very frequent
failureThreshold: 1 // Remove immediately
Ingress Configuration
Internal with OAuth2:
ingress: {
external: false
targetPort: 8010
transport: 'http'
allowInsecure: false
# Future: Custom domains
customDomains: [
{
name: 'mcp-verification.loandefenders.com'
certificateId: customCertId
}
]
}
Environment Variables
Production:
env: [
{
name: 'MCP_SERVER_PORT'
value: '8010'
}
{
name: 'APP_LOG_LEVEL'
value: 'WARNING' // Less verbose in production
}
{
name: 'PYTHONPATH'
value: '/app'
}
{
name: 'APPLICATIONINSIGHTS_CONNECTION_STRING'
secretRef: 'appinsights-connection-string'
}
{
name: 'ENVIRONMENT'
value: 'production'
}
{
name: 'AUTH_ENABLED'
value: 'true' // OAuth2/Entra ID enabled
}
{
name: 'AZURE_TENANT_ID'
secretRef: 'tenant-id'
}
{
name: 'AZURE_CLIENT_ID'
secretRef: 'client-id'
}
]
Cost Projection (Production)
Per MCP Server (Production)
Resource Costs:
CPU: 1.0 vCPU × 2.5 avg replicas × 730 hours × $0.0864/hour
= $157.68/month
Memory: 2 GB × 2.5 avg replicas × 730 hours × $0.0108/hour
= $39.42/month
Total per server: ~$197/month
With Azure discounts: ~$100-135/month
3 MCP Servers:
Application Verification: $100-135/month
Document Processing: $100-135/month
Financial Calculations: $100-135/month
Total MCP Servers: ~$300-405/month
Total Production Environment
Full Stack (Production):
Component Monthly Cost
──────────────────────────────────────────────
Core Infrastructure
- VNet + Subnets $0 (included)
- NSG $0 (included)
- Container Apps Environment $0 (included)
Container Apps
- UI (2-3 replicas) ~$40-60
- API (2-3 replicas) ~$40-60
- MCP Servers (3 × 2-3 replicas) ~$300-405
Azure Services
- Application Insights ~$30-50 (higher usage)
- Log Analytics ~$15-25
- Azure OpenAI ~$200-500 (production usage)
- Azure Front Door (CDN + WAF) ~$100-150
- Application Gateway (WAF) ~$140-180
Optional Enhancements
- Redis Cache (Premium) ~$100-200
- Cosmos DB (backup) ~$50-100
- Azure Monitor Alerts ~$10-20
──────────────────────────────────────────────
Total Base: ~$725-1,300/month
Total with Enhancements: ~$885-1,620/month
Note: Production costs 7-10x higher than dev (scale + features)
Performance Expectations (Production)
Capacity
Per MCP Server (1.0 vCPU, 2-3 replicas): - Throughput: ~300-450 requests/second - Concurrent: ~150-225 requests - Daily capacity: ~25-40M requests/day
Production Load Projections:
Time Period Daily Apps MCP Calls/Day Replicas Needed
──────────────────────────────────────────────────────────────
Launch (0-3mo) 100-200 6K-12K 2-3
Growth (3-6mo) 200-500 12K-30K 2-4
Scale (6-12mo) 500-1000 30K-60K 3-5
Enterprise 1000-5000 60K-300K 5-10
Latency
Expected Response Times (Production):
Scenario p50 p95 p99
───────────────────────────────────────────────────
Simple calculation <30ms <50ms <100ms
Credit check <50ms <100ms <200ms
Document processing <200ms <500ms <1000ms
Complex workflow <500ms <1000ms <2000ms
SLA Targets: - Availability: 99.95% (max 4.38 hours downtime/year) - Latency: <100ms (p95) for all MCP calls - Throughput: Handle 10x traffic spike
Security Configuration (Both Environments)
Network Security
VNet Isolation:
NSG Rules (Container Apps Subnet):
Inbound:
- Allow: Azure Load Balancer → * (TCP/443)
- Allow: VNet → VNet (All)
- Deny: * → * (All)
Outbound:
- Allow: VNet → VNet (All)
- Allow: VNet → Internet (TCP/443)
- Allow: VNet → AzureCloud (All)
- Deny: * → * (All)
Result: - ✅ MCP servers accessible only within VNet - ✅ External traffic blocked - ✅ Azure services accessible (ACR, monitoring)
Authentication
Development: - No OAuth2 (network security only) - Managed Identity for Azure services - Fast implementation (5 days)
Production: - OAuth2/Entra ID required - Token-based authorization - Per-service access control - Full audit trail
Managed Identity Permissions
Required for All Environments:
MCP Servers → ACR:
- Role: AcrPull
- Scope: Azure Container Registry
MCP Servers → Monitoring:
- Role: Monitoring Metrics Publisher
- Scope: Log Analytics Workspace
MCP Servers → Application Insights:
- Connection String (via secret)
Monitoring Configuration
Development Monitoring
Application Insights: - Standard tier (5GB/month free) - Request tracking - Exception logging - Dependency monitoring
Log Analytics: - Basic tier - 30-day retention - Container logs - Health check logs
Alerts: - Critical: Service down (5 minutes) - Warning: High error rate (>10%) - Info: Scaling events
Production Monitoring (Enhanced)
Application Insights: - Premium tier - Live Metrics Stream - Smart Detection - Availability tests - Custom dashboards
Log Analytics: - Standard tier - 90-day retention - Advanced queries - Workbook templates
Alerts: - Critical: Service down (1 minute) - Warning: High error rate (>5%) - Warning: High latency (p95 >100ms) - Warning: High cost (>budget threshold) - Info: Scaling events - Info: Health check failures
Dashboards: - Real-time metrics - SLA compliance - Cost tracking - Error analysis
Deployment Commands
Development Deployment
Deploy Container Platform with MCP Servers:
# Deploy core infrastructure + MCP servers
az deployment group create \
--resource-group ldfdev-rg \
--template-file infrastructure/bicep/modules/container-platform.bicep \
--parameters infrastructure/bicep/environments/dev-container-platform.parameters.json
# Verify deployment
az containerapp list \
--resource-group ldfdev-rg \
--query "[?contains(name, 'mcp')].{Name:name, Status:properties.runningStatus}" \
--output table
# Check health
az containerapp show \
--name ldfdev-mcp-verification \
--resource-group ldfdev-rg \
--query "properties.{Status:runningStatus, Replicas:template.scale}" \
--output json
Expected Output:
Name Status
--------------------------- --------
ldfdev-mcp-verification Running
ldfdev-mcp-documents Running
ldfdev-mcp-financial Running
Production Deployment (Future)
Prerequisites: 1. ✅ OAuth2/Entra ID configured 2. ✅ Custom domains registered 3. ✅ SSL certificates provisioned 4. ✅ Azure Front Door configured 5. ✅ Load testing completed
Deploy:
# Deploy with production parameters
az deployment group create \
--resource-group ldfprod-rg \
--template-file infrastructure/bicep/modules/container-platform.bicep \
--parameters infrastructure/bicep/environments/prod-container-platform.parameters.json
# Smoke test
./scripts/smoke-test-production.sh
# Enable traffic
az containerapp ingress traffic set \
--name ldfprod-mcp-verification \
--resource-group ldfprod-rg \
--traffic-weight latest=100
Testing Strategy
Development Testing
Health Checks:
# From within VNet (via VPN or Azure Bastion)
curl https://ldfdev-mcp-verification.internal.{domain}/health
# Expected: {"status": "healthy", "timestamp": "..."}
# From internet (should FAIL)
curl https://ldfdev-mcp-verification.azurecontainerapps.io/health
# Expected: Connection refused or 403 Forbidden
Load Testing:
# Simple load test (100 requests)
for i in {1..100}; do
curl -X POST https://ldfdev-api.internal.{domain}/mcp/verify-identity \
-H "Content-Type: application/json" \
-d '{"applicant_id": "test-123"}' &
done
wait
# Check auto-scaling
az containerapp revision list \
--name ldfdev-mcp-verification \
--resource-group ldfdev-rg \
--query "[].{Name:name, Replicas:properties.replicas, Active:properties.active}"
Production Testing (Future)
Staged Rollout: 1. Deploy to 10% traffic 2. Monitor for 1 hour 3. Increase to 50% if healthy 4. Monitor for 6 hours 5. Full rollout if metrics good
Load Testing: - Use Azure Load Testing service - Simulate 1000 concurrent users - Test for 1 hour sustained load - Validate auto-scaling behavior - Verify latency SLAs
Rollback Procedures
Development
Quick Rollback:
# Revert to previous revision
az containerapp revision copy \
--name ldfdev-mcp-verification \
--resource-group ldfdev-rg \
--from-revision previous-revision-name
# Activate previous revision
az containerapp ingress traffic set \
--name ldfdev-mcp-verification \
--resource-group ldfdev-rg \
--traffic-weight previous-revision=100 latest=0
Nuclear Option:
# Delete and redeploy
az containerapp delete --name ldfdev-mcp-verification -g ldfdev-rg --yes
# Then redeploy from Bicep
Production (Future)
Staged Rollback: 1. Immediate: Route 100% traffic to previous revision 2. Investigate: Review logs and metrics 3. Fix Forward: Deploy hotfix if possible 4. Full Rollback: Redeploy previous version if needed
Configuration Summary
Quick Reference
| Configuration | Development | Production |
|---|---|---|
| vCPU per server | 0.5 | 1.0 |
| Memory per server | 1GB | 2GB |
| Min replicas | 1 | 2 |
| Max replicas | 2 | 10 |
| Cost (3 servers) | $40-55/mo | $300-405/mo |
| Auth | None | OAuth2/Entra ID |
| Monitoring | Standard | Enhanced |
| SLA | 99.9% | 99.95% |
| Implement | ✅ Yes (MVP) | 📄 Docs only |
Next Steps
Immediate (Development)
- Create Bicep modules for 3 MCP servers
- Update container-platform.bicep to include MCP servers
- Deploy to dev environment and validate
- Test internal connectivity from API
- Verify external access blocked
- Monitor costs and adjust if needed
Future (Production)
- Implement OAuth2/Entra ID authentication
- Set up Azure Front Door with WAF
- Configure custom domains and SSL
- Deploy with blue-green strategy
- Complete load testing and tuning
- Enable enhanced monitoring
- Document production runbooks
References
- ADR-039: MCP Servers Container Apps Deployment
- MCP Deployment Security
- Azure Deployment Architecture
- Monitoring Guide
Configuration Status: ✅ Dev ready to implement, Prod documented
Cost Estimate: Dev $40-55/mo, Prod $725-1,300/mo
Timeline: Dev deployment 5 days, Prod planning complete