ADR-030: Three-Workflow Deployment Architecture
Status: Accepted Date: 2025-01-08 Deciders: Development Team Related: ADR-009, ADR-021, ADR-023
Context
Our initial deployment architecture used a single monolithic GitHub Actions workflow that deployed all infrastructure components in one execution:
Single Workflow (45-60 minutes):
→ Virtual Network + Subnets + NSGs
→ Key Vault + Storage Account
→ AI Services + AI Foundry
→ Log Analytics + Application Insights
→ Container Apps Environment
→ (Future) Container Apps for API/UI
Problems Identified
- Long deployment times: 45-60 minutes for full infrastructure (1+ hours with VPN)
- Tight coupling: Infrastructure and application deployments in single workflow
- Slow iteration: Every code change requires full infrastructure redeployment
- CI/CD bottleneck: Cannot deploy apps frequently without redeploying infrastructure
- Authentication timeouts: Container Apps deployment failed with
ClientAssertionCredentialtimeout after running 1+ hours - Poor separation of concerns: Infrastructure-level changes mixed with application deployments
User Requirements
From conversation on 2025-01-08:
"the current bicep and deploy infrastructure will deploy all environment specific items, now i am thinking the app and api should be separate from the infrastructure deployment"
Rationale: 1. Deploy applications more frequently than infrastructure 2. Better separation of concerns 3. Faster CD pipeline for application updates
Decision
We will split the monolithic deployment into 3 separate GitHub Actions workflows with clear boundaries and dependencies:
Workflow 1: Core Infrastructure
File: .github/workflows/deploy-infrastructure.yml
Frequency: Rarely (only when infrastructure configuration changes)
Duration: 5-10 minutes (45+ minutes if VPN enabled)
Deploys: - Virtual Network + Subnets + NSGs - Key Vault + Storage Account - Managed Identity - AI Services + AI Foundry (with private endpoints) - Log Analytics + Application Insights - Private DNS Zones - VPN Gateway (optional, controlled by input parameter)
Outputs (stored in Azure deployment history):
- containerAppsSubnetId
- logAnalyticsCustomerId
- logAnalyticsPrimarySharedKey
- Other resource IDs as needed
Workflow 2: Container Platform
File: .github/workflows/deploy-platform.yml
Frequency: Occasionally (when platform configuration changes)
Duration: 5-10 minutes
Deploys: - Azure Container Registry (Premium SKU, using AVM, with export policy enabled) - Container Apps Environment (External + VNet-integrated, using AVM)
Dependencies: Requires Workflow 1 completion
Discovery Method: Queries Azure deployment history for latest infrastructure-deployment-*
Outputs (stored in Azure deployment history):
- acrLoginServer
- acrName
- containerAppsEnvName
- containerAppsEnvDefaultDomain
- containerAppsEnvStaticIp
Workflow 3: Applications
File: .github/workflows/deploy-applications.yml
Frequency: Frequently (every code change to apps)
Duration: 3-5 minutes
Deploys: - API Container App (internal ingress, port 8000) - UI Container App (external ingress, port 8080)
Dependencies: Requires Workflows 1 & 2 completion
Discovery Method: Queries Azure deployment history for latest container-platform-*
Uses: Official azure/container-apps-deploy-action@v1 GitHub Action
- Builds Docker images from Dockerfiles
- Pushes to ACR
- Creates/updates Container Apps
- Manages revisions and traffic
Auto-trigger: Pushes to main branch that modify apps/api/** or apps/ui/**
Resource Discovery Strategy
Alternatives Considered
- ❌ Naming Convention:
- Use predictable resource names (
{prefix}-vnet,{prefix}-acr) -
Rejected: Fragile, breaks with custom deployments, no validation
-
✅ Deployment Outputs (SELECTED):
- Query Azure deployment history for outputs
- ~10 lines of PowerShell, 200-500ms per query
-
Selected: Simple, reliable, works across days/weeks
-
❌ Resource Tags:
- Tag resources with deployment metadata
- Query resources by tags
-
Rejected: More complex than deployment outputs, no clear benefit
-
❌ Multi-RG + Azure Resource Graph:
- Separate resource groups per layer
- Use Azure Resource Graph for cross-RG queries
- Rejected: Over-engineering for dev environment
- TODO: Document as production upgrade path
Implementation (Deployment Outputs)
Example: Workflow 2 discovering Workflow 1 outputs:
# Find latest core infrastructure deployment
$deployment = Get-AzResourceGroupDeployment `
-ResourceGroupName "ldfdev-rg" `
| Where-Object { $_.DeploymentName -like "infrastructure-deployment-*" } `
| Sort-Object Timestamp -Descending `
| Select-Object -First 1
if (-not $deployment) {
Write-Error "❌ No core infrastructure deployment found"
exit 1
}
# Extract required outputs
$containerAppsSubnetId = $deployment.Outputs.containerAppsSubnetId.Value
$logAnalyticsCustomerId = $deployment.Outputs.logAnalyticsCustomerId.Value
$logAnalyticsSharedKey = $deployment.Outputs.logAnalyticsPrimarySharedKey.Value
# Use in deployment
New-AzResourceGroupDeployment `
-Name "container-platform-$(Get-Date -Format 'yyyyMMdd-HHmmss')" `
-TemplateFile "infrastructure/bicep/modules/container-platform.bicep" `
-containerAppsSubnetId $containerAppsSubnetId `
-logAnalyticsCustomerId $logAnalyticsCustomerId `
-logAnalyticsPrimarySharedKey $sharedKeySecure
Why This Works: - ✅ Simple: ~10 lines of PowerShell per workflow - ✅ Fast: 200-500ms per query - ✅ Reliable: Uses Azure's built-in deployment history - ✅ Temporal decoupling: Works across days/weeks - ✅ Self-documenting: Deployment names clearly indicate purpose - ✅ Validation: Can check for missing outputs and fail early
Azure Verified Modules Usage
Workflow 2 uses official Azure Verified Modules (AVM) for container platform resources:
// Azure Container Registry (AVM)
module containerRegistry 'br/public:avm/res/container-registry/registry:0.5.0' = {
name: 'acr-deployment'
params: {
name: acrName
location: location
acrSku: 'Premium'
publicNetworkAccess: 'Enabled' // TODO: Private endpoint in production
acrAdminUserEnabled: false // Use managed identity
tags: tags
}
}
// Container Apps Environment (AVM)
module containerAppsEnv 'br/public:avm/res/app/managed-environment:0.8.0' = {
name: 'container-apps-env-deployment'
params: {
name: containerAppsEnvName
location: location
infrastructureSubnetResourceId: containerAppsSubnetId
internal: true
logAnalyticsConfiguration: {
customerId: logAnalyticsCustomerId
sharedKey: logAnalyticsPrimarySharedKey
}
tags: tags
}
}
See: ADR-021: Azure Verified Modules Adoption
Deployment Flow
First-Time Deployment
# 1. Deploy core infrastructure (5-10 minutes)
gh workflow run deploy-infrastructure.yml \
--field environment=dev \
--field deploy_vpn_gateway=false
# Wait for completion
# 2. Deploy container platform (5-10 minutes, faster with external environment)
gh workflow run deploy-platform.yml \
--field environment=dev
# Wait for completion
# 3. Deploy applications (3-5 minutes)
gh workflow run deploy-applications.yml \
--field environment=dev
Total: ~15-30 minutes for complete environment
Application Updates (Most Common)
# Option 1: Automatic (on git push to main)
git add apps/api/ apps/ui/
git commit -m "feat: update API and UI"
git push origin main
# Workflow 3 triggers automatically (3-5 minutes)
# Option 2: Manual
gh workflow run deploy-applications.yml --field environment=dev
Duration: 3-5 minutes (vs 45-60 minutes with monolithic workflow)
Consequences
Positive
- ✅ Faster CI/CD: Application deployments reduced from 45-60 min to 3-5 min (90% improvement)
- ✅ Separation of concerns: Clear boundaries between infrastructure, platform, and applications
- ✅ Flexible deployment cadence:
- Infrastructure: Rarely (weeks/months)
- Platform: Occasionally (weeks)
- Applications: Frequently (multiple times per day)
- ✅ Temporal decoupling: Workflows can run days/weeks apart without issues
- ✅ Reduced risk: Smaller, focused deployments easier to test and rollback
- ✅ Better developer experience: Faster iteration cycles for application development
- ✅ Cost optimization: Only redeploy what changed
- ✅ Parallel development: Multiple teams can work on different layers independently
Negative
- ⚠️ More workflows to manage: 3 workflows vs 1 (mitigated by clear documentation)
- ⚠️ Dependency tracking: Must ensure Workflow 1 → 2 → 3 order (mitigated by dependency checks)
- ⚠️ Single resource group: All resources in one RG for dev (acceptable for dev, upgrade path documented)
- ⚠️ Learning curve: Developers must understand 3-workflow architecture (mitigated by comprehensive docs)
Mitigation Strategies
For Dependency Tracking: - Each workflow validates dependencies before deployment - Clear error messages with actionable instructions - Deployment history queries fail early if dependencies missing
Example Error Message:
❌ DEPENDENCY NOT FOUND
No core infrastructure deployment found in resource group: ldfdev-rg
ACTION REQUIRED:
1. Deploy core infrastructure: Run 'Deploy Infrastructure' workflow
2. Verify the workflow completed successfully
3. Retry this workflow
Deployment blocked until core infrastructure is available.
For Single Resource Group Limitation: - Document production upgrade path to multi-RG architecture - Create TODO in documentation for future implementation - Acceptable tradeoff for dev environment simplicity
Production Upgrade Path (TODO)
For production environments, consider upgrading to:
- Multi-Resource Groups:
{env}-core-rg: Networking, Key Vault, AI Services{env}-platform-rg: ACR, Container Apps Environment-
{env}-apps-rg: Container Apps (API/UI) -
Azure Resource Graph:
- Cross-RG queries using KQL
- Faster than deployment output queries at scale
-
Example:
-
Azure Policy:
- Enforce tagging standards
- Validate resource naming conventions
-
Audit compliance
-
Resource Locks:
- Prevent accidental deletion of infrastructure resources
- Read-only locks on Workflow 1 resources
- Delete locks on critical resources (VNet, Key Vault)
Documentation: To be created in docs/deployment/production-upgrade-guide.md
Implementation
Files Created
- Workflows:
.github/workflows/deploy-infrastructure.yml(MODIFIED - removed Container Apps).github/workflows/deploy-container-platform.yml(NEW)-
.github/workflows/deploy-applications.yml(NEW) -
Bicep Modules:
-
infrastructure/bicep/modules/container-platform.bicep(NEW - uses AVM) -
Parameters:
-
infrastructure/bicep/environments/dev-container-platform.parameters.json(NEW) -
Documentation:
docs/deployment/three-workflow-deployment.md(NEW - comprehensive guide)- This ADR
Changes to Existing Workflows
deploy-infrastructure.yml:
- Removed Container Apps Environment deployment (previously Step 3.4)
- Updated deployment summaries to reference next workflows
- Added guidance for running subsequent workflows
- Retained all core infrastructure components
Testing
Test Plan
- ✅ Workflow 1 (Core Infrastructure):
- Deployed successfully: Run #18333547905
- Duration: ~8 minutes
- All outputs verified
-
Resource group:
ldfdev-rg -
⏳ Workflow 2 (Container Platform):
- Pending: Awaiting branch merge to enable workflow_dispatch trigger
- Bicep module validated
-
Parameters file created
-
⏳ Workflow 3 (Applications):
- Pending: Requires Workflow 2 completion
- GitHub Action configuration verified
Success Criteria
- Workflow 1 deploys core infrastructure successfully
- Workflow 2 discovers Workflow 1 outputs correctly
- Workflow 2 deploys ACR + Container Apps Environment
- Workflow 3 discovers Workflow 2 outputs correctly
- Workflow 3 builds and deploys API Container App
- Workflow 3 builds and deploys UI Container App
- End-to-end deployment completes in <30 minutes
- Application updates deploy in <5 minutes
References
- Three-Workflow Deployment Guide
- ADR-009: Azure Container Apps Deployment
- ADR-021: Azure Verified Modules Adoption
- ADR-023: PowerShell for Azure Deployments
- Azure Container Apps Deploy Action
- Azure Resource Manager Deployment History
- Azure Verified Modules - Container Registry
- Azure Verified Modules - Container Apps Environment
Updates
January 2025: External Container Apps Environment
Date: 2025-01-08 Context: To support public demo access, Container Apps Environment changed from internal to external mode.
Changes:
- internal: false in container-platform.bicep (previously true)
- Added exportPolicy: 'enabled' to ACR configuration (fixes deployment issue)
- Faster deployment: 5-10 minutes (vs 30-45 minutes with internal mode)
- Public endpoint capability while maintaining VNet integration
Security Maintained: - VNet integration still active - Azure DDoS Protection Basic (automatic) - Per-app ingress control (internal/external configurable) - Private endpoints for backend services (Key Vault, Storage, OpenAI) - TLS/HTTPS automatic with managed certificates
Documentation: - Added Public Demo Security Guide - Updated deployment diagrams in azure-deployment-guide.md - Updated this ADR with external environment configuration
See: git commit 5dfbd9d - "feat: enable external Container Apps environment for public demo"
Notes
- This architecture is designed for dev environments where simplicity is prioritized
- Production deployments should follow the upgrade path outlined above
- Deployment outputs approach scales well up to dozens of resources
- For 100+ resources or complex multi-region deployments, consider Azure Resource Graph
- All workflows support optional
deployment_nameparameter for parallel dev deployments - External environment configuration enables public demo access while maintaining enterprise security