ADR-030: Three-Workflow Deployment Architecture

Status: Accepted
Date: 2025-01-08
Deciders: Development Team
Related: ADR-009, ADR-021, ADR-023

Context

Our initial deployment architecture used a single monolithic GitHub Actions workflow that deployed all infrastructure components in one execution:

Single Workflow (45-60 minutes):
  → Virtual Network + Subnets + NSGs
  → Key Vault + Storage Account
  → AI Services + AI Foundry
  → Log Analytics + Application Insights
  → Container Apps Environment
  → (Future) Container Apps for API/UI

Problems Identified

  1. Long deployment times: 45-60 minutes for full infrastructure (1+ hours with VPN)
  2. Tight coupling: Infrastructure and application deployments in single workflow
  3. Slow iteration: Every code change requires full infrastructure redeployment
  4. CI/CD bottleneck: Cannot deploy apps frequently without redeploying infrastructure
  5. Authentication timeouts: Container Apps deployment failed with ClientAssertionCredential timeout after running 1+ hours
  6. Poor separation of concerns: Infrastructure-level changes mixed with application deployments

User Requirements

From conversation on 2025-01-08:

"the current bicep and deploy infrastructure will deploy all environment specific items, now i am thinking the app and api should be separate from the infrastructure deployment"

Rationale:

  1. Deploy applications more frequently than infrastructure
  2. Better separation of concerns
  3. Faster CD pipeline for application updates

Decision

We will split the monolithic deployment into 3 separate GitHub Actions workflows with clear boundaries and dependencies:

Workflow 1: Core Infrastructure

File: .github/workflows/deploy-infrastructure.yml
Frequency: Rarely (only when infrastructure configuration changes)
Duration: 5-10 minutes (45+ minutes if VPN enabled)

Deploys:

  • Virtual Network + Subnets + NSGs
  • Key Vault + Storage Account
  • Managed Identity
  • AI Services + AI Foundry (with private endpoints)
  • Log Analytics + Application Insights
  • Private DNS Zones
  • VPN Gateway (optional, controlled by input parameter)

Outputs (stored in Azure deployment history):

  • containerAppsSubnetId
  • logAnalyticsCustomerId
  • logAnalyticsPrimarySharedKey
  • Other resource IDs as needed

Workflow 2: Container Platform

File: .github/workflows/deploy-platform.yml
Frequency: Occasionally (when platform configuration changes)
Duration: 5-10 minutes

Deploys:

  • Azure Container Registry (Premium SKU, using AVM, with export policy enabled)
  • Container Apps Environment (External + VNet-integrated, using AVM)

Dependencies: Requires Workflow 1 completion
Discovery Method: Queries Azure deployment history for latest infrastructure-deployment-*

Outputs (stored in Azure deployment history):

  • acrLoginServer
  • acrName
  • containerAppsEnvName
  • containerAppsEnvDefaultDomain
  • containerAppsEnvStaticIp

Workflow 3: Applications

File: .github/workflows/deploy-applications.yml
Frequency: Frequently (every code change to apps)
Duration: 3-5 minutes

Deploys:

  • API Container App (internal ingress, port 8000)
  • UI Container App (external ingress, port 8080)

Dependencies: Requires Workflows 1 & 2 completion
Discovery Method: Queries Azure deployment history for latest container-platform-*

Uses: Official azure/container-apps-deploy-action@v1 GitHub Action

  • Builds Docker images from Dockerfiles
  • Pushes to ACR
  • Creates/updates Container Apps
  • Manages revisions and traffic

Auto-trigger: Pushes to main branch that modify apps/api/** or apps/ui/**
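
To preview locally whether a set of changed files would match the path filter above, a small shell helper can mirror it. This is a minimal sketch and not part of the workflow itself: the function name touches_apps is hypothetical, and the shell case pattern only approximates the `**` glob used by GitHub Actions path filters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if any path read from stdin falls under
# apps/api/ or apps/ui/, the two path filters that auto-trigger Workflow 3.
touches_apps() {
  local path
  # The "|| [ -n ... ]" clause also handles a final line without a newline.
  while IFS= read -r path || [ -n "$path" ]; do
    case "$path" in
      apps/api/*|apps/ui/*) return 0 ;;
    esac
  done
  return 1
}
```

One possible use is `git diff --name-only origin/main...HEAD | touches_apps && echo "Workflow 3 would trigger"`, run before pushing.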

Resource Discovery Strategy

Alternatives Considered

  1. ❌ Naming Convention:
     • Use predictable resource names ({prefix}-vnet, {prefix}-acr)
     • Rejected: Fragile, breaks with custom deployments, no validation

  2. ✅ Deployment Outputs (SELECTED):
     • Query Azure deployment history for outputs
     • ~10 lines of PowerShell, 200-500ms per query
     • Selected: Simple, reliable, works across days/weeks

  3. ❌ Resource Tags:
     • Tag resources with deployment metadata
     • Query resources by tags
     • Rejected: More complex than deployment outputs, no clear benefit

  4. ❌ Multi-RG + Azure Resource Graph:
     • Separate resource groups per layer
     • Use Azure Resource Graph for cross-RG queries
     • Rejected: Over-engineering for dev environment
     • TODO: Document as production upgrade path

Implementation (Deployment Outputs)

Example: Workflow 2 discovering Workflow 1 outputs:

# Find latest core infrastructure deployment
$deployment = Get-AzResourceGroupDeployment `
  -ResourceGroupName "ldfdev-rg" `
  | Where-Object { $_.DeploymentName -like "infrastructure-deployment-*" } `
  | Sort-Object Timestamp -Descending `
  | Select-Object -First 1

if (-not $deployment) {
  Write-Error "❌ No core infrastructure deployment found"
  exit 1
}

# Extract required outputs
$containerAppsSubnetId = $deployment.Outputs.containerAppsSubnetId.Value
$logAnalyticsCustomerId = $deployment.Outputs.logAnalyticsCustomerId.Value
$logAnalyticsSharedKey = $deployment.Outputs.logAnalyticsPrimarySharedKey.Value

# Wrap the shared key as a SecureString
# (assumes the Bicep parameter is declared with @secure())
$sharedKeySecure = ConvertTo-SecureString $logAnalyticsSharedKey -AsPlainText -Force

# Use in deployment
New-AzResourceGroupDeployment `
  -Name "container-platform-$(Get-Date -Format 'yyyyMMdd-HHmmss')" `
  -ResourceGroupName "ldfdev-rg" `
  -TemplateFile "infrastructure/bicep/modules/container-platform.bicep" `
  -containerAppsSubnetId $containerAppsSubnetId `
  -logAnalyticsCustomerId $logAnalyticsCustomerId `
  -logAnalyticsPrimarySharedKey $sharedKeySecure

Why This Works:

  • ✅ Simple: ~10 lines of PowerShell per workflow
  • ✅ Fast: 200-500ms per query
  • ✅ Reliable: Uses Azure's built-in deployment history
  • ✅ Temporal decoupling: Works across days/weeks
  • ✅ Self-documenting: Deployment names clearly indicate purpose
  • ✅ Validation: Can check for missing outputs and fail early
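
Workflow 3 can apply the same lookup to the container-platform-* deployments. The following is a minimal sketch for a bash step that uses the Azure CLI instead of PowerShell. It is not taken from the actual workflows. The helper name latest_deployment_name is hypothetical, and unlike the PowerShell example (which sorts by Timestamp) it picks the newest deployment by name, which is only valid under the assumption that every deployment name ends in a yyyyMMdd-HHmmss suffix like the one generated above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read deployment names on stdin and print the newest one
# that starts with the given prefix. Prints nothing if no name matches.
# Assumes names end in a yyyyMMdd-HHmmss suffix, so byte-wise sort order
# equals chronological order.
latest_deployment_name() {
  local prefix="$1"
  awk -v p="$prefix" 'index($0, p) == 1' | LC_ALL=C sort | tail -n 1
}

# Possible usage inside a workflow step (requires a logged-in Azure CLI):
#
#   name=$(az deployment group list --resource-group ldfdev-rg \
#            --query "[].name" --output tsv \
#          | latest_deployment_name "container-platform-")
#   [ -n "$name" ] || { echo "No container platform deployment found" >&2; exit 1; }
#   acr_login_server=$(az deployment group show --resource-group ldfdev-rg \
#            --name "$name" --query "properties.outputs.acrLoginServer.value" \
#            --output tsv)
```

Sorting by name avoids parsing timestamps in the shell, at the cost of depending on the naming scheme. If deployment names may deviate from it, sorting on the deployment timestamp, as the PowerShell version does, is the safer choice.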

Azure Verified Modules Usage

Workflow 2 uses official Azure Verified Modules (AVM) for container platform resources:

// Azure Container Registry (AVM)
module containerRegistry 'br/public:avm/res/container-registry/registry:0.5.0' = {
  name: 'acr-deployment'
  params: {
    name: acrName
    location: location
    acrSku: 'Premium'
    publicNetworkAccess: 'Enabled'  // TODO: Private endpoint in production
    acrAdminUserEnabled: false      // Use managed identity
    tags: tags
  }
}

// Container Apps Environment (AVM)
module containerAppsEnv 'br/public:avm/res/app/managed-environment:0.8.0' = {
  name: 'container-apps-env-deployment'
  params: {
    name: containerAppsEnvName
    location: location
    infrastructureSubnetResourceId: containerAppsSubnetId
    internal: false  // External mode; was true before the January 2025 update
    logAnalyticsConfiguration: {
      customerId: logAnalyticsCustomerId
      sharedKey: logAnalyticsPrimarySharedKey
    }
    tags: tags
  }
}

See: ADR-021: Azure Verified Modules Adoption

Deployment Flow

First-Time Deployment

# 1. Deploy core infrastructure (5-10 minutes)
gh workflow run deploy-infrastructure.yml \
  --field environment=dev \
  --field deploy_vpn_gateway=false

# Wait for completion

# 2. Deploy container platform (5-10 minutes, faster with external environment)
gh workflow run deploy-platform.yml \
  --field environment=dev

# Wait for completion

# 3. Deploy applications (3-5 minutes)
gh workflow run deploy-applications.yml \
  --field environment=dev

Total: ~15-30 minutes for complete environment
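
The "Wait for completion" steps above can be scripted with the GitHub CLI. This is a sketch under stated assumptions, not part of the repository: run_and_wait and the RUN_REGISTER_DELAY variable are hypothetical names, and the helper assumes the newest run of the workflow is the one it just triggered, which may not hold if several people dispatch the same workflow at once.

```shell
#!/usr/bin/env bash
# Hypothetical helper: trigger a workflow, then block until its newest run ends.
# Returns non-zero if the dispatch fails or the run does not succeed.
run_and_wait() {
  local workflow="$1"
  shift
  gh workflow run "$workflow" "$@" || return 1

  # Give GitHub a moment to register the new run before listing runs.
  sleep "${RUN_REGISTER_DELAY:-5}"

  local run_id
  run_id=$(gh run list --workflow "$workflow" --limit 1 \
             --json databaseId --jq '.[0].databaseId') || return 1
  [ -n "$run_id" ] || return 1

  # --exit-status makes gh exit non-zero when the run fails.
  gh run watch "$run_id" --exit-status
}

# Possible usage for a first-time deployment:
#
#   run_and_wait deploy-infrastructure.yml --field environment=dev \
#     --field deploy_vpn_gateway=false &&
#   run_and_wait deploy-platform.yml --field environment=dev &&
#   run_and_wait deploy-applications.yml --field environment=dev
```

Chaining the calls with && stops the sequence at the first failed workflow, which preserves the Workflow 1 → 2 → 3 ordering.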

Application Updates (Most Common)

# Option 1: Automatic (on git push to main)
git add apps/api/ apps/ui/
git commit -m "feat: update API and UI"
git push origin main
# Workflow 3 triggers automatically (3-5 minutes)

# Option 2: Manual
gh workflow run deploy-applications.yml --field environment=dev

Duration: 3-5 minutes (vs 45-60 minutes with monolithic workflow)

Consequences

Positive

  1. ✅ Faster CI/CD: Application deployments reduced from 45-60 min to 3-5 min (roughly a 90% reduction)
  2. ✅ Separation of concerns: Clear boundaries between infrastructure, platform, and applications
  3. ✅ Flexible deployment cadence:
     • Infrastructure: Rarely (weeks/months)
     • Platform: Occasionally (weeks)
     • Applications: Frequently (multiple times per day)
  4. ✅ Temporal decoupling: Workflows can run days/weeks apart without issues
  5. ✅ Reduced risk: Smaller, focused deployments are easier to test and roll back
  6. ✅ Better developer experience: Faster iteration cycles for application development
  7. ✅ Cost optimization: Only redeploy what changed
  8. ✅ Parallel development: Multiple teams can work on different layers independently

Negative

  1. ⚠️ More workflows to manage: 3 workflows vs 1 (mitigated by clear documentation)
  2. ⚠️ Dependency tracking: Must ensure Workflow 1 → 2 → 3 order (mitigated by dependency checks)
  3. ⚠️ Single resource group: All resources in one RG for dev (acceptable for dev, upgrade path documented)
  4. ⚠️ Learning curve: Developers must understand 3-workflow architecture (mitigated by comprehensive docs)

Mitigation Strategies

For Dependency Tracking:

  • Each workflow validates dependencies before deployment
  • Clear error messages with actionable instructions
  • Deployment history queries fail early if dependencies are missing

Example Error Message:

❌ DEPENDENCY NOT FOUND

No core infrastructure deployment found in resource group: ldfdev-rg

ACTION REQUIRED:
  1. Deploy core infrastructure: Run 'Deploy Infrastructure' workflow
  2. Verify the workflow completed successfully
  3. Retry this workflow

Deployment blocked until core infrastructure is available.
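
The fail-early check described above could look like the following in a bash step. This is an illustrative sketch only: require_outputs is a hypothetical helper, and the shortened error text stands in for the full message shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early when a discovered output is empty.
# Arguments are NAME=VALUE pairs; every missing value is reported on stderr
# before the function returns non-zero.
require_outputs() {
  local pair name value missing=0
  for pair in "$@"; do
    name="${pair%%=*}"    # text before the first '='
    value="${pair#*=}"    # text after the first '='
    if [ -z "$value" ]; then
      echo "❌ DEPENDENCY NOT FOUND: output '$name' is missing" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Possible usage after querying the deployment history:
#
#   require_outputs "acrLoginServer=$acr_login_server" \
#                   "containerAppsEnvName=$container_apps_env_name" || exit 1
```

Reporting all missing outputs in one pass, rather than stopping at the first, saves a round trip when several outputs are absent.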

For Single Resource Group Limitation:

  • Document production upgrade path to multi-RG architecture
  • Create TODO in documentation for future implementation
  • Acceptable tradeoff for dev environment simplicity

Production Upgrade Path (TODO)

For production environments, consider upgrading to:

  1. Multi-Resource Groups:
     • {env}-core-rg: Networking, Key Vault, AI Services
     • {env}-platform-rg: ACR, Container Apps Environment
     • {env}-apps-rg: Container Apps (API/UI)

  2. Azure Resource Graph:
     • Cross-RG queries using KQL
     • Faster than deployment output queries at scale
     • Example:

    Resources
    | where type == "microsoft.network/virtualnetworks"
    | where tags.project == "loan-defenders"
    | where tags.environment == "prod"
    | where tags.layer == "core"

  3. Azure Policy:
     • Enforce tagging standards
     • Validate resource naming conventions
     • Audit compliance

  4. Resource Locks:
     • Prevent accidental deletion of infrastructure resources
     • Read-only locks on Workflow 1 resources
     • Delete locks on critical resources (VNet, Key Vault)

Documentation: To be created in docs/deployment/production-upgrade-guide.md

Implementation

Files Created

  1. Workflows:
     • .github/workflows/deploy-infrastructure.yml (MODIFIED - removed Container Apps)
     • .github/workflows/deploy-container-platform.yml (NEW)
     • .github/workflows/deploy-applications.yml (NEW)

  2. Bicep Modules:
     • infrastructure/bicep/modules/container-platform.bicep (NEW - uses AVM)

  3. Parameters:
     • infrastructure/bicep/environments/dev-container-platform.parameters.json (NEW)

  4. Documentation:
     • docs/deployment/three-workflow-deployment.md (NEW - comprehensive guide)
     • This ADR

Changes to Existing Workflows

deploy-infrastructure.yml:

  • Removed Container Apps Environment deployment (previously Step 3.4)
  • Updated deployment summaries to reference next workflows
  • Added guidance for running subsequent workflows
  • Retained all core infrastructure components

Testing

Test Plan

  1. ✅ Workflow 1 (Core Infrastructure):
     • Deployed successfully: Run #18333547905
     • Duration: ~8 minutes
     • All outputs verified
     • Resource group: ldfdev-rg

  2. ⏳ Workflow 2 (Container Platform):
     • Pending: Awaiting branch merge to enable workflow_dispatch trigger
     • Bicep module validated
     • Parameters file created

  3. ⏳ Workflow 3 (Applications):
     • Pending: Requires Workflow 2 completion
     • GitHub Action configuration verified

Success Criteria

  • Workflow 1 deploys core infrastructure successfully
  • Workflow 2 discovers Workflow 1 outputs correctly
  • Workflow 2 deploys ACR + Container Apps Environment
  • Workflow 3 discovers Workflow 2 outputs correctly
  • Workflow 3 builds and deploys API Container App
  • Workflow 3 builds and deploys UI Container App
  • End-to-end deployment completes in <30 minutes
  • Application updates deploy in <5 minutes

Updates

January 2025: External Container Apps Environment

Date: 2025-01-08
Context: To support public demo access, the Container Apps Environment changed from internal to external mode.

Changes:

  • internal: false in container-platform.bicep (previously true)
  • Added exportPolicy: 'enabled' to ACR configuration (fixes deployment issue)
  • Faster deployment: 5-10 minutes (vs 30-45 minutes with internal mode)
  • Public endpoint capability while maintaining VNet integration

Security Maintained:

  • VNet integration still active
  • Azure DDoS Protection Basic (automatic)
  • Per-app ingress control (internal/external configurable)
  • Private endpoints for backend services (Key Vault, Storage, OpenAI)
  • TLS/HTTPS automatic with managed certificates

Documentation:

  • Added Public Demo Security Guide
  • Updated deployment diagrams in azure-deployment-guide.md
  • Updated this ADR with external environment configuration

See: git commit 5dfbd9d - "feat: enable external Container Apps environment for public demo"

Notes

  • This architecture is designed for dev environments where simplicity is prioritized
  • Production deployments should follow the upgrade path outlined above
  • The deployment outputs approach scales well up to dozens of resources
  • For 100+ resources or complex multi-region deployments, consider Azure Resource Graph
  • All workflows support optional deployment_name parameter for parallel dev deployments
  • External environment configuration enables public demo access while maintaining enterprise security