ADR-023: Use PowerShell for Azure Infrastructure Deployments

Status: Accepted
Date: 2025-10-05
Decision Makers: Development Team
Related Issues: #100
Related PRs: #102

Context

We experienced deployment failures when attempting to deploy Azure infrastructure using Azure CLI (az deployment group create). The error manifested as:

ERROR: The content for this response was already consumed

This is a known Azure CLI bug (Azure/azure-cli#32149) in the CLI's internal HTTP response handling that has been open since September 2024 with no fix timeline from Microsoft.

Investigation Process

Following proper troubleshooting methodology, we:

  1. Used az deployment group what-if - Discovered invalid parameters (vnetName, deployPrivateEndpoints) that weren't defined in the template
  2. Fixed parameter file - Removed invalid parameters from dev.parameters.json and prod.parameters.json
  3. Attempted deployment with Azure CLI - Still encountered "content already consumed" bug
  4. Tested with PowerShell - Deployment worked and provided detailed error messages

Errors Discovered via PowerShell (That Azure CLI Masked)

PowerShell's verbose output revealed two critical issues that Azure CLI's bug prevented us from seeing:

Issue 1: NSG Security Rules with Invalid Address Prefixes

Error: Security rule has invalid Address prefix.
Value provided: 10.0.1.0/23

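Root Cause: NSG rules referenced subnet CIDR blocks (containerAppsSubnetPrefix, apimSubnetPrefix) before the subnets existed, and we concluded that Azure could not resolve those prefixes while validating the NSG rules. Note that the rejected value, 10.0.1.0/23, is the same misaligned prefix described in Issue 2, so the misalignment itself may also have contributed to this error.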

Fix: Changed all NSG rules to use 'VirtualNetwork' service tag instead of specific CIDR blocks.

Issue 2: Invalid CIDR Notation

Error: The address prefix 10.0.1.0/23 has an invalid CIDR notation.
For the given prefix length, the address prefix should be 10.0.0.0/23

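Root Cause: A /23 block spans 512 addresses (two adjacent /24 ranges), so its network address must fall on a 512-address boundary; in practice the third octet must be even. 10.0.1.0 has an odd third octet, so 10.0.1.0/23 is invalid, and the enclosing aligned block is 10.0.0.0/23.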

Fix: Updated subnet allocation:

  • Container Apps: 10.0.0.0/23 (512 IPs)
  • APIM: 10.0.2.0/24 (256 IPs)
  • Private Endpoints: 10.0.3.0/24 (256 IPs)
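The alignment rule can be checked locally before a deployment reaches Azure. The following is a minimal bash sketch, not part of the project's scripts: the function names cidr_is_aligned and cidr_network are hypothetical, and it assumes well-formed IPv4 input such as 10.0.0.0/23 (no input validation is attempted).

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes well-formed IPv4 CIDR input like "10.0.0.0/23".

# Return 0 if the CIDR block starts on its own network boundary, 1 otherwise.
cidr_is_aligned() {
  local ip=${1%/*} len=${1#*/}
  local a b c d
  IFS=. read -r a b c d <<< "$ip"
  # Pack the four octets into a single 32-bit integer.
  local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  # Number of addresses in the block, e.g. 2^(32-23) = 512 for a /23.
  local size=$(( 1 << (32 - len) ))
  # Aligned means the address is an exact multiple of the block size.
  (( addr % size == 0 ))
}

# Print the aligned block that contains the given address, keeping the prefix length.
cidr_network() {
  local ip=${1%/*} len=${1#*/}
  local a b c d
  IFS=. read -r a b c d <<< "$ip"
  local addr=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  local size=$(( 1 << (32 - len) ))
  local net=$(( addr - addr % size ))
  echo "$(( (net >> 24) & 255 )).$(( (net >> 16) & 255 )).$(( (net >> 8) & 255 )).$(( net & 255 ))/$len"
}

# Check the original prefix and the three updated prefixes from this ADR.
for prefix in 10.0.1.0/23 10.0.0.0/23 10.0.2.0/24 10.0.3.0/24; do
  if cidr_is_aligned "$prefix"; then
    echo "$prefix is aligned"
  else
    echo "$prefix is misaligned; enclosing aligned block is $(cidr_network "$prefix")"
  fi
done
```

Run against the prefixes in this ADR, the loop reports 10.0.1.0/23 as misaligned and suggests 10.0.0.0/23, which matches the correction in Azure's error message; the three updated prefixes pass.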

Decision

We will use Azure PowerShell (New-AzResourceGroupDeployment) for all Azure infrastructure deployments instead of Azure CLI.

Rationale

  1. No Azure CLI Bug - Azure PowerShell is built on .NET and its HTTP stack rather than the Azure CLI's Python HTTP pipeline, so deployments do not pass through the code path where the bug occurs
  2. Superior Error Messages - PowerShell provides detailed, structured error output with specific resource failures and actionable messages
  3. Industry Standard - Azure PowerShell is a first-party, Microsoft-maintained Azure automation tool and is widely used in enterprise environments
  4. Better Debugging - Supports -Verbose, -Debug, and -DeploymentDebugLogLevel for comprehensive troubleshooting
  5. Mature Tooling - Azure PowerShell module is well-maintained and stable

Comparison: Azure CLI vs PowerShell

| Aspect | Azure CLI | PowerShell |
|---|---|---|
| Bug Status | ❌ "Content consumed" bug | ✅ No issues |
| Error Messages | ❌ Generic, masked by bug | ✅ Detailed, specific |
| Debugging | ⚠️ --debug flag (broken) | ✅ -Verbose, -Debug |
| Validation | ⚠️ --what-if worked | ✅ Integrated validation |
| Test Result | ❌ Failed | ✅ Succeeded |

Consequences

Positive

  • Reliable Deployments - No workarounds needed for Azure CLI bugs
  • Better Error Messages - Faster troubleshooting with detailed output
  • Production Ready - PowerShell is enterprise-grade and battle-tested
  • Full Feature Support - Access to all Azure Resource Manager capabilities
  • CI/CD Compatible - Works seamlessly in GitHub Actions with azure/login@v2 and enable-AzPSSession: true

Negative

  • ⚠️ Requires PowerShell - Additional dependency (PowerShell 7+)
  • ⚠️ Script Changes - Need to update deploy.sh and GitHub Actions workflows
  • ⚠️ Learning Curve - Team needs familiarity with PowerShell cmdlets (minimal)
  • ⚠️ Cross-Platform - PowerShell Core required for Linux/macOS (already available)

Migration Required

  1. Local Development - Install PowerShell 7+, the Azure PowerShell module, and the Bicep CLI (Azure PowerShell needs the bicep executable on PATH to deploy .bicep files)
  2. Deploy Script - Update infrastructure/bicep/deploy.sh to use PowerShell
  3. GitHub Actions - Update .github/workflows/deploy-infrastructure.yml to use PowerShell
  4. Documentation - Update deployment guides with PowerShell instructions
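One way to approach step 2 is to keep deploy.sh as a thin wrapper that hands the deployment to pwsh. The sketch below is illustrative only and is not the project's actual script: the function names, the DRY_RUN switch, and the "<stage>-<environment>" deployment name are hypothetical, while the resource group pattern and file paths are taken from the examples elsewhere in this ADR. It assumes argument values contain no single quotes and that an Azure context already exists (for example from an earlier Connect-AzAccount).

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a deploy.sh that delegates to Azure PowerShell.

# Build the PowerShell command line for one environment and stage.
build_deploy_command() {
  local environment=$1 stage=$2
  local name="${stage}-${environment}"                  # hypothetical naming scheme
  local rg="loan-defenders-${environment}-rg"           # pattern from this ADR's example
  local template="infrastructure/bicep/main-avm.bicep"
  local params="infrastructure/bicep/environments/${environment}.parameters.json"
  # Single quotes are PowerShell string literals; values must not contain "'".
  printf "New-AzResourceGroupDeployment -Name '%s' -ResourceGroupName '%s' -TemplateFile '%s' -TemplateParameterFile '%s' -deploymentStage '%s' -Verbose -ErrorAction Stop" \
    "$name" "$rg" "$template" "$params" "$stage"
}

# Run the deployment, or only print the command when DRY_RUN=1.
deploy() {
  local cmd
  cmd=$(build_deploy_command "$1" "$2")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "pwsh -NoProfile -NonInteractive -Command \"$cmd\""
    return 0
  fi
  if ! command -v pwsh >/dev/null 2>&1; then
    echo "pwsh not found; install PowerShell 7+ first" >&2
    return 1
  fi
  # -ErrorAction Stop turns a failed deployment into a terminating error,
  # which should surface as a non-zero exit code from pwsh.
  pwsh -NoProfile -NonInteractive -Command "$cmd"
}
```

For example, `DRY_RUN=1 deploy dev foundation` prints the command without contacting Azure, which makes the wrapper easy to review before it replaces the existing Azure CLI call.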

Implementation

Prerequisites

# Install PowerShell 7+ (if not installed)
# macOS
brew install powershell

# Windows
winget install Microsoft.PowerShell

# Linux (Ubuntu/Debian)
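wget https://github.com/PowerShell/PowerShell/releases/download/v7.4.6/powershell-7.4.6-linux-x64.tar.gz
sudo mkdir -p /opt/microsoft/powershell/7   # tar -C fails if the target directory does not exist
sudo tar -xzf powershell-7.4.6-linux-x64.tar.gz -C /opt/microsoft/powershell/7
sudo chmod +x /opt/microsoft/powershell/7/pwsh
sudo ln -s /opt/microsoft/powershell/7/pwsh /usr/bin/pwsh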

# Install Azure PowerShell module
pwsh -Command "Install-Module -Name Az -Repository PSGallery -Force -AllowClobber"

PowerShell Deployment Command

# Connect to Azure (one-time)
Connect-AzAccount

# Deploy infrastructure
New-AzResourceGroupDeployment `
  -Name "deployment-name" `
  -ResourceGroupName "loan-defenders-dev-rg" `
  -TemplateFile "main-avm.bicep" `
  -TemplateParameterFile "environments/dev.parameters.json" `
  -deploymentStage "foundation" `
  -Verbose

GitHub Actions Integration

- name: Log in to Azure using OIDC (PowerShell)
  uses: azure/login@v2
  with:
    client-id: ${{ secrets.AZURE_CLIENT_ID }}
    tenant-id: ${{ secrets.AZURE_TENANT_ID }}
    subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
    enable-AzPSSession: true  # Enable PowerShell session

- name: Deploy with PowerShell
  shell: pwsh
  run: |
    New-AzResourceGroupDeployment `
      -Name "${{ env.DEPLOYMENT_NAME }}" `
      -ResourceGroupName "${{ env.RESOURCE_GROUP }}" `
      -TemplateFile "infrastructure/bicep/main-avm.bicep" `
      -TemplateParameterFile "infrastructure/bicep/environments/${{ inputs.environment }}.parameters.json" `
      -deploymentStage "${{ inputs.stage }}" `
      -Verbose

Lessons Learned

What Went Wrong

  1. Jumped to Workarounds Too Quickly - Initially attempted REST API workaround (PR #101) without proper diagnosis
  2. Didn't Use Validation Tools First - Should have run --what-if immediately to catch parameter issues
  3. Didn't Test Alternatives - Should have tried PowerShell before building complex REST API solution

Proper Troubleshooting Sequence (For Future Reference)

1. Validate template first
   → az deployment group what-if (or Test-AzResourceGroupDeployment)

2. Fix any validation errors
   → Update templates/parameters

3. Try deployment with debug/verbose
   → az deployment group create --debug (if working)
   → New-AzResourceGroupDeployment -Verbose (PowerShell)

4. If Azure CLI fails, try PowerShell
   → Different HTTP stack, often avoids CLI-specific bugs

5. Only if both fail, consider REST API
   → Most complex solution, use as last resort
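Step 1 of the sequence above can be turned into a gate that runs before any deployment. The following bash sketch is an assumption-laden illustration, not the project's tooling: validate_template, build_validate_command, and DRY_RUN are hypothetical names, and it assumes Test-AzResourceGroupDeployment accepts the same template parameter (-deploymentStage) as the deployment command shown earlier and returns an empty result when the template is valid.

```shell
#!/usr/bin/env bash
# Hypothetical pre-deployment validation gate using Azure PowerShell.

# Build a PowerShell snippet that validates the template and exits 1 on errors.
build_validate_command() {
  local environment=$1 stage=$2
  # \$ keeps $errors literal so that PowerShell, not bash, expands it.
  printf "%s" "\$errors = Test-AzResourceGroupDeployment -ResourceGroupName 'loan-defenders-${environment}-rg' -TemplateFile 'infrastructure/bicep/main-avm.bicep' -TemplateParameterFile 'infrastructure/bicep/environments/${environment}.parameters.json' -deploymentStage '${stage}'; if (\$errors) { \$errors | Format-List; exit 1 }"
}

# Validate one environment and stage, or only print the snippet when DRY_RUN=1.
validate_template() {
  local cmd
  cmd=$(build_validate_command "$1" "$2")
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    echo "$cmd"
    return 0
  fi
  if ! command -v pwsh >/dev/null 2>&1; then
    echo "pwsh not found; install PowerShell 7+ first" >&2
    return 1
  fi
  pwsh -NoProfile -NonInteractive -Command "$cmd"
}
```

A caller could then write `validate_template dev foundation && echo "template valid, safe to deploy"`, so that parameter and CIDR problems like the ones in this ADR stop the run before a deployment is attempted.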

Key Takeaway

When Azure CLI fails with an opaque error, retry the same operation with Azure PowerShell before building workarounds. In this incident PowerShell both avoided the CLI bug and surfaced the underlying template errors that the CLI had hidden.

Alternatives Considered

1. Azure CLI with Workarounds

Approach: Continue using Azure CLI with various workarounds (compile to JSON, retry logic, etc.)

Pros:

  • Familiar to team
  • Already in use

Cons:

  • ❌ Bug persists across versions
  • ❌ No fix timeline from Microsoft
  • ❌ Poor error messages
  • ❌ Unreliable for production

Decision: Rejected - Not sustainable long-term

2. Azure REST API Direct

Approach: Use az rest to call Azure Resource Manager APIs directly (implemented in PR #101)

Pros:

  • Bypasses Azure CLI bug
  • Full control over HTTP requests

Cons:

  • ❌ Complex implementation (jq required, manual JSON construction)
  • ❌ Hit "Argument list too long" error with large templates
  • ❌ More code to maintain
  • ❌ Less readable than native commands

Decision: Rejected - Overly complex for the problem

3. Azure PowerShell (Selected)

Approach: Use New-AzResourceGroupDeployment cmdlet

Pros:

  • ✅ No Azure CLI bug
  • ✅ Excellent error messages
  • ✅ Industry standard
  • ✅ Simple, clean implementation
  • ✅ CI/CD compatible

Cons:

  • ⚠️ Requires PowerShell (minor - widely available)

Decision: Accepted - Best balance of reliability and simplicity

Success Metrics

  • ✅ Foundation stage deployed successfully via PowerShell
  • ✅ Clear error messages enabled rapid issue resolution
  • ✅ No Azure CLI bug encountered
  • ✅ Template issues (NSG rules, CIDR notation) identified and fixed

Next Steps

  1. ✅ Create this ADR documenting the decision
  2. ⏳ Update infrastructure/bicep/deploy.sh to use PowerShell
  3. ⏳ Update .github/workflows/deploy-infrastructure.yml to use PowerShell
  4. ⏳ Update deployment documentation
  5. ⏳ Close PR #101 (REST API approach no longer needed)
  6. ⏳ Deploy remaining stages (security, ai, apps) via PowerShell