GitHub Actions CI/CD Guide

Overview

This project uses GitHub Actions for continuous integration and deployment, authenticating to Azure with OIDC (OpenID Connect). This provides passwordless, secure deployments: no passwords or client secrets are stored in GitHub, only non-sensitive identifiers (client, tenant, and subscription IDs) kept as repository secrets.

Workflows

1. Deploy Azure Infrastructure

File: .github/workflows/deploy-azure-infrastructure.yml

Deploys Bicep templates to Azure using PowerShell and OIDC authentication.

Triggers:

  • Manual dispatch (workflow_dispatch)
  • Can select environment (dev, staging, prod)
  • Can select deployment stage (foundation, security, ai, apps, all)

Inputs:

  • environment: Target environment (dev/staging/prod)
  • stage: Deployment stage (foundation/security/ai/apps/all)
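The trigger and inputs described above could be declared roughly as follows. This is a minimal sketch, not the project's actual workflow file: the use of `type: choice`, the descriptions, and the default values are assumptions, and the job is a placeholder.

```yaml
name: Deploy Azure Infrastructure

on:
  workflow_dispatch:
    inputs:
      environment:
        description: Target environment
        required: true
        type: choice
        options: [dev, staging, prod]
        default: dev            # assumed default
      stage:
        description: Deployment stage
        required: true
        type: choice
        options: [foundation, security, ai, apps, all]
        default: all            # assumed default

jobs:
  show-inputs:
    runs-on: ubuntu-latest
    steps:
      # Placeholder step that only echoes the selected inputs
      - run: echo "environment=${{ inputs.environment }} stage=${{ inputs.stage }}"
```

Choice inputs render as drop-downs in the "Run workflow" dialog, so an invalid environment or stage cannot be typed by hand.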

⚠️ IMPORTANT - Use stage='all' for deployments:

  • Recommended: Always use stage='all' to deploy all infrastructure.
  • Issue: Staged deployments (foundation, security, ai, apps run separately) have a bug where dependencies aren't detected (#113).
  • Workaround: Use stage='all' and let Bicep's incremental deployment mode handle updates (only changes are deployed).
  • Performance: A full deployment takes only ~10 minutes; incremental updates are faster.

Example Usage:

  1. Go to Actions tab → Deploy Azure Infrastructure
  2. Click "Run workflow"
  3. Select environment (dev/staging/prod)
  4. Select stage: all ⬅️ Always use 'all'
  5. Click "Run workflow"

What it does:

  1. Checks out code
  2. Installs Azure PowerShell modules
  3. Authenticates to Azure via OIDC
  4. Runs PowerShell deployment (New-AzResourceGroupDeployment)
  5. Displays deployment outputs
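The steps above might look something like the job below. It is a hedged sketch under several assumptions, not a copy of the real workflow: it uses the `azure/login` and `azure/powershell` actions (which rely on the Az modules already present on GitHub-hosted runners instead of an explicit install step), it assumes the resource group follows the `loan-defenders-<env>-rg` pattern, it assumes `all-in-one.bicep` is the template being deployed and that the Bicep CLI is available on the runner, and it omits the parameter file.

```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [dev, staging, prod]

permissions:
  id-token: write   # Required for OIDC
  contents: read    # Read-only access to code

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Check out code
        uses: actions/checkout@v4

      - name: Authenticate to Azure via OIDC
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          enable-AzPSSession: true   # also sign in for Azure PowerShell

      - name: Deploy and display outputs
        uses: azure/powershell@v2
        env:
          TARGET_ENV: ${{ inputs.environment }}
        with:
          azPSVersion: latest
          inlineScript: |
            # Resource group naming pattern is an assumption
            $params = @{
              ResourceGroupName = "loan-defenders-$($env:TARGET_ENV)-rg"
              TemplateFile      = 'infrastructure/bicep/all-in-one.bicep'
              Mode              = 'Incremental'
            }
            $deployment = New-AzResourceGroupDeployment @params
            $deployment.Outputs | ConvertTo-Json -Depth 10
```

Passing the input through `env` instead of interpolating it directly into the script keeps user-supplied values out of the script text.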

2. Deploy AI Models

File: .github/workflows/deploy-ai-models.yml

Deploys AI model configurations to an existing Azure AI Services account.

Triggers:

  • Manual dispatch (workflow_dispatch)
  • Requires infrastructure to be deployed first

Inputs:

  • environment: Target environment (dev/staging/prod)

Example Usage:

  1. Go to Actions tab → Deploy AI Models
  2. Click "Run workflow"
  3. Select environment
  4. Click "Run workflow"

What it does:

  1. Validates prerequisites (resource group, AI Services exist)
  2. Deploys model configurations from environments/{env}-models.parameters.json
  3. Runs idempotently (safe to run multiple times)
  4. Displays deployed models

3. Teardown Dev Environment

File: .github/workflows/teardown-dev-environment.yml

⚠️ DANGEROUS: Deletes all resources in the dev environment to save costs.

Triggers:

  • Manual dispatch with confirmation required

Safety Features:

  • Must type "DELETE" to confirm
  • Two-stage validation process
  • Only affects dev environment
  • Shows what will be deleted before proceeding

Inputs:

  • confirm: Must type "DELETE"
  • delete_resource_group: Delete entire resource group (true/false)
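A confirmation gate like the one described could be sketched as two jobs, where the teardown job only runs if validation succeeds. The job names, descriptions, default value, and the placeholder teardown step are assumptions for illustration; only the input names `confirm` and `delete_resource_group` come from this guide.

```yaml
on:
  workflow_dispatch:
    inputs:
      confirm:
        description: Type DELETE to confirm
        required: true
        type: string
      delete_resource_group:
        description: Delete the entire resource group
        type: boolean
        default: false          # assumed default

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Check confirmation text
        env:
          CONFIRM: ${{ inputs.confirm }}
        run: |
          # Fail the run unless the input is exactly DELETE
          if [ "$CONFIRM" != "DELETE" ]; then
            echo "::error::Confirmation must be exactly DELETE"
            exit 1
          fi

  teardown:
    needs: validate             # runs only after validation passes
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for the actual teardown steps
        run: echo "delete_resource_group=${{ inputs.delete_resource_group }}"
```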

Teardown Order:

  1. Delete model deployments
  2. Delete Container Apps Environment
  3. Delete AI Services (Hub, Project, AI Services, AI Search)
  4. Delete resource group (all remaining resources)

Cost Savings: ~$100-500/month while the dev environment is down

4. Test Workflow

File: .github/workflows/test.yml

Runs automated tests on pull requests and pushes to main.

Triggers:

  • Push to main
  • Pull requests to main

What it runs:

  1. Linting (ruff check)
  2. Formatting check (ruff format --check)
  3. Unit tests (pytest)
  4. Coverage reporting (requires ≥85%)
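For orientation, a test workflow with these triggers and checks could look roughly like this. It is a sketch under assumptions: the `astral-sh/setup-uv` action, the `uv sync` install step, and enforcing coverage through pytest-cov's `--cov-fail-under` flag are guesses at how the project does it, not confirmed details of test.yml.

```yaml
name: Test

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install uv
        uses: astral-sh/setup-uv@v5        # assumed setup action
      - name: Install dependencies
        run: uv sync
      - name: Lint
        run: uv run ruff check .
      - name: Check formatting
        run: uv run ruff format --check .
      - name: Run tests with coverage threshold
        # Assumes pytest-cov is installed; fails below 85% coverage
        run: uv run pytest tests/ --cov --cov-fail-under=85
```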

OIDC Authentication

How It Works

  1. GitHub Actions requests an OIDC token from GitHub
  2. The workflow presents the token to Azure
  3. Azure validates the token
  4. Azure grants temporary access
  5. The workflow deploys resources

Benefits:

  • ✅ No passwords or keys stored in GitHub
  • ✅ Temporary credentials (expire after workflow)
  • ✅ Granular permissions via RBAC
  • ✅ Audit trail in Azure AD
  • ✅ Automatic rotation (no manual renewal)

Setup

See infrastructure/scripts/setup-github-service-principal.sh for OIDC setup.

Required Secrets (in GitHub repository settings):

  • AZURE_CLIENT_ID: Service principal application ID
  • AZURE_TENANT_ID: Azure AD tenant ID
  • AZURE_SUBSCRIPTION_ID: Azure subscription ID

Required Azure Setup:

  1. Create Azure AD App Registration
  2. Create federated credential for GitHub
  3. Assign appropriate RBAC roles (CRITICAL):
     • ✅ Contributor (subscription-wide): for creating/modifying resources
     • ✅ User Access Administrator (resource group-scoped): for assigning RBAC roles
     • ⚠️ Both roles required for AI infrastructure deployment
  4. Add secrets to GitHub repository

Why Two Roles?

  • Contributor: Creates Azure resources (VNet, AI Services, Key Vault, etc.)
  • User Access Administrator: Assigns RBAC roles between resources (Hub → AI Services, Project → Storage, etc.)
  • Without UAA, AI infrastructure deployment fails with authorization errors

See: ADR-027: User Access Administrator for GitHub Actions

Branch Protection

Main Branch Protection Rules

Enforced on main branch:

  1. ✅ Require pull request before merging
  2. ✅ Require status checks to pass:
     • Linting (ruff check)
     • Formatting (ruff format)
     • Tests (pytest)
     • Coverage ≥85%
  3. ✅ Require code review approval (1+ reviewer)
  4. ✅ Dismiss stale reviews when new commits pushed
  5. ✅ Require conversation resolution before merge
  6. ❌ Force pushes disabled
  7. ❌ Deletions disabled

See: ADR-015: Branch Protection Strategy

Security Best Practices

Workflow Security

  1. Minimal Permissions:

    permissions:
      id-token: write   # Required for OIDC
      contents: read    # Read-only access to code
    

  2. Environment Protection:

     • Production deployments require approval
     • Environment-specific secrets
     • Deployment restrictions

  3. Secret Management:

     • Never log secrets
     • Use ${{ secrets.NAME }} syntax
     • Rotate regularly

  4. Dependency Security:

     • Pin action versions (@v4 not @latest)
     • Use verified actions from GitHub/Microsoft
     • Renovate/Dependabot for updates
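Several of these practices can be combined in one job definition. The sketch below assumes a GitHub environment named `prod` exists with required reviewers configured; the job name and the placeholder step are hypothetical.

```yaml
on:
  workflow_dispatch:

permissions:
  id-token: write   # Required for OIDC
  contents: read    # Read-only access to code

jobs:
  deploy-prod:
    runs-on: ubuntu-latest
    # Binding the job to an environment applies that environment's
    # protection rules (such as required reviewers) and its secrets.
    environment: prod
    steps:
      # Pinned to a major version tag rather than a moving reference
      - uses: actions/checkout@v4
      - name: Placeholder for deployment steps
        run: echo "Deploying to prod after approval"
```

When a job references an environment, the OIDC token's subject claim takes the form `repo:<owner>/<repo>:environment:<name>`, so the federated credential in Azure has to be configured for that environment for the login to succeed.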

Azure Security

  1. Least Privilege RBAC:

     • Service principal has minimum required permissions
     • Environment-specific role assignments
     • Separate service principals per environment

  2. Audit Logging:

     • All deployments logged in Azure Activity Log
     • OIDC authentication tracked in Azure AD
     • GitHub Actions logs retained

  3. Network Security:

     • Deployments from GitHub IPs only
     • Private endpoints for Azure resources
     • No public internet access to services

See: ADR-016: GitHub Actions Security

Deployment Workflow

Standard Deployment Process

  1. Create Feature Branch:

    git checkout -b feat/my-feature
    

  2. Make Changes:

     • Modify Bicep templates
     • Update parameter files
     • Update documentation

  3. Local Testing:

    # Validate Bicep
    az bicep build --file infrastructure/bicep/all-in-one.bicep

    # Run tests
    uv run pytest tests/

    # Lint and format
    uv run ruff check .
    uv run ruff format .

  4. Create Pull Request:

    git push origin feat/my-feature
    gh pr create --title "feat: my feature" --body "Description"

  5. Automated Checks:

     • GitHub Actions runs tests
     • Code review required
     • Status checks must pass

  6. Merge to Main:

     • After approval and passing checks
     • Squash and merge (keep history clean)

  7. Deploy to Dev:

     • Manual trigger of "Deploy Azure Infrastructure" workflow
     • Select dev environment
     • Monitor deployment progress

  8. Deploy to Staging (optional):

     • Same process, select staging environment

  9. Deploy to Production:

     • Requires approval from designated reviewers
     • Select prod environment
     • Extra validation and confirmation

Rollback Procedure

If deployment fails or introduces issues:

  1. Immediate Rollback:

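    # Revert to previous commit
    # (main is protected, so the revert normally has to land through a pull request)
    git revert <commit-sha>
    git push origin main

    # Redeploy: trigger the workflow again from the reverted main
    gh workflow run deploy-azure-infrastructure.yml -f environment=<env> -f stage=all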
    

  2. Infrastructure Rollback:

    # Redeploy previous working version
    ./infrastructure/scripts/deploy.sh <env> <rg> --stage all
    

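  3. Partial Rollback:

    # Redeploy specific stage only
    # (staged deployments have a known dependency-detection bug, #113;
    # prefer --stage all where possible)
    ./infrastructure/scripts/deploy.sh <env> <rg> --stage <stage>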
    

Monitoring Deployments

GitHub Actions Logs

  1. Go to repository → Actions tab
  2. Select workflow run
  3. View logs for each step
  4. Download logs if needed

Azure Deployment Logs

# View deployment details
Get-AzResourceGroupDeployment `
  -ResourceGroupName "loan-defenders-dev-rg" `
  -Name "<deployment-name>"

# View deployment errors
$deployment = Get-AzResourceGroupDeployment `
  -ResourceGroupName "loan-defenders-dev-rg" `
  -Name "<deployment-name>"

$deployment.Properties.Error | ConvertTo-Json -Depth 10

Application Insights

  • Telemetry and metrics: Azure Portal → Application Insights
  • Deployment annotations visible in timeline
  • Correlate deployments with errors/performance

Troubleshooting

Common Issues

1. OIDC Authentication Failed

Error: "Unable to get OIDC id token from the GitHub endpoint"

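Solutions:

  • Ensure the workflow or job grants permissions: id-token: write (without it, GitHub cannot issue an OIDC token)
  • Verify AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID secrets
  • Check federated credential configuration in Azure AD
  • Ensure service principal has required permissions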

2. Deployment Failed - Insufficient Permissions

Error: "The client does not have authorization to perform action"

Solutions:

  • Verify RBAC role assignments for service principal
  • Check resource group permissions
  • Verify subscription-level permissions

2a. RBAC Role Assignment Failed (AI Infrastructure)

Error: Authorization failed for template resource of type 'Microsoft.Authorization/roleAssignments'. The client does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write'

Symptom: AI Services, AI Foundry Hub, and AI Foundry Project not created

Root Cause: Service principal lacks User Access Administrator role

Solution:

# Grant User Access Administrator role at resource group scope
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
APP_ID="<your-service-principal-client-id>"
RG_NAME="loan-defenders-dev-rg"

az role assignment create \
  --role "User Access Administrator" \
  --assignee $APP_ID \
  --scope /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG_NAME \
  --description "Allow GitHub Actions to assign RBAC roles during deployment"

Verification:

# Check both roles are assigned
az role assignment list \
  --assignee $APP_ID \
  --all \
  --query "[].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# Expected output:
# Role                       Scope
# -------------------------  --------------------------------------------------
# Contributor                /subscriptions/<sub-id>
# User Access Administrator  /subscriptions/<sub-id>/resourceGroups/<rg-name>

Why this is needed:

  • AI Foundry deployment creates 8+ role assignments between resources
  • Hub needs access to AI Services, Storage, Key Vault
  • Project needs access to AI Services, Storage, Key Vault
  • Without the UAA role, the entire AI infrastructure deployment fails and rolls back

See: ADR-027: User Access Administrator for GitHub Actions

3. Template Validation Failed

Error: "Template validation failed: ..."

Solutions:

  • Validate Bicep locally: az bicep build
  • Check parameter file format
  • Verify required parameters are provided

4. Resource Already Exists

Error: "A resource with the ID '...' already exists"

Solutions:

  • Bicep deployments are idempotent (usually safe to re-run)
  • Check if previous deployment is still in progress
  • Use --mode Complete carefully (can delete resources)
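One way to reduce the chance of a run colliding with a deployment that is still in progress is to serialize runs per environment with a `concurrency` group. This is a suggestion sketched under assumptions, not something taken from the project's workflows; the group name and the placeholder job are hypothetical.

```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [dev, staging, prod]

# Only one run per environment executes at a time; a newer run waits
# instead of cancelling the deployment that is already in progress.
concurrency:
  group: deploy-${{ inputs.environment }}
  cancel-in-progress: false

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for deployment steps
        run: echo "Deploying to ${{ inputs.environment }}"
```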

Best Practices

Workflow Development

  1. Test locally first before pushing to GitHub
  2. Use minimal permissions in workflow files
  3. Pin action versions for stability
  4. Add descriptive names to workflow steps
  5. Use environment variables for reusability
  6. Add deployment summaries with GITHUB_STEP_SUMMARY
  7. Handle errors gracefully with proper exit codes
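As an illustration of items 4 to 7, the step below writes a short Markdown summary to `GITHUB_STEP_SUMMARY` whether or not earlier steps succeeded. The step names, summary layout, and environment variable names are assumptions.

```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        type: choice
        options: [dev, staging, prod]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Placeholder for deployment steps
        run: echo "Deploying"

      - name: Write deployment summary
        if: always()              # run even when an earlier step failed
        env:
          TARGET_ENV: ${{ inputs.environment }}
          JOB_STATUS: ${{ job.status }}
        run: |
          {
            echo "## Deployment summary"
            echo ""
            echo "- Environment: ${TARGET_ENV}"
            echo "- Status: ${JOB_STATUS}"
          } >> "$GITHUB_STEP_SUMMARY"
```

The summary appears on the workflow run page, which saves reviewers from opening individual step logs.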

Infrastructure Changes

  1. Always use pull requests for infrastructure changes
  2. Deploy to dev first before staging/prod
  3. Document ADRs for architectural changes
  4. Update parameter files for all environments
  5. Test rollback procedures in dev environment
  6. Monitor deployments in real-time
  7. Keep deployment logs for audit purposes

Security

  1. Never commit secrets to repository
  2. Use environment-specific secrets in GitHub
  3. Rotate service principal credentials regularly
  4. Review workflow permissions quarterly
  5. Audit deployment logs for suspicious activity
  6. Use approved actions only (GitHub/Microsoft)
  7. Enable Dependabot for action updates
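Enabling Dependabot for action updates takes a small configuration file. A minimal sketch, assuming a weekly schedule is acceptable:

```yaml
# .github/dependabot.yml
version: 2
updates:
  - package-ecosystem: github-actions   # watch the `uses:` references in workflows
    directory: /                        # Dependabot looks in .github/workflows
    schedule:
      interval: weekly                  # assumed cadence
```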

References