GitHub Actions CI/CD Guide
Overview
This project uses GitHub Actions for continuous integration and deployment, authenticating to Azure with OIDC (OpenID Connect). Deployments are passwordless: no passwords or client secrets are stored in GitHub, only the client, tenant, and subscription IDs.
Workflows
1. Deploy Azure Infrastructure
File: .github/workflows/deploy-azure-infrastructure.yml
Deploys Bicep templates to Azure using PowerShell and OIDC authentication.
Triggers:
- Manual dispatch (workflow_dispatch)
- Can select environment (dev, staging, prod)
- Can select deployment stage (foundation, security, ai, apps, all)
Inputs:
- environment: Target environment (dev/staging/prod)
- stage: Deployment stage (foundation/security/ai/apps/all)
⚠️ IMPORTANT - Use stage='all' for deployments:
- Recommended: Always use stage='all' to deploy all infrastructure
- Issue: Staged deployments (foundation, security, ai, apps separately) have a bug where dependencies aren't detected (#113)
- Workaround: Use stage='all' and let the deployment's incremental mode handle updates (resources whose definitions have not changed are left as they are)
- Performance: A full deployment takes only ~10 minutes; incremental updates are faster
Example Usage:
1. Go to Actions tab → Deploy Azure Infrastructure
2. Click "Run workflow"
3. Select environment (dev/staging/prod)
4. Select stage: all ⬅️ Always use 'all'
5. Click "Run workflow"
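The same run can also be started from a terminal with the GitHub CLI. The sketch below is one possible wrapper, not something taken from this repository: the function name deploy_command is hypothetical, and it assumes the workflow's input names are exactly environment and stage as listed above. It only prints the gh command, so the inputs can be checked before anything is dispatched.

```shell
# Hypothetical helper: validate the inputs, then print the gh command that
# would dispatch the "Deploy Azure Infrastructure" workflow.
deploy_command() {
  env_name=$1
  stage=${2:-all}   # default to 'all', as recommended above

  case $env_name in
    dev|staging|prod) ;;
    *) echo "invalid environment: $env_name" >&2; return 1 ;;
  esac
  case $stage in
    foundation|security|ai|apps|all) ;;
    *) echo "invalid stage: $stage" >&2; return 1 ;;
  esac
  if [ "$stage" != "all" ]; then
    echo "warning: staged deployments are affected by #113; prefer stage=all" >&2
  fi

  echo "gh workflow run deploy-azure-infrastructure.yml -f environment=$env_name -f stage=$stage"
}

# Example: deploy_command dev
```

To dispatch for real, run the printed command, for example with sh -c "$(deploy_command dev)". This requires an authenticated gh session in a checkout of the repository.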
What it does:
1. Checks out code
2. Installs Azure PowerShell modules
3. Authenticates to Azure via OIDC
4. Runs PowerShell deployment (New-AzResourceGroupDeployment)
5. Displays deployment outputs
2. Deploy AI Models
File: .github/workflows/deploy-ai-models.yml
Deploys AI model configurations to an existing Azure AI Services account.
Triggers:
- Manual dispatch (workflow_dispatch)
- Requires infrastructure to be deployed first
Inputs:
- environment: Target environment (dev/staging/prod)
Example Usage:
1. Go to Actions tab → Deploy AI Models
2. Click "Run workflow"
3. Select environment
4. Click "Run workflow"
What it does:
1. Validates prerequisites (resource group, AI Services exist)
2. Deploys model configurations from environments/{env}-models.parameters.json
3. Idempotent (safe to run multiple times)
4. Displays deployed models
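Before triggering this workflow it can help to confirm that the parameter file for the target environment exists. The helper below is a hypothetical sketch (models_parameters_file is not part of the repository) and assumes the environments/ directory is resolved relative to a root directory passed as the second argument; adjust the root to wherever the parameter files live in your checkout.

```shell
# Hypothetical helper: print the model parameter file for an environment,
# or fail if the environment is unknown or the file is missing.
models_parameters_file() {
  env_name=$1
  root=${2:-.}   # assumption: directory that contains environments/

  case $env_name in
    dev|staging|prod) ;;
    *) echo "unknown environment: $env_name" >&2; return 2 ;;
  esac

  file="$root/environments/${env_name}-models.parameters.json"
  if [ ! -f "$file" ]; then
    echo "missing parameter file: $file" >&2
    return 1
  fi
  printf '%s\n' "$file"
}

# Example: models_parameters_file dev .
```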
3. Teardown Dev Environment
File: .github/workflows/teardown-dev-environment.yml
⚠️ DANGEROUS: Deletes all resources in the dev environment to save costs.
Triggers:
- Manual dispatch with confirmation required
Safety Features:
- Must type "DELETE" to confirm
- Two-stage validation process
- Only affects dev environment
- Shows what will be deleted before proceeding
Inputs:
- confirm: Must type "DELETE"
- delete_resource_group: Delete entire resource group (true/false)
Teardown Order:
1. Delete model deployments
2. Delete Container Apps Environment
3. Delete AI Services (Hub, Project, AI Services, AI Search)
4. Delete resource group (all remaining resources)
Cost Savings: ~$100-500/month while dev environment is down
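The confirmation guard described under Safety Features can be pictured as a small check that runs before any delete step. This is an illustrative sketch only, not the workflow's actual validation: the function name validate_teardown is hypothetical, and the environment argument is an assumption added to show the "dev only" rule (the workflow itself takes only confirm and delete_resource_group as inputs).

```shell
# Hypothetical guard: succeed only for an exact "DELETE" confirmation
# against the dev environment.
validate_teardown() {
  confirm=$1
  env_name=$2

  if [ "$confirm" != "DELETE" ]; then
    echo "confirmation must be exactly DELETE" >&2
    return 1
  fi
  if [ "$env_name" != "dev" ]; then
    echo "teardown is only allowed for dev, got: $env_name" >&2
    return 1
  fi
  return 0
}

# Example: validate_teardown "DELETE" dev && echo "teardown may proceed"
```

The comparison is case-sensitive on purpose, so a casual "delete" or "Delete" does not pass.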
4. Test Workflow
File: .github/workflows/test.yml
Runs automated tests on pull requests and pushes to main.
Triggers:
- Push to main
- Pull requests to main
What it runs:
1. Linting (ruff check)
2. Formatting check (ruff format --check)
3. Unit tests (pytest)
4. Coverage reporting (requires ≥85%)
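The same checks can be reproduced locally before opening a pull request. This is a sketch under assumptions: it presumes ruff and pytest are installed, that the commands run from the repository root, and that the 85% threshold is enforced through pytest-cov's --cov-fail-under option, which the workflow file itself may do differently. The function name run_checks is hypothetical.

```shell
# Hypothetical helper: run lint, format check, and tests in the order the
# workflow lists them. Pass "echo" as the first argument for a dry run that
# prints the commands instead of executing them.
run_checks() {
  runner=${1:-}
  $runner ruff check . &&
  $runner ruff format --check . &&
  $runner pytest --cov --cov-fail-under=85
}

# Example (dry run): run_checks echo
```

Because the commands are chained with &&, the first failing check stops the run, which mirrors how a failed status check blocks a merge.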
OIDC Authentication
How It Works
GitHub Actions
↓
Request OIDC token from GitHub
↓
Present token to Azure
↓
Azure validates token
↓
Azure grants temporary access
↓
Deploy resources
Benefits:
- ✅ No passwords or keys stored in GitHub
- ✅ Temporary credentials (expire after workflow)
- ✅ Granular permissions via RBAC
- ✅ Audit trail in Azure AD
- ✅ Automatic rotation (no manual renewal)
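The token GitHub issues in the flow above is a JWT, and the "Azure validates token" step checks claims inside it: the issuer (https://token.actions.githubusercontent.com), the subject (which repository and branch or environment requested the token), and the audience. When debugging, it can be useful to look at those claims. The helper below is a minimal sketch for inspection only: jwt_payload is a hypothetical name, it does not verify the signature, and it assumes a base64 command that accepts -d.

```shell
# Hypothetical helper: print the decoded payload (second segment) of a JWT.
# Inspection only; the signature is NOT verified.
jwt_payload() {
  # JWTs use base64url: map '_' and '-' back to '/' and '+'.
  segment=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits '=' padding; restore it so base64 can decode.
  case $(( ${#segment} % 4 )) in
    2) segment="${segment}==" ;;
    3) segment="${segment}=" ;;
  esac
  printf '%s' "$segment" | base64 -d
}

# Example: jwt_payload "$TOKEN"   # TOKEN is a placeholder for a captured token
```

Treat any real token as a credential while it is valid: do not paste it into logs or third-party decoders.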
Setup
See infrastructure/scripts/setup-github-service-principal.sh for OIDC setup.
Required Secrets (in GitHub repository settings):
- AZURE_CLIENT_ID: Service principal application ID
- AZURE_TENANT_ID: Azure AD tenant ID
- AZURE_SUBSCRIPTION_ID: Azure subscription ID
Required Azure Setup:
1. Create Azure AD App Registration
2. Create federated credential for GitHub
3. Assign appropriate RBAC roles (CRITICAL):
   - ✅ Contributor (subscription-wide) - For creating/modifying resources
   - ✅ User Access Administrator (resource group-scoped) - For assigning RBAC roles
   - ⚠️ Both roles required for AI infrastructure deployment
4. Add secrets to GitHub repository
Why Two Roles?
- Contributor: Creates Azure resources (VNet, AI Services, Key Vault, etc.)
- User Access Administrator: Assigns RBAC roles between resources (Hub → AI Services, Project → Storage, etc.)
- Without UAA, AI infrastructure deployment fails with authorization errors
See: ADR-027: User Access Administrator for GitHub Actions
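As a rough picture of step 2, a federated credential is a small JSON document that tells Azure which GitHub-issued tokens to trust. The sketch below generates one; it is not the content of setup-github-service-principal.sh, which may differ. The function name, the credential name github-<env>, and the choice of an environment-based subject are assumptions for illustration. An environment subject only matches jobs that declare that GitHub environment; branch-based workflows use a subject of the form repo:<org>/<repo>:ref:refs/heads/<branch> instead.

```shell
# Hypothetical helper: print a federated credential definition for a
# repository ("org/repo") and a GitHub environment name.
federated_credential_json() {
  repo=$1
  env_name=$2
  cat <<EOF
{
  "name": "github-${env_name}",
  "issuer": "https://token.actions.githubusercontent.com",
  "subject": "repo:${repo}:environment:${env_name}",
  "audiences": ["api://AzureADTokenExchange"]
}
EOF
}

# Example:
#   federated_credential_json my-org/my-repo dev > credential.json
#   az ad app federated-credential create --id "$APP_ID" --parameters credential.json
```

In the example, my-org/my-repo and APP_ID are placeholders for your repository and the app registration's application ID.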
Branch Protection
Main Branch Protection Rules
Enforced on main branch:
1. ✅ Require pull request before merging
2. ✅ Require status checks to pass:
- Linting (ruff check)
- Formatting (ruff format)
- Tests (pytest)
- Coverage ≥85%
3. ✅ Require code review approval (1+ reviewer)
4. ✅ Dismiss stale reviews when new commits pushed
5. ✅ Require conversation resolution before merge
6. ❌ Force pushes disabled
7. ❌ Deletions disabled
See: ADR-015: Branch Protection Strategy
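Rules like these can also be applied through the GitHub REST API rather than the settings page. The sketch below prints a request body that mirrors the list above; it is an illustration, not the project's recorded configuration. The status check names (lint, format, test) are hypothetical and must match the job names your workflow actually reports, and the values for strict and enforce_admins are assumptions because the rules above do not specify them.

```shell
# Hypothetical helper: print a branch protection request body that mirrors
# the rules listed above.
branch_protection_json() {
  cat <<'EOF'
{
  "required_status_checks": {
    "strict": false,
    "contexts": ["lint", "format", "test"]
  },
  "enforce_admins": false,
  "required_pull_request_reviews": {
    "dismiss_stale_reviews": true,
    "required_approving_review_count": 1
  },
  "restrictions": null,
  "required_conversation_resolution": true,
  "allow_force_pushes": false,
  "allow_deletions": false
}
EOF
}

# Example (OWNER/REPO are placeholders):
#   branch_protection_json | gh api -X PUT repos/OWNER/REPO/branches/main/protection --input -
```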
Security Best Practices
Workflow Security
- Minimal Permissions:
- Environment Protection:
  - Production deployments require approval
  - Environment-specific secrets
  - Deployment restrictions
- Secret Management:
  - Never log secrets
  - Use ${{ secrets.NAME }} syntax
  - Rotate regularly
- Dependency Security:
  - Pin action versions (@v4 not @latest)
  - Use verified actions from GitHub/Microsoft
  - Renovate/Dependabot for updates
Azure Security
- Least Privilege RBAC:
  - Service principal has minimum required permissions
  - Environment-specific role assignments
  - Separate service principals per environment
- Audit Logging:
  - All deployments logged in Azure Activity Log
  - OIDC authentication tracked in Azure AD
  - GitHub Actions logs retained
- Network Security:
  - Deployments from GitHub IPs only
  - Private endpoints for Azure resources
  - No public internet access to services
See: ADR-016: GitHub Actions Security
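The advice to pin action versions can be spot-checked with a simple search over the workflow files. The following is a rough sketch, not an existing project script: find_floating_action_refs is a hypothetical name, the list of floating refs (latest, main, master) is an assumption about what counts as unpinned, and it relies on a grep that supports -r.

```shell
# Hypothetical check: list "uses:" lines that reference a floating ref.
# Returns 1 if any are found, 0 if every action is pinned to something else.
find_floating_action_refs() {
  dir=${1:-.github/workflows}
  if grep -rnE 'uses:[[:space:]]*[^[:space:]]+@(latest|main|master)([[:space:]]|$)' "$dir"; then
    return 1
  fi
  return 0
}

# Example: find_floating_action_refs .github/workflows || echo "pin the refs listed above"
```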
Deployment Workflow
Standard Deployment Process
1. Create Feature Branch:
2. Make Changes:
   - Modify Bicep templates
   - Update parameter files
   - Update documentation
3. Local Testing:
4. Create Pull Request:
5. Automated Checks:
   - GitHub Actions runs tests
   - Code review required
   - Status checks must pass
6. Merge to Main:
   - After approval and passing checks
   - Squash and merge (keep history clean)
7. Deploy to Dev:
   - Manual trigger of "Deploy Azure Infrastructure" workflow
   - Select dev environment
   - Monitor deployment progress
8. Deploy to Staging (optional):
   - Same process, select staging environment
9. Deploy to Production:
   - Requires approval from designated reviewers
   - Select prod environment
   - Extra validation and confirmation
Rollback Procedure
If deployment fails or introduces issues:
- Immediate Rollback:
- Infrastructure Rollback:
- Partial Rollback:
Monitoring Deployments
GitHub Actions Logs
- Go to repository → Actions tab
- Select workflow run
- View logs for each step
- Download logs if needed
Azure Deployment Logs
# View deployment details
Get-AzResourceGroupDeployment `
  -ResourceGroupName "loan-defenders-dev-rg" `
  -Name "<deployment-name>"

# View deployment errors
$deployment = Get-AzResourceGroupDeployment `
  -ResourceGroupName "loan-defenders-dev-rg" `
  -Name "<deployment-name>"
$deployment.Properties.Error | ConvertTo-Json -Depth 10
Application Insights
- Telemetry and metrics: Azure Portal → Application Insights
- Deployment annotations visible in timeline
- Correlate deployments with errors/performance
Troubleshooting
Common Issues
1. OIDC Authentication Failed
Error: "Unable to get OIDC id token from the GitHub endpoint"
Solutions:
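- Confirm the workflow (or job) grants the id-token: write permission; without it the runner cannot request an OIDC token
- Verify the AZURE_CLIENT_ID, AZURE_TENANT_ID, and AZURE_SUBSCRIPTION_ID secrets
- Check the federated credential configuration in Azure AD (its subject must match the repository and the branch or environment that runs the workflow)
- Ensure the service principal has the required permissions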
2. Deployment Failed - Insufficient Permissions
Error: "The client does not have authorization to perform action"
Solutions:
- Verify RBAC role assignments for service principal
- Check resource group permissions
- Verify subscription-level permissions
2a. RBAC Role Assignment Failed (AI Infrastructure)
Error: Authorization failed for template resource of type 'Microsoft.Authorization/roleAssignments'. The client does not have permission to perform action 'Microsoft.Authorization/roleAssignments/write'
Symptom: AI Services, AI Foundry Hub, and AI Foundry Project not created
Root Cause: Service principal lacks User Access Administrator role
Solution:
# Grant User Access Administrator role at resource group scope
SUBSCRIPTION_ID=$(az account show --query id -o tsv)
APP_ID="<your-service-principal-client-id>"
RG_NAME="loan-defenders-dev-rg"
# Variables are quoted so unexpected spaces or empty values fail loudly
az role assignment create \
  --role "User Access Administrator" \
  --assignee "$APP_ID" \
  --scope "/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RG_NAME" \
  --description "Allow GitHub Actions to assign RBAC roles during deployment"
Verification:
# Check both roles are assigned (uses APP_ID as set in the Solution step)
az role assignment list \
  --assignee "$APP_ID" \
  --all \
  --query "[].{Role:roleDefinitionName, Scope:scope}" \
  -o table
# Expected output:
# Role                       Scope
# -------------------------  --------------------------------------------------
# Contributor                /subscriptions/<sub-id>
# User Access Administrator  /subscriptions/<sub-id>/resourceGroups/<rg-name>
Why this is needed:
- AI Foundry deployment creates 8+ role assignments between resources
- Hub needs access to AI Services, Storage, Key Vault
- Project needs access to AI Services, Storage, Key Vault
- Without UAA role, all AI infrastructure deployment fails and rolls back
See: ADR-027: User Access Administrator for GitHub Actions
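The verification step can be turned into a pass/fail check, which is handy in a pre-deployment script. This is a sketch rather than an existing project tool: check_required_roles is a hypothetical name, and it expects one role name per line on standard input, which is what az role assignment list produces with a roleDefinitionName query and tsv output.

```shell
# Hypothetical check: read role names (one per line) from stdin and fail
# if either required role is missing.
check_required_roles() {
  roles=$(cat)
  for role in "Contributor" "User Access Administrator"; do
    if ! printf '%s\n' "$roles" | grep -qxF "$role"; then
      echo "missing role: $role" >&2
      return 1
    fi
  done
  return 0
}

# Example:
#   az role assignment list --assignee "$APP_ID" --all \
#     --query "[].roleDefinitionName" -o tsv | check_required_roles
```

The check looks only at role names, not scopes, so the table output above is still the place to confirm that each role is assigned at the intended scope.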
3. Template Validation Failed
Error: "Template validation failed: ..."
Solutions:
- Validate Bicep locally: az bicep build --file <template>.bicep
- Check parameter file format
- Verify required parameters are provided
4. Resource Already Exists
Error: "A resource with the ID '...' already exists"
Solutions:
- Bicep deployments are idempotent (usually safe to re-run)
- Check if previous deployment is still in progress
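- Use Complete mode (-Mode Complete in Azure PowerShell, --mode Complete in Azure CLI) with care: it deletes resources in the resource group that are not defined in the template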
Best Practices
Workflow Development
- ✅ Test locally first before pushing to GitHub
- ✅ Use minimal permissions in workflow files
- ✅ Pin action versions for stability
- ✅ Add descriptive names to workflow steps
- ✅ Use environment variables for reusability
- ✅ Add deployment summaries with GITHUB_STEP_SUMMARY
- ✅ Handle errors gracefully with proper exit codes
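For the deployment summary item, GitHub Actions exposes the path of a Markdown file in the GITHUB_STEP_SUMMARY environment variable; anything appended to it is rendered on the run's summary page. A minimal sketch follows, with a hypothetical function name and table layout. Outside of Actions, where the variable is unset, it falls back to printing on standard output.

```shell
# Hypothetical helper: append a one-row Markdown table to the job summary.
write_deploy_summary() {
  env_name=$1
  stage=$2
  status=$3
  {
    echo "## Deployment summary"
    echo ""
    echo "| Environment | Stage | Status |"
    echo "|---|---|---|"
    echo "| ${env_name} | ${stage} | ${status} |"
  } >> "${GITHUB_STEP_SUMMARY:-/dev/stdout}"
}

# Example: write_deploy_summary dev all succeeded
```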
Infrastructure Changes
- ✅ Always use pull requests for infrastructure changes
- ✅ Deploy to dev first before staging/prod
- ✅ Document ADRs for architectural changes
- ✅ Update parameter files for all environments
- ✅ Test rollback procedures in dev environment
- ✅ Monitor deployments in real-time
- ✅ Keep deployment logs for audit purposes
Security
- ✅ Never commit secrets to repository
- ✅ Use environment-specific secrets in GitHub
- ✅ Rotate any remaining service principal secrets regularly (OIDC federated credentials need no rotation)
- ✅ Review workflow permissions quarterly
- ✅ Audit deployment logs for suspicious activity
- ✅ Use approved actions only (GitHub/Microsoft)
- ✅ Enable Dependabot for action updates