ADR-047: Layer-Specific RBAC Architecture
Status: Accepted
Date: 2024-10-21
Authors: Infrastructure Team
Deciders: Architecture Team, Security Team
Context
The original RBAC implementation used a monolithic rbac.bicep module that mixed concerns from multiple deployment layers (Layer 1 foundation permissions and Layer 4 container app permissions). This approach caused critical timing issues:
Problems Identified
- Chicken-and-Egg Problem: Container Apps were created with system-assigned managed identities, then RBAC permissions were assigned post-creation. Container Apps tried to pull images immediately upon creation, but ACR Pull permissions hadn't propagated yet (Azure RBAC can take 5-10 minutes).
- Mixed Concerns: A single RBAC module handled:
  - Layer 1: Managed Identity → AI Services (Cognitive Services User)
  - Layer 1: Managed Identity → Storage Account (Storage Blob Data Contributor)
  - Layer 4: Container Apps → ACR (AcrPull)
- Deployment Failures: Container Apps failed with "Validation of container app creation/update timed out" because they couldn't pull images from ACR during provisioning.
- Multiple Role Assignments: Each container app (UI, API, 3 MCP servers) required a separate role assignment with system-assigned identities (5 role assignments total).
Decision
We will restructure RBAC into layer-specific modules that align with the 4-layer deployment architecture and use user-assigned managed identity for container apps.
New Architecture
infrastructure/bicep/modules/
├── rbac-layer1-foundation.bicep # AI Services, Storage permissions
├── rbac-layer4-container-apps.bicep # Container Apps identity + ACR permissions
└── [rbac.bicep REMOVED] # Monolithic module deleted
Key Changes
1. Layer 1 RBAC (Fully Integrated)
Current Implementation: RBAC is handled within Layer 1 modules
Status: ✅ FULLY INTEGRATED - All permissions assigned
What's Working:
- ✅ Managed Identity created in modules/security.bicep
- ✅ Managed Identity → AI Services (Cognitive Services User, Cognitive Services OpenAI User) - assigned in modules/ai-services.bicep
- ✅ Managed Identity → Storage Account (Storage Blob Data Contributor) - assigned in modules/security.bicep
Note: RBAC is distributed across modules rather than centralized. See GitHub issue for future centralization.
2. Layer 4 RBAC Module (rbac-layer4-container-apps.bicep)
Purpose: Container Apps ACR access
Deployed: BEFORE container apps in Layer 4
Strategy:
1. Creates a single user-assigned managed identity for ALL container apps
2. Assigns AcrPull role to this identity
3. Returns identity ID for container apps to reference
Key Innovation: Identity and permissions exist BEFORE apps try to pull images.
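The three strategy steps above can be sketched in Bicep roughly as follows. This is a minimal sketch rather than the actual contents of rbac-layer4-container-apps.bicep: the parameter names and the identityId output are taken from how the module is called elsewhere in this ADR, while the identity name, the symbolic resource names, and the API versions are assumptions.

```bicep
// Hypothetical sketch of modules/rbac-layer4-container-apps.bicep.
@description('Prefix used to name the shared identity (naming scheme is an assumption)')
param deploymentPrefix string

@description('Azure region for the identity')
param location string = resourceGroup().location

@description('Name of the existing Azure Container Registry')
param acrName string

// Built-in AcrPull role definition ID (same value as listed under Azure Built-in Role IDs)
var acrPullRoleId = '7f951dda-4ed3-4680-a7ca-43fe172d538d'

// Reference the registry that already exists in this resource group
resource acr 'Microsoft.ContainerRegistry/registries@2023-07-01' existing = {
  name: acrName
}

// Step 1: a single user-assigned identity shared by all container apps
resource containerAppsIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: '${deploymentPrefix}-container-apps-identity' // hypothetical name
  location: location
}

// Step 2: grant AcrPull on the registry to that identity
resource acrPullAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(acr.id, containerAppsIdentity.id, acrPullRoleId)
  scope: acr
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPullRoleId)
    principalId: containerAppsIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

// Step 3: expose the identity resource ID for container app modules
output identityId string = containerAppsIdentity.id
```

Setting principalType explicitly is a common way to avoid failures when the role assignment is created right after a new identity, before the principal has replicated in the directory.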
3. User-Assigned vs System-Assigned Identity
Decision: Use User-Assigned Managed Identity for Container Apps
Rationale (per Azure Well-Architected Framework):
- ✅ Multiple resources (5 apps) need same permissions
- ✅ Pre-authorization required before resource creation
- ✅ Timing critical (RBAC must exist before image pull)
- ✅ Reduced role assignments (1 instead of 5)
- ✅ Simplified management (single identity lifecycle)
Microsoft Guidance:
"Use user-assigned managed identities when multiple resources need the same set of permissions, to reduce the number of role assignments needed." — Azure Well-Architected Framework - Security
Deployment Flow
Layer 4 Deployment Sequence:
1. rbac-layer4-container-apps module
   ├── Create user-assigned identity
   └── Assign AcrPull to ACR
2. Azure propagates RBAC (automatic)
3. uiContainerApp module (if enabled)
   └── Uses shared identity ID
4. apiContainerApp module (if enabled)
   └── Uses shared identity ID
5. mcpServerContainerApps module (if enabled)
   ├── verification (uses shared identity)
   ├── documents (uses shared identity)
   └── financial (uses shared identity)
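The fan-out in step 5 could be expressed as a conditional loop inside modules/container-apps-mcp-servers.bicep. The following is only a sketch under stated assumptions: the deployMcpServers flag, the serverName parameter of the child module, and the deployment names are hypothetical, and a real child module would need further parameters (image, environment, and so on) that are not shown in this ADR.

```bicep
// Hypothetical sketch of modules/container-apps-mcp-servers.bicep.
@description('Resource ID of the shared user-assigned identity from the Layer 4 RBAC module')
param userAssignedIdentityId string

@description('Toggle for the "if enabled" branch (parameter name is an assumption)')
param deployMcpServers bool = true

// The three MCP servers listed in the deployment sequence
var mcpServerNames = [
  'verification'
  'documents'
  'financial'
]

// One child deployment per server, each receiving the same identity ID
module mcpServers 'container-app-mcp-server.bicep' = [for serverName in mcpServerNames: if (deployMcpServers) {
  name: 'mcp-${serverName}'
  params: {
    serverName: serverName // hypothetical parameter of the child module
    userAssignedIdentityId: userAssignedIdentityId
  }
}]
```

Passing the identity ID down as a plain string keeps the child module unaware of where the identity was created, which matches the "Passes identity to children" role described for this file.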
Container App Module Pattern
All container app modules now support both identity types (backward compatible):
@description('User-assigned managed identity resource ID for ACR access')
param userAssignedIdentityId string = ''

var useUserAssignedIdentity = !empty(userAssignedIdentityId)

resource containerApp 'Microsoft.App/containerApps@2024-03-01' = {
  identity: useUserAssignedIdentity ? {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${userAssignedIdentityId}': {}
    }
  } : {
    type: 'SystemAssigned' // Fallback
  }
  properties: {
    configuration: {
      registries: [
        {
          server: acrLoginServer
          identity: useUserAssignedIdentity ? userAssignedIdentityId : 'system'
        }
      ]
    }
  }
}
Azure Built-in Role IDs
Decision: Continue using hardcoded Azure built-in role GUIDs.
Rationale:
- Azure built-in role IDs are universal constants maintained by Microsoft
- Identical across ALL subscriptions, tenants, regions, and clouds
- Documented at: https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles
- Using role names requires dynamic lookup (unnecessary overhead)
Used Role IDs:
var acrPullRoleId = '7f951dda-4ed3-4680-a7ca-43fe172d538d' // AcrPull
var cognitiveServicesUserRoleId = 'a97b65f3-24c7-4388-baec-2e87135dc908' // Cognitive Services User
var storageBlobDataContributorRoleId = 'ba92f5b4-2d11-453d-a403-e96b0029c9fe' // Storage Blob Data Contributor
Note: Custom roles (if created) would require dynamic lookup, because their IDs are generated when the role is created and are not universal constants.
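To illustrate how one of these GUIDs becomes a role assignment, here is a minimal sketch of a Layer 1 assignment for Storage Blob Data Contributor. It is not the actual code in modules/security.bicep: the parameter names, symbolic resource names, and API versions are assumptions chosen for the example.

```bicep
@description('Name of the existing storage account (parameter name is an assumption)')
param storageAccountName string

@description('Principal ID of the Layer 1 managed identity (parameter name is an assumption)')
param managedIdentityPrincipalId string

// Built-in Storage Blob Data Contributor role definition ID
var storageBlobDataContributorRoleId = 'ba92f5b4-2d11-453d-a403-e96b0029c9fe'

resource storageAccount 'Microsoft.Storage/storageAccounts@2023-01-01' existing = {
  name: storageAccountName
}

resource blobContributorAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  // Deterministic name so that redeployments update the same assignment
  name: guid(storageAccount.id, managedIdentityPrincipalId, storageBlobDataContributorRoleId)
  scope: storageAccount
  properties: {
    // The bare GUID is expanded into a full role definition resource ID
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', storageBlobDataContributorRoleId)
    principalId: managedIdentityPrincipalId
    principalType: 'ServicePrincipal'
  }
}
```

The hardcoded GUID is only the last segment of the role definition ID; subscriptionResourceId supplies the subscription-specific prefix at deployment time, so no lookup by role name is needed.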
Consequences
Positive
- ✅ No More Timing Issues: Identity and permissions exist before container apps try to pull images
- ✅ Cleaner Architecture: Each layer manages its own RBAC concerns
- ✅ Reduced Complexity: 1 identity instead of 5, 1 role assignment instead of 5
- ✅ Faster Deployments: No waiting for system-assigned identity creation + RBAC propagation
- ✅ Better Separation of Concerns: Layer-specific modules align with deployment boundaries
- ✅ Production Ready: Follows Azure best practices and Well-Architected Framework
- ✅ Backward Compatible: Modules still support system-assigned identity as fallback
- ✅ Simplified Auditing: Single identity to audit for all container app ACR access
Neutral
- ⚠️ Additional Resource: Creates one user-assigned managed identity (minimal cost)
- ⚠️ Module Count: Two RBAC modules instead of one (but better organized)
- ⚠️ Learning Curve: Team needs to understand user-assigned vs system-assigned identity patterns
Negative
- ❌ Breaking Change: Existing deployments with system-assigned identities need migration
- ❌ Manual Cleanup: Old system-assigned identity role assignments may need manual removal
Migration Path
For existing deployments:
Option A: Clean Redeployment (Recommended)
# Delete existing container apps
az containerapp delete --name ldfdev-api-dev --resource-group ldfdev-rg --yes
az containerapp delete --name ldfdev-mcp-verification --resource-group ldfdev-rg --yes
az containerapp delete --name ldfdev-mcp-documents --resource-group ldfdev-rg --yes
az containerapp delete --name ldfdev-mcp-financial --resource-group ldfdev-rg --yes
# Redeploy with new architecture
./infrastructure/scripts/deploy-layer4.sh dev
Option B: In-Place Update
# Bicep will update identity configuration
./infrastructure/scripts/deploy-layer4.sh dev
# Manually clean up old role assignments
az role assignment list --scope <acr-id> -o table
az role assignment delete --ids <old-system-identity-assignment-ids>
Compliance
Azure Well-Architected Framework
| Pillar | Alignment |
|---|---|
| Security | ✅ Follows managed identity best practices |
| Operational Excellence | ✅ Simplified management, reduced role assignments |
| Performance Efficiency | ✅ Faster deployments, no RBAC propagation delays |
| Cost Optimization | ✅ Reduced role assignment overhead |
| Reliability | ✅ Eliminates timing-related deployment failures |
Microsoft Best Practices
| Practice | Status |
|---|---|
| Use managed identities over service principals | ✅ Implemented |
| Use user-assigned for multiple resources | ✅ Implemented |
| Minimize role assignments | ✅ Implemented |
| Assign least privilege | ✅ Implemented (AcrPull only) |
| Layer-specific security boundaries | ✅ Implemented |
Implementation
Files Modified
| File | Status | Purpose |
|---|---|---|
| modules/rbac.bicep | ❌ DELETED | Monolithic module removed |
| modules/rbac-layer1-foundation.bicep | ✅ NEW | Foundation permissions |
| modules/rbac-layer4-container-apps.bicep | ✅ NEW | Container apps identity + ACR |
| layer4-apps.bicep | ✅ Modified | Uses new RBAC module |
| modules/container-app-ui.bicep | ✅ Modified | User-assigned identity support |
| modules/container-app-api.bicep | ✅ Modified | User-assigned identity support |
| modules/container-app-mcp-server.bicep | ✅ Modified | User-assigned identity support |
| modules/container-apps-mcp-servers.bicep | ✅ Modified | Passes identity to children |
Deployment Changes
// Before: Monolithic RBAC after deployment
module rbac 'modules/rbac.bicep' = {
  params: {
    managedIdentityPrincipalId: managedIdentityId
    aiServicesId: aiServicesId
    storageAccountId: storageAccountId
    acrId: acrId
    mcpServerPrincipalIds: {
      verification: mcpVerification.outputs.principalId // System-assigned
      documents: mcpDocuments.outputs.principalId
      financial: mcpFinancial.outputs.principalId
    }
  }
  dependsOn: [
    apiContainerApp // ← Apps created FIRST
    mcpContainerApps // ← Then RBAC assigned (too late!)
  ]
}

// After: Layer-specific RBAC before deployment
module containerAppsRbac 'modules/rbac-layer4-container-apps.bicep' = {
  params: {
    deploymentPrefix: deploymentPrefix
    location: location
    acrName: '${deploymentPrefix}acr'
  }
} // ← Identity + RBAC created FIRST

module apiContainerApp 'modules/container-app-api.bicep' = {
  params: {
    userAssignedIdentityId: containerAppsRbac.outputs.identityId // ← Use pre-configured identity
  }
  dependsOn: [
    containerAppsRbac // ← Wait for RBAC to be ready
  ]
}
Alternatives Considered
Alternative 1: Keep Monolithic RBAC, Use ACR Admin Credentials
Approach: Use ACR admin username/password instead of managed identity.
Rejected Because:
- ❌ Less secure (credentials in Key Vault)
- ❌ Requires credential rotation
- ❌ Not aligned with Azure best practices
- ❌ Doesn't address timing issue fundamentally
Alternative 2: Add Retry Logic and Delays
Approach: Add sleep/retry logic in deployment scripts to wait for RBAC propagation.
Rejected Because:
- ❌ Unreliable (RBAC propagation time varies)
- ❌ Slower deployments (waiting 5-10 minutes unnecessarily)
- ❌ Doesn't fix root cause
- ❌ Creates fragile deployment process
Alternative 3: Pre-create All System-Assigned Identities
Approach: Create container apps with minimal configuration first, assign RBAC, then update.
Rejected Because:
- ❌ Two-phase deployment (complex)
- ❌ Still requires RBAC propagation wait
- ❌ More role assignments to manage (5 instead of 1)
- ❌ System-assigned identities deleted if app deleted
References
Microsoft Documentation
- Azure Container Apps - Managed Identity
- Azure Built-in Roles
- Azure Well-Architected Framework - Security
- RBAC Best Practices
- User-Assigned Managed Identities
Related ADRs
- ADR-032: Key Vault Removal (simplified infrastructure)
- ADR-043: ACR Task Builds and Deployment Validation
- ADR-045: Intelligent Container Image Validation
Internal Documentation
- /temp/RBAC-RESTRUCTURING.md - Detailed technical implementation
- /temp/QUESTIONS-ANSWERED.md - FAQ on design decisions
- /temp/DEPLOYMENT-STATUS.md - Current deployment status
- /docs/architecture/security.md - Security architecture overview
- /docs/deployment/rbac-setup.md - RBAC deployment guide
Decision Tracking
Decided: 2024-10-21
Implemented: 2024-10-21
Verified: Pending (next deployment)
Success Metrics
- Container apps deploy without "validation timeout" errors
- User-assigned identity created before container apps
- ACR Pull role assigned before image pull attempts
- All container apps successfully pull images on first try
- Zero manual RBAC assignments needed post-deployment
- Deployment time reduced by 5-10 minutes (no RBAC wait)
Review Criteria
This ADR should be reviewed if:
1. Azure changes managed identity best practices
2. Container Apps ACR integration changes
3. We need to support cross-tenant deployments
4. Custom roles are introduced requiring dynamic lookup
5. Additional platform services require ACR access
Supersedes: None (new pattern)
Superseded by: None
Status: ✅ Accepted and Implemented