RBAC Setup Guide

Overview

This guide explains how Role-Based Access Control (RBAC) is configured and deployed in the Loan Defenders platform. The RBAC architecture uses layer-specific modules aligned with the 4-layer deployment strategy.

Key Innovation: A user-assigned managed identity is created BEFORE the container apps, which eliminates RBAC timing issues.

Quick Start

# Deploy Layer 4 with automatic RBAC setup
./infrastructure/scripts/deploy-layer4.sh dev

# RBAC is automatically configured:
# 1. User-assigned identity created
# 2. ACR Pull permissions assigned
# 3. Container apps deployed with shared identity
# ✅ Zero manual RBAC configuration needed!

Architecture Overview

Layer-Specific RBAC

Foundation Layer: Core Security & Identity
├── Managed Identity (User-Assigned)
└── VNet + Subnets with Service Endpoints
    └── ACI Subnet: Microsoft.CognitiveServices + Microsoft.Storage

Substrate Layer: AI Services RBAC (Native Projects Architecture)
├── Managed Identity → AI Services Account (Cognitive Services OpenAI User)
├── Managed Identity → AI Services Account (Cognitive Services User)
├── Managed Identity → AI Services Account (Azure AI User) ⭐ NEW - Data plane access
├── Managed Identity → AI Services Project (Cognitive Services OpenAI Contributor)
├── Managed Identity → AI Services Project (Azure AI User) ⭐ NEW - Project data plane
├── Managed Identity → Storage (Blob Data Contributor)
├── Managed Identity → ACR (AcrPull)
├── AI Services Account → Storage (Blob Data Contributor)
└── AI Services Project → Storage (Blob Data Contributor)

Note: ML Services Hub/Project architecture NOT USED (switched to Native AI Services Projects)

AI Models Layer
└── Uses Foundation managed identity

Apps Layer: Container Deployment
└── ACI Container Group uses Foundation managed identity

Why This Works:

  • Identity + permissions exist BEFORE apps try to pull images
  • No RBAC propagation delays (5-10 min wait eliminated)
  • One identity for all container apps (simplified management)

RBAC Modules

Module 1: Layer 1 RBAC (Fully Integrated)

Purpose: Foundation permissions (AI Services, Storage)

Current Status: ✅ FULLY INTEGRATED

What's Working:

  • ✅ Managed Identity created (modules/security.bicep)
  • ✅ AI Services permissions assigned (modules/ai-services.bicep):
      • Cognitive Services User
      • Cognitive Services OpenAI User
  • ✅ Storage permissions assigned (modules/security.bicep):
      • Storage Blob Data Contributor

Implementation Note: RBAC is distributed across modules (ai-services and security) rather than centralized in a dedicated RBAC module.

Future Improvement: Centralize all Layer 1 RBAC into rbac-layer1-foundation.bicep module (see GitHub issue)

Module 2: rbac-layer4-container-apps.bicep

Purpose: Container Apps ACR access

What It Does:

1. Creates user-assigned identity: {prefix}-container-apps-identity
2. Assigns AcrPull role to ACR
3. Returns identity ID for container apps
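
The three steps above correspond roughly to the Azure CLI calls sketched below. This is an illustration of what the module achieves, not its actual Bicep implementation; the function names `container_apps_identity_name` and `create_identity_with_acr_pull` are hypothetical, and the ACR name is assumed to be passed in by the caller.

```shell
# Sketch only: mirrors the three module steps with Azure CLI calls.
# Function names are hypothetical; the real work is done by the Bicep module.

# Step 1 naming convention from this guide: {prefix}-container-apps-identity
container_apps_identity_name() {
  local prefix="$1"
  printf '%s-container-apps-identity\n' "$prefix"
}

# Steps 1-3: create the identity, grant AcrPull on the registry,
# then print the identity's resource ID for the container apps to reference.
create_identity_with_acr_pull() {
  local prefix="$1" rg="$2" acr_name="$3"
  local name principal_id acr_id

  name="$(container_apps_identity_name "$prefix")"
  az identity create --name "$name" --resource-group "$rg" --output none

  principal_id="$(az identity show --name "$name" --resource-group "$rg" \
    --query principalId -o tsv)"
  acr_id="$(az acr show --name "$acr_name" --resource-group "$rg" --query id -o tsv)"

  # Passing the object ID and principal type avoids a directory lookup
  # for an identity that was created moments ago.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role "AcrPull" \
    --scope "$acr_id" \
    --output none

  az identity show --name "$name" --resource-group "$rg" --query id -o tsv
}

# Example (requires Azure CLI and a logged-in session):
# create_identity_with_acr_pull ldfdev ldfdev-rg <acr-name>
```

The assignment is scoped to the registry itself rather than the resource group, which matches the resource-level scoping recommended later in this guide.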

Status: ✅ Fully integrated in Layer 4

Used By:

  • UI Container App
  • API Container App
  • MCP Verification Server
  • MCP Documents Server
  • MCP Financial Server

Deployment Flow

./infrastructure/scripts/deploy-layer4.sh dev
│
├─> 1. Validate prerequisites
├─> 2. Check container images
├─> 3. Deploy RBAC module (rbac-layer4-container-apps)
│    ├─> Create user-assigned identity
│    └─> Assign AcrPull to ACR
│
├─> 4. Azure propagates RBAC (automatic)
│
├─> 5. Deploy UI (if enabled)
│    └─> Uses shared identity
│
├─> 6. Deploy API (if enabled)
│    └─> Uses shared identity
│
└─> 7. Deploy MCP servers (if enabled)
     ├─> Verification (shared identity)
     ├─> Documents (shared identity)
     └─> Financial (shared identity)

✅ All apps can pull images - permissions already exist!

Azure Built-in Roles

Roles Used

| Role | GUID | Purpose | Scope |
|---|---|---|---|
| AcrPull | 7f951dda-4ed3-4680-a7ca-43fe172d538d | Pull container images | ACR |
| Cognitive Services User | a97b65f3-24c7-4388-baec-2e87135dc908 | Read AI Services keys, list models | AI Services Account |
| Cognitive Services OpenAI User | 5e0bd9bd-7b93-4f28-af87-19fc36ad61bd | Inference, view models/deployments | AI Services Account |
| Cognitive Services OpenAI Contributor | a001fd3d-188f-4b5d-821b-7da978bf7442 | Deploy/manage models and projects | AI Services Project |
| Azure AI User ⭐ | 53ca6127-db72-4b80-b1b0-d745d6d5456d | Data actions (execute agents, invoke models) | AI Services Account + Project |
| Storage Blob Data Contributor | ba92f5b4-2d11-453d-a403-e96b0029c9fe | Read/write blobs | Storage Account |

⚠️ Critical Role Distinction: Contributor vs. User

Cognitive Services OpenAI Contributor:

  • ✅ Management plane: Deploy models, manage projects, configure resources
  • ❌ NO data plane access: Cannot execute agents or invoke models
  • Use case: Infrastructure automation, model deployment
  • Scope: AI Services Project

Azure AI User (Required for runtime operations):

  • ❌ NO management access
  • ✅ Data plane: Execute agents, invoke models, read/write project data
  • Use case: Application runtime (this is what Microsoft Agent Framework needs!)
  • Scope: AI Services Account + AI Services Project

Both roles are required for the managed identity: Contributor for model management, User for agent execution.

🔄 Architecture Change Note (2025-10-30)

Previous Architecture (ML Services Hub/Project):

  • Used: Azure AI Administrator + Azure AI User roles
  • Scope: Microsoft.MachineLearningServices/workspaces (Hub + Project)
  • Endpoint: workspace.{region}.api.azureml.ms

Current Architecture (Native AI Services Projects):

  • Uses: Cognitive Services OpenAI Contributor + Azure AI User roles
  • Scope: Microsoft.CognitiveServices/accounts + Microsoft.CognitiveServices/accounts/projects
  • Endpoint: {ai-services}.services.ai.azure.com/api/projects/{project}

Migration: ML Services Hub/Project roles are NOT needed with native projects architecture.

Note: These GUIDs are Azure universal constants - same across all subscriptions, tenants, and clouds.
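
Because the GUIDs are fixed, a full role definition ID can be assembled from just a subscription ID and the GUID. The sketch below assumes the standard subscription-scoped role definition ID format; the helper names `role_guid` and `role_definition_id` are illustrative and do not exist in the repository.

```shell
# Look up a built-in role GUID by name, using the table in this guide.
role_guid() {
  case "$1" in
    "AcrPull")                               echo "7f951dda-4ed3-4680-a7ca-43fe172d538d" ;;
    "Cognitive Services User")               echo "a97b65f3-24c7-4388-baec-2e87135dc908" ;;
    "Cognitive Services OpenAI User")        echo "5e0bd9bd-7b93-4f28-af87-19fc36ad61bd" ;;
    "Cognitive Services OpenAI Contributor") echo "a001fd3d-188f-4b5d-821b-7da978bf7442" ;;
    "Azure AI User")                         echo "53ca6127-db72-4b80-b1b0-d745d6d5456d" ;;
    "Storage Blob Data Contributor")         echo "ba92f5b4-2d11-453d-a403-e96b0029c9fe" ;;
    *) return 1 ;;
  esac
}

# Build the full role definition resource ID for a subscription.
role_definition_id() {
  local subscription_id="$1" role_name="$2" guid
  guid="$(role_guid "$role_name")" || {
    echo "unknown role: $role_name" >&2
    return 1
  }
  printf '/subscriptions/%s/providers/Microsoft.Authorization/roleDefinitions/%s\n' \
    "$subscription_id" "$guid"
}

# Example:
# role_definition_id "$(az account show --query id -o tsv)" "Azure AI User"
```

Referring to roles by GUID rather than display name keeps scripts stable if a role is ever renamed.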

Reference:

  • Azure Built-in Roles
  • Azure AI Foundry RBAC

Verification

Check Identity Created

RG="ldfdev8-rg"  # Your resource group
IDENTITY_NAME="${RG%-rg}-identity"  # e.g., ldfdev8-identity

az identity show -n "$IDENTITY_NAME" -g "$RG" \
  --query "{Name:name, PrincipalId:principalId, ClientId:clientId}" \
  -o table

# Expected:
# Name              PrincipalId                          ClientId
# ----------------  -----------------------------------  ------------------------------------
# ldfdev8-identity  <principal-id-guid>                  <client-id-guid>
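
The commands in this guide derive resource names from the resource group name with the `${RG%-rg}` parameter expansion, which strips a trailing `-rg`. A minimal sketch of that convention follows; the `resource_name` helper is hypothetical and only covers the hyphenated names (the ACR name used later is formed without a hyphen, as `${RG%-rg}acr`).

```shell
# Derive a resource name from the resource group name.
# "${rg%-rg}" removes a trailing "-rg"; the value is unchanged if the suffix is absent.
resource_name() {
  local rg="$1" suffix="$2"
  printf '%s-%s\n' "${rg%-rg}" "$suffix"
}

resource_name "ldfdev8-rg" "identity"   # prints: ldfdev8-identity
resource_name "ldfdev8-rg" "vnet"       # prints: ldfdev8-vnet
```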

Check ALL Role Assignments (Complete)

IDENTITY_PRINCIPAL=$(az identity show -n "$IDENTITY_NAME" -g "$RG" --query principalId -o tsv)

az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# Expected output (8 roles total - Native AI Services Projects).
# Scope prints as a full resource ID; only its final segment is shown here:
# Role                                    Scope
# --------------------------------------  ------------------------
# AcrPull                                 <acr-name>
# Cognitive Services User                 <ai-services-account>
# Cognitive Services OpenAI User          <ai-services-account>
# Azure AI User                           <ai-services-account> ⭐
# Cognitive Services OpenAI Contributor   <ai-services-project>
# Azure AI User                           <ai-services-project> ⭐
# Storage Blob Data Contributor           <storage-account>
# Storage Blob Data Contributor           <storage-account> (from AI Services)
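
Comparing the table by eye is error-prone, so the check can be scripted. The sketch below reads the assigned role names on stdin and prints any role from this guide's list that is absent; `expected_roles` and `missing_roles` are hypothetical helper names, and the check is by role name only (it does not verify scope).

```shell
# Role names this guide expects on the managed identity, one per line.
expected_roles() {
  cat <<'EOF'
AcrPull
Cognitive Services User
Cognitive Services OpenAI User
Cognitive Services OpenAI Contributor
Azure AI User
Storage Blob Data Contributor
EOF
}

# Read assigned role names (one per line) on stdin and print each
# expected role that does not appear. Empty output means nothing is missing.
missing_roles() {
  local assigned expected
  assigned="$(cat)"
  expected_roles | while IFS= read -r expected; do
    # -F: fixed string, -x: whole-line match, -q: quiet
    printf '%s\n' "$assigned" | grep -Fxq -- "$expected" || printf '%s\n' "$expected"
  done
}

# Example (requires Azure CLI and a logged-in session):
# az role assignment list --assignee "$IDENTITY_PRINCIPAL" --all \
#   --query "[].roleDefinitionName" -o tsv | missing_roles
```

Whole-line matching matters here: without it, "Cognitive Services User" could be confused with the longer role names that share its prefix.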

Verify Native AI Services Project Roles (Critical for Agent Operations)

# Check specifically for Azure AI roles
az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[?contains(roleDefinitionName, 'Azure AI')].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# ⚠️ MUST show Azure AI User for AI Services Account AND Project:
# Role                      Scope
# ------------------------  ------------------------------------------------------
# Azure AI User             /subscriptions/.../ai-services-account ⭐
# Azure AI User             /subscriptions/.../ai-services-project ⭐

# ❌ If missing Azure AI User → Microsoft Agent Framework will fail with "Forbidden"
# ℹ️ Note: Azure AI Administrator NOT needed (ML Services Hub/Project only)
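
The "Account AND Project" requirement can also be checked mechanically by classifying each scope. The sketch below assumes scopes follow the `Microsoft.CognitiveServices/accounts` and `accounts/projects` resource ID shapes named in this guide; `scope_kind` and `covers_account_and_project` are hypothetical helpers, and the match is case-sensitive.

```shell
# Classify a role-assignment scope as "project", "account", or "other".
# The project pattern is tested first because it is the more specific one.
scope_kind() {
  case "$1" in
    */providers/Microsoft.CognitiveServices/accounts/*/projects/*) echo "project" ;;
    */providers/Microsoft.CognitiveServices/accounts/*)            echo "account" ;;
    *)                                                             echo "other" ;;
  esac
}

# Read one scope per line on stdin; succeed only if at least one
# account scope and at least one project scope are present.
covers_account_and_project() {
  local scope seen_account=0 seen_project=0
  while IFS= read -r scope; do
    case "$(scope_kind "$scope")" in
      account) seen_account=1 ;;
      project) seen_project=1 ;;
    esac
  done
  [ "$seen_account" -eq 1 ] && [ "$seen_project" -eq 1 ]
}

# Example (requires Azure CLI and a logged-in session):
# az role assignment list --assignee "$IDENTITY_PRINCIPAL" --all \
#   --query "[?roleDefinitionName=='Azure AI User'].scope" -o tsv \
#   | covers_account_and_project || echo "Azure AI User is missing on account or project"
```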

Verify Service Endpoints on ACI Subnet

VNET_NAME="${RG%-rg}-vnet"

az network vnet subnet show \
  --resource-group "$RG" \
  --vnet-name "$VNET_NAME" \
  --name "aci-subnet" \
  --query "serviceEndpoints[].service" \
  -o tsv

# Expected output:
# Microsoft.CognitiveServices
# Microsoft.Storage
# Microsoft.KeyVault
# Microsoft.ContainerRegistry

# ❌ If missing Microsoft.CognitiveServices → "context manager not available" error

Check Container App Identity

az containerapp show -n ldfdev-api-dev -g ldfdev-rg \
  --query "{Type:identity.type, IdentityCount:length(identity.userAssignedIdentities)}" \
  -o table

# Expected:
# Type           IdentityCount
# -------------  --------------
# UserAssigned   1

Verify Image Pull Success

az containerapp revision list -n ldfdev-api-dev -g ldfdev-rg \
  --query "[0].{Name:name, Health:properties.healthState}" \
  -o table

# Expected:
# Name                     Health
# -----------------------  --------
# ldfdev-api-dev--<rev>    Healthy

If "Healthy" → Image pull successful! ✅
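
A new revision may take a little while to report its health, so a single check right after deployment can be misleading. The sketch below polls the same query until it returns "Healthy"; `wait_until` and `revision_is_healthy` are hypothetical helpers, and the attempt count and interval are arbitrary assumptions.

```shell
# Run a check command repeatedly until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <sleep_seconds> <command> [args...]
wait_until() {
  local max_attempts="$1" sleep_seconds="$2" attempt=1
  shift 2
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$sleep_seconds"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Succeeds when the first listed revision reports "Healthy"
# (same query as the verification command above).
revision_is_healthy() {
  local app="$1" rg="$2" health
  health="$(az containerapp revision list -n "$app" -g "$rg" \
    --query "[0].properties.healthState" -o tsv)"
  [ "$health" = "Healthy" ]
}

# Example: poll every 15 seconds, up to 20 times.
# wait_until 20 15 revision_is_healthy ldfdev-api-dev ldfdev-rg \
#   && echo "Image pull successful"
```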

Troubleshooting

Issue: "Forbidden" or "Access Denied" from Azure AI Foundry

Symptom:

  • Microsoft Agent Framework client initialization fails
  • Error: "Forbidden", "Access Denied", or "Insufficient permissions"
  • Container logs show authentication errors

Root Cause: Missing "Azure AI User" role on AI Services Account or Project

Diagnosis:

RG="ldfdev8-rg"
IDENTITY_PRINCIPAL=$(az identity show -n "${RG%-rg}-identity" -g "$RG" --query principalId -o tsv)

# Check for Azure AI User role
az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[?roleDefinitionName=='Azure AI User'].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# If output is EMPTY → This is the problem!

Fix:

# Redeploy Substrate layer (adds Azure AI User role automatically)
./infrastructure/scripts/deploy-substrate.sh dev "$RG"

# Or manually assign Azure AI User if needed (Native AI Services Projects):
AI_SERVICES_ID=$(az cognitiveservices account show -n <ai-services-name> -g "$RG" --query id -o tsv)
AI_PROJECT_ID=$(az cognitiveservices account project show --account-name <ai-services-name> -g "$RG" --project-name <project-name> --query id -o tsv)

# Azure AI User role (53ca6127-db72-4b80-b1b0-d745d6d5456d)
az role assignment create \
  --assignee "$IDENTITY_PRINCIPAL" \
  --role "53ca6127-db72-4b80-b1b0-d745d6d5456d" \
  --scope "$AI_SERVICES_ID"

az role assignment create \
  --assignee "$IDENTITY_PRINCIPAL" \
  --role "53ca6127-db72-4b80-b1b0-d745d6d5456d" \
  --scope "$AI_PROJECT_ID"

# ℹ️ Note: Azure AI Administrator NOT needed (ML Services Hub/Project only)

Issue: "Context Manager Not Available" Error

Symptom:

  • Container logs show: "context manager not available"
  • Network connectivity failures to AI Services
  • AzureAIAgentClient cannot connect

Root Cause: Missing service endpoint on ACI subnet

Diagnosis:

RG="ldfdev8-rg"
VNET_NAME="${RG%-rg}-vnet"

# Check if service endpoint exists
az network vnet subnet show \
  --resource-group "$RG" \
  --vnet-name "$VNET_NAME" \
  --name "aci-subnet" \
  --query "serviceEndpoints[?service=='Microsoft.CognitiveServices']" \
  -o table

# If output is EMPTY → This is the problem!

Fix:

# Service endpoints are configured in Foundation layer
# Redeploy Foundation to add them:
./infrastructure/scripts/deploy-foundation.sh dev "$RG"

# Note: Service endpoints are already defined in networking.bicep lines 660-665
# If they're missing, it means Foundation wasn't deployed with latest code

Issue: "Unauthorized" Error (ACR Image Pull)

Symptom: failed to pull image: unauthorized

Fix:

RG="ldfdev8-rg"
IDENTITY_NAME="${RG%-rg}-identity"

# 1. Verify identity exists
az identity show -n "$IDENTITY_NAME" -g "$RG"

# 2. Check role assignment
IDENTITY=$(az identity show -n "$IDENTITY_NAME" -g "$RG" --query principalId -o tsv)
az role assignment list --assignee "$IDENTITY" --all --query "[?roleDefinitionName=='AcrPull']" -o table

# 3. If missing, manually assign
ACR_NAME="${RG%-rg}acr"
ACR_ID=$(az acr show -n "$ACR_NAME" -g "$RG" --query id -o tsv)

az role assignment create \
  --assignee "$IDENTITY" \
  --role "AcrPull" \
  --scope "$ACR_ID"

# 4. Redeploy apps
./infrastructure/scripts/deploy-apps.sh dev "$RG"

Issue: "Validation Timeout"

Symptom: Validation of container app creation/update timed out

Cause: RBAC module not deployed first or dependsOn missing

Fix:

# Redeploy Layer 4 (ensures correct order)
./infrastructure/scripts/deploy-layer4.sh dev

Issue: System-Assigned Identity Still Used

Symptom: identity.type = "SystemAssigned"

Fix:

# Delete and recreate (cleanest)
az containerapp delete --name ldfdev-api-dev -g ldfdev-rg --yes
./infrastructure/scripts/deploy-layer4.sh dev --api

# Verify
az containerapp show -n ldfdev-api-dev -g ldfdev-rg --query "identity.type" -o tsv
# Should show: UserAssigned

Migration from Old RBAC

If You Have Existing Deployments

Option A: Clean Slate (Recommended)

# Delete all container apps
for app in ldfdev-api-dev ldfdev-mcp-verification ldfdev-mcp-documents ldfdev-mcp-financial; do
  az containerapp delete --name "$app" -g ldfdev-rg --yes
done

# Redeploy with new architecture
./infrastructure/scripts/deploy-layer4.sh dev

Option B: In-Place Update

# Redeploy (Bicep will update identity)
./infrastructure/scripts/deploy-layer4.sh dev

# Clean up old role assignments if any
az role assignment list --scope <acr-id> -o table
az role assignment delete --ids <old-assignment-id>
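
To find candidate assignments for cleanup, the ACR-scoped list can be filtered down to principals other than the shared identity. The sketch below only prints assignment IDs and deletes nothing; `stale_assignment_ids` is a hypothetical helper, and it assumes any other principal holding AcrPull is a leftover, which may not hold if pipelines or developers also pull from the registry.

```shell
# Read "assignment_id<TAB>principal_id" lines on stdin and print the
# assignment IDs whose principal differs from the one to keep.
stale_assignment_ids() {
  local keep_principal="$1" assignment_id principal_id tab
  tab="$(printf '\t')"
  while IFS="$tab" read -r assignment_id principal_id; do
    # Skip blank lines.
    [ -n "$assignment_id" ] || continue
    [ "$principal_id" = "$keep_principal" ] || printf '%s\n' "$assignment_id"
  done
}

# Example (review the output before passing any ID to
# "az role assignment delete --ids"):
# az role assignment list --scope <acr-id> \
#   --query "[?roleDefinitionName=='AcrPull'].[id, principalId]" -o tsv \
#   | stale_assignment_ids "$IDENTITY_PRINCIPAL"
```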

Best Practices

✅ Do

  • Use user-assigned identity for shared permissions
  • Deploy RBAC before resources that need it
  • Use resource-level scope (not subscription)
  • Verify RBAC after deployment
  • Document identity purpose in tags

❌ Don't

  • Don't use system-assigned for multiple resources
  • Don't assign at subscription level unnecessarily
  • Don't skip dependsOn in Bicep
  • Don't use service principals (use managed identities)
  • Don't hardcode subscription IDs

Security Notes

Least Privilege

  • ACR Pull only allows image pull (not push/delete)
  • Scoped to specific ACR resource
  • Identity purpose-limited to container apps

Audit

# View all role assignments
az role assignment list --assignee <principal-id> --all -o table

# View assignment history
az monitor activity-log list \
  --resource-group ldfdev-rg \
  --offset 7d \
  --query "[?contains(operationName.value, 'roleAssignments')]" \
  -o table

Regular Reviews

  • Review quarterly
  • Remove unused identities
  • Validate scope appropriateness
  • Check for over-privileged assignments

References

Documentation

  • infrastructure/bicep/modules/rbac-layer1-foundation.bicep - Foundation RBAC
  • infrastructure/bicep/modules/rbac-layer4-container-apps.bicep - Container Apps RBAC
  • infrastructure/scripts/deploy-layer4.sh - Deployment script

Summary: Complete RBAC Role Matrix (Native AI Services Projects)

Managed Identity Roles (8 total)

| Role | Target Resource | Purpose |
|---|---|---|
| Cognitive Services OpenAI User | AI Services Account | Model inference access |
| Cognitive Services User | AI Services Account | General service access |
| Azure AI User ⭐ | AI Services Account | Data plane operations |
| Cognitive Services OpenAI Contributor | AI Services Project | Project/model management |
| Azure AI User ⭐ | AI Services Project | Project data plane |
| Storage Blob Data Contributor | Storage Account | Read/write artifacts |
| AcrPull | Azure Container Registry | Pull container images |

System Identity Roles (2 additional)

| System Identity | Role | Target | Purpose |
|---|---|---|---|
| AI Services Account | Storage Blob Data Contributor | Storage Account | Service artifacts |
| AI Services Project | Storage Blob Data Contributor | Storage Account | Project artifacts |

Total RBAC Assignments: 10 (8 for Managed Identity + 2 for System Identities)

🔄 Architecture Evolution

Before (ldfdev1-13): ML Services Hub/Project

  • Azure AI Administrator + Azure AI User on Hub
  • Azure AI Administrator + Azure AI User on Project
  • Total: 10 RBAC assignments

After (ldfdev14+): Native AI Services Projects

  • Cognitive Services roles on AI Services Account
  • Azure AI User on both Account + Project
  • Total: 10 RBAC assignments (simplified, more reliable)


Last Updated: 2025-10-30
Status: Production Ready ✅
Architecture: Native AI Services Projects (Option A)
Owner: Infrastructure Team