RBAC Setup Guide

Overview

This guide explains how Role-Based Access Control (RBAC) is configured and deployed in the Loan Defenders platform. The RBAC architecture uses layer-specific modules aligned with the 4-layer deployment strategy.

Key Innovation: A user-assigned managed identity is created BEFORE the container apps, which eliminates RBAC timing issues.

Quick Start

# Deploy Layer 4 with automatic RBAC setup
./infrastructure/scripts/deploy-layer4.sh dev

# RBAC is automatically configured:
# 1. User-assigned identity created
# 2. ACR Pull permissions assigned
# 3. Container apps deployed with shared identity
# ✅ Zero manual RBAC configuration needed!

Architecture Overview

Layer-Specific RBAC

Foundation Layer: Core Security & Identity
├── Managed Identity (User-Assigned)
└── VNet + Subnets with Service Endpoints
    └── ACI Subnet: Microsoft.CognitiveServices + Microsoft.Storage

Substrate Layer: AI Services RBAC (Native Projects Architecture)
├── Managed Identity → AI Services Account (Cognitive Services OpenAI User)
├── Managed Identity → AI Services Account (Cognitive Services User)
├── Managed Identity → AI Services Account (Azure AI User) ⭐ NEW - Data plane access
├── Managed Identity → AI Services Project (Cognitive Services OpenAI Contributor)
├── Managed Identity → AI Services Project (Azure AI User) ⭐ NEW - Project data plane
├── Managed Identity → Storage (Blob Data Contributor)
├── Managed Identity → ACR (AcrPull)
├── AI Services Account → Storage (Blob Data Contributor)
└── AI Services Project → Storage (Blob Data Contributor)

Note: ML Services Hub/Project architecture NOT USED (switched to Native AI Services Projects)

AI Models Layer
└── Uses Foundation managed identity

Apps Layer: Container Deployment
└── ACI Container Group uses Foundation managed identity

Why This Works:
- Identity + permissions exist BEFORE apps try to pull images
- No RBAC propagation delays (5-10 min wait eliminated)
- One identity for all container apps (simplified management)

RBAC Modules

Module 1: Layer 1 RBAC (Fully Integrated)

Purpose: Foundation permissions (AI Services, Storage)

Current Status: ✅ FULLY INTEGRATED

What's Working:
- ✅ Managed Identity created (modules/security.bicep)
- ✅ AI Services permissions assigned (modules/ai-services.bicep):
  - Cognitive Services User
  - Cognitive Services OpenAI User
- ✅ Storage permissions assigned (modules/security.bicep):
  - Storage Blob Data Contributor

Implementation Note: RBAC is distributed across modules (ai-services and security) rather than centralized in a dedicated RBAC module.

Future Improvement: Centralize all Layer 1 RBAC into rbac-layer1-foundation.bicep module (see GitHub issue)

Module 2: `rbac-layer4-container-apps.bicep`

Purpose: Container Apps ACR access

What It Does:

1. Creates user-assigned identity: {prefix}-container-apps-identity
2. Assigns AcrPull role to ACR
3. Returns identity ID for container apps

Status: ✅ Fully integrated in Layer 4

Used By:
- UI Container App
- API Container App
- MCP Verification Server
- MCP Documents Server
- MCP Financial Server
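
The three steps listed under "What It Does" map onto Azure CLI calls roughly as follows. This is a sketch of an equivalent imperative flow, not the contents of rbac-layer4-container-apps.bicep; the function names and the `ldfdev` prefix are illustrative assumptions.

```shell
# Build the identity name from a resource prefix, following the
# {prefix}-container-apps-identity pattern described above.
container_apps_identity_name() {
  printf '%s-container-apps-identity\n' "$1"
}

# Hypothetical helper: create the identity, grant AcrPull on one registry,
# and print the identity resource ID for the container apps to reference.
# Arguments: <prefix> <resource-group> <acr-name>
create_identity_with_acr_pull() {
  name=$(container_apps_identity_name "$1")
  az identity create --name "$name" --resource-group "$2" --output none || return 1
  principal_id=$(az identity show --name "$name" --resource-group "$2" \
    --query principalId --output tsv) || return 1
  acr_id=$(az acr show --name "$3" --resource-group "$2" \
    --query id --output tsv) || return 1
  # Passing the object ID plus principal type avoids the directory lookup,
  # which can fail while a freshly created identity is still replicating.
  az role assignment create \
    --assignee-object-id "$principal_id" \
    --assignee-principal-type ServicePrincipal \
    --role AcrPull \
    --scope "$acr_id" --output none || return 1
  az identity show --name "$name" --resource-group "$2" --query id --output tsv
}

container_apps_identity_name ldfdev   # prints ldfdev-container-apps-identity
```

Scoping the assignment to the registry's resource ID keeps the grant at resource level, in line with the least-privilege notes later in this guide.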

Deployment Flow

./infrastructure/scripts/deploy-layer4.sh dev
│
├─> 1. Validate prerequisites
├─> 2. Check container images
├─> 3. Deploy RBAC module (rbac-layer4-container-apps)
│    ├─> Create user-assigned identity
│    └─> Assign AcrPull to ACR
│
├─> 4. Azure propagates RBAC (automatic)
│
├─> 5. Deploy UI (if enabled)
│    └─> Uses shared identity
│
├─> 6. Deploy API (if enabled)
│    └─> Uses shared identity
│
└─> 7. Deploy MCP servers (if enabled)
     ├─> Verification (shared identity)
     ├─> Documents (shared identity)
     └─> Financial (shared identity)

✅ All apps can pull images - permissions already exist!
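
Step 4 relies on Azure propagating the role assignment before the apps start pulling images. If a script needs to confirm that explicitly, a bounded retry loop is one option. The sketch below is an assumption about how such a wait could look, not a description of what deploy-layer4.sh does; `retry_until` and `acr_pull_visible` are hypothetical names.

```shell
# Run a command until it succeeds or the attempt budget is exhausted.
# Arguments: <max-attempts> <delay-seconds> <command> [args...]
retry_until() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only when another attempt remains.
    if [ "$attempt" -le "$max_attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Succeeds once an AcrPull assignment for the principal is listed on the registry.
# Arguments: <principal-id> <acr-resource-id>
acr_pull_visible() {
  count=$(az role assignment list --assignee "$1" --scope "$2" \
    --query "length([?roleDefinitionName=='AcrPull'])" --output tsv) || return 1
  [ "$count" -gt 0 ]
}

# Example: wait up to 10 attempts, 15 seconds apart, before deploying the apps.
# retry_until 10 15 acr_pull_visible "$IDENTITY_PRINCIPAL" "$ACR_ID"
```

A listed assignment does not strictly guarantee that every data-plane endpoint already honors it, so treat this as a sanity check rather than a hard barrier.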

Azure Built-in Roles

Roles Used

Role	GUID	Purpose	Scope
AcrPull	`7f951dda-4ed3-4680-a7ca-43fe172d538d`	Container images pull	ACR
Cognitive Services User	`a97b65f3-24c7-4388-baec-2e87135dc908`	Read AI Services keys, list models	AI Services Account
Cognitive Services OpenAI User	`5e0bd9bd-7b93-4f28-af87-19fc36ad61bd`	Inference, view models/deployments	AI Services Account
Cognitive Services OpenAI Contributor	`a001fd3d-188f-4b5d-821b-7da978bf7442`	Deploy/manage models and projects	AI Services Project
Azure AI User ⭐	`53ca6127-db72-4b80-b1b0-d745d6d5456d`	Data actions (execute agents, invoke models)	AI Services Account + Project
Storage Blob Data Contributor	`ba92f5b4-2d11-453d-a403-e96b0029c9fe`	Read/write blobs	Storage Account

⚠️ Critical Role Distinction: Contributor vs. User

Cognitive Services OpenAI Contributor:
- ✅ Management plane: Deploy models, manage projects, configure resources
- ❌ NO data plane access: Cannot execute agents or invoke models
- Use case: Infrastructure automation, model deployment
- Scope: AI Services Project

Azure AI User (Required for runtime operations):
- ❌ NO management access
- ✅ Data plane: Execute agents, invoke models, read/write project data
- Use case: Application runtime (this is what Microsoft Agent Framework needs!)
- Scope: AI Services Account + AI Services Project

Both roles are required on the managed identity: Cognitive Services OpenAI Contributor for model management, Azure AI User for agent execution.

🔄 Architecture Change Note (2025-10-30)

Previous Architecture (ML Services Hub/Project):
- Used: Azure AI Administrator + Azure AI User roles
- Scope: Microsoft.MachineLearningServices/workspaces (Hub + Project)
- Endpoint: workspace.{region}.api.azureml.ms

Current Architecture (Native AI Services Projects):
- Uses: Cognitive Services OpenAI Contributor + Azure AI User roles
- Scope: Microsoft.CognitiveServices/accounts + Microsoft.CognitiveServices/accounts/projects
- Endpoint: {ai-services}.services.ai.azure.com/api/projects/{project}

Migration: ML Services Hub/Project roles are NOT needed with native projects architecture.

Note: These GUIDs are Azure universal constants - same across all subscriptions, tenants, and clouds.
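
Because the GUIDs are fixed, scripts can look them up by role name instead of repeating the literal values. The helper below is a small sketch built from the "Roles Used" table; `role_guid` and `role_definition_id` are hypothetical names, and the all-zero subscription ID in the usage comment is a placeholder.

```shell
# Map a built-in role name from the "Roles Used" table to its GUID.
role_guid() {
  case "$1" in
    "AcrPull") echo "7f951dda-4ed3-4680-a7ca-43fe172d538d" ;;
    "Cognitive Services User") echo "a97b65f3-24c7-4388-baec-2e87135dc908" ;;
    "Cognitive Services OpenAI User") echo "5e0bd9bd-7b93-4f28-af87-19fc36ad61bd" ;;
    "Cognitive Services OpenAI Contributor") echo "a001fd3d-188f-4b5d-821b-7da978bf7442" ;;
    "Azure AI User") echo "53ca6127-db72-4b80-b1b0-d745d6d5456d" ;;
    "Storage Blob Data Contributor") echo "ba92f5b4-2d11-453d-a403-e96b0029c9fe" ;;
    *) return 1 ;;  # Unknown role name
  esac
}

# Build the full role definition resource ID for a subscription.
# Arguments: <subscription-id> <role-name>
role_definition_id() {
  guid=$(role_guid "$2") || return 1
  printf '/subscriptions/%s/providers/Microsoft.Authorization/roleDefinitions/%s\n' \
    "$1" "$guid"
}

# Example (placeholder subscription ID):
# role_definition_id 00000000-0000-0000-0000-000000000000 "Azure AI User"
```

Returning a non-zero status for an unknown name lets a calling script fail fast on a typo instead of creating an assignment with an empty role.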

Reference:
- Azure Built-in Roles
- Azure AI Foundry RBAC

Verification

Check Identity Created

RG="ldfdev8-rg"  # Your resource group
IDENTITY_NAME="${RG%-rg}-identity"  # e.g., ldfdev8-identity

az identity show -n "$IDENTITY_NAME" -g "$RG" \
  --query "{Name:name, PrincipalId:principalId, ClientId:clientId}" \
  -o table

# Expected:
# Name              PrincipalId                          ClientId
# ----------------  -----------------------------------  ------------------------------------
# ldfdev8-identity  <principal-id-guid>                  <client-id-guid>
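
The commands in this guide derive resource names from the resource group name with the `${RG%-rg}` parameter expansion. The short example below shows what that expansion produces for the three names used here; the `PREFIX` variable is introduced only for illustration.

```shell
# ${RG%-rg} removes a trailing "-rg" from the resource group name.
# If the name does not end in "-rg", the value is left unchanged.
RG="ldfdev8-rg"
PREFIX="${RG%-rg}"

IDENTITY_NAME="${PREFIX}-identity"
VNET_NAME="${PREFIX}-vnet"
ACR_NAME="${PREFIX}acr"   # No hyphen, as in the ACR troubleshooting steps

printf '%s\n' "$IDENTITY_NAME" "$VNET_NAME" "$ACR_NAME"
# ldfdev8-identity
# ldfdev8-vnet
# ldfdev8acr
```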

Check ALL Role Assignments (Complete)

IDENTITY_PRINCIPAL=$(az identity show -n "$IDENTITY_NAME" -g "$RG" --query principalId -o tsv)

az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# Expected output (8 roles total - Native AI Services Projects).
# Scope is printed as a full resource ID; only its final segment is shown here:
# Role                                    Scope
# --------------------------------------  ------------------------
# AcrPull                                 <acr-name>
# Cognitive Services User                 <ai-services-account>
# Cognitive Services OpenAI User          <ai-services-account>
# Azure AI User                           <ai-services-account> ⭐
# Cognitive Services OpenAI Contributor   <ai-services-project>
# Azure AI User                           <ai-services-project> ⭐
# Storage Blob Data Contributor           <storage-account>
# Storage Blob Data Contributor           <storage-account> (from AI Services)
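
Comparing the table by eye is error-prone, so the check can be scripted. The sketch below reads role names on stdin and prints any expected name that is absent. It compares names only, not scopes, and the function names are hypothetical.

```shell
# Role names the managed identity is expected to hold, one per line
# (distinct names taken from the expected output above).
expected_roles() {
  cat <<'ROLES'
AcrPull
Cognitive Services User
Cognitive Services OpenAI User
Cognitive Services OpenAI Contributor
Azure AI User
Storage Blob Data Contributor
ROLES
}

# Read actual role names on stdin (one per line) and print each expected
# role that is absent. Returns 0 only when nothing is missing.
missing_roles() {
  actual=$(cat)
  status=0
  while IFS= read -r role; do
    # -F fixed string, -x whole-line match, -q quiet
    if ! printf '%s\n' "$actual" | grep -Fxq -- "$role"; then
      printf '%s\n' "$role"
      status=1
    fi
  done <<LIST
$(expected_roles)
LIST
  return "$status"
}

# Example:
# az role assignment list --assignee "$IDENTITY_PRINCIPAL" --all \
#   --query "[].roleDefinitionName" -o tsv | missing_roles
```

An empty result with exit status 0 means every expected role name appears at least once; the per-scope check for Azure AI User still needs the query in the next section.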

Verify Native AI Services Project Roles (Critical for Agent Operations)

# Check specifically for Azure AI roles
az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[?contains(roleDefinitionName, 'Azure AI')].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# ⚠️ MUST show Azure AI User for AI Services Account AND Project:
# Role                      Scope
# ------------------------  ------------------------------------------------------
# Azure AI User             /subscriptions/.../ai-services-account ⭐
# Azure AI User             /subscriptions/.../ai-services-project ⭐

# ❌ If missing Azure AI User → Microsoft Agent Framework will fail with "Forbidden"
# ℹ️ Note: Azure AI Administrator NOT needed (ML Services Hub/Project only)
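
Since Azure AI User must be present at both the account and the project scope, the two rows can be told apart by the shape of the scope's resource ID. The following sketch assumes the scopes use the `Microsoft.CognitiveServices/accounts` and `.../accounts/.../projects` resource types named in this guide, with that exact casing; both function names are hypothetical.

```shell
# Classify a scope by its resource ID.
# Project scopes look like .../accounts/<account>/projects/<project>.
scope_kind() {
  case "$1" in
    */providers/Microsoft.CognitiveServices/accounts/*/projects/*) echo project ;;
    */providers/Microsoft.CognitiveServices/accounts/*) echo account ;;
    *) echo other ;;
  esac
}

# Read scopes on stdin (one per line); succeed only if both an account
# scope and a project scope appear.
has_account_and_project() {
  seen_account=0
  seen_project=0
  while IFS= read -r scope; do
    case "$(scope_kind "$scope")" in
      account) seen_account=1 ;;
      project) seen_project=1 ;;
    esac
  done
  [ "$seen_account" -eq 1 ] && [ "$seen_project" -eq 1 ]
}

# Example:
# az role assignment list --assignee "$IDENTITY_PRINCIPAL" --all \
#   --query "[?roleDefinitionName=='Azure AI User'].scope" -o tsv \
#   | has_account_and_project || echo "Azure AI User missing on account or project"
```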

Verify Service Endpoints on ACI Subnet

VNET_NAME="${RG%-rg}-vnet"

az network vnet subnet show \
  --resource-group "$RG" \
  --vnet-name "$VNET_NAME" \
  --name "aci-subnet" \
  --query "serviceEndpoints[].service" \
  -o tsv

# Expected output:
# Microsoft.CognitiveServices
# Microsoft.Storage
# Microsoft.KeyVault
# Microsoft.ContainerRegistry

# ❌ If missing Microsoft.CognitiveServices → "context manager not available" error

Check Container App Identity

az containerapp show -n ldfdev-api-dev -g ldfdev-rg \
  --query "{Type:identity.type, IdentityCount:length(identity.userAssignedIdentities)}" \
  -o table

# Expected:
# Type           IdentityCount
# -------------  --------------
# UserAssigned   1

Verify Image Pull Success

az containerapp revision list -n ldfdev-api-dev -g ldfdev-rg \
  --query "[0].{Name:name, Health:properties.healthState}" \
  -o table

# Expected:
# Name                     Health
# -----------------------  --------
# ldfdev-api-dev--<rev>    Healthy

If "Healthy" → Image pull successful! ✅

Troubleshooting

Issue: "Forbidden" or "Access Denied" from Azure AI Foundry

Symptom:
- Microsoft Agent Framework client initialization fails
- Error: "Forbidden", "Access Denied", or "Insufficient permissions"
- Container logs show authentication errors

Root Cause: Missing "Azure AI User" role on AI Services Account or Project

Diagnosis:

RG="ldfdev8-rg"
IDENTITY_PRINCIPAL=$(az identity show -n "${RG%-rg}-identity" -g "$RG" --query principalId -o tsv)

# Check for Azure AI User role
az role assignment list \
  --assignee "$IDENTITY_PRINCIPAL" \
  --all \
  --query "[?roleDefinitionName=='Azure AI User'].{Role:roleDefinitionName, Scope:scope}" \
  -o table

# If output is EMPTY → This is the problem!

Fix:

# Redeploy Substrate layer (adds Azure AI User role automatically)
./infrastructure/scripts/deploy-substrate.sh dev "$RG"

# Or manually assign Azure AI User if needed (Native AI Services Projects):
AI_SERVICES_ID=$(az cognitiveservices account show -n <ai-services-name> -g "$RG" --query id -o tsv)
AI_PROJECT_ID=$(az cognitiveservices account project show --account-name <ai-services-name> -g "$RG" --project-name <project-name> --query id -o tsv)

# Azure AI User role (53ca6127-db72-4b80-b1b0-d745d6d5456d)
az role assignment create \
  --assignee "$IDENTITY_PRINCIPAL" \
  --role "53ca6127-db72-4b80-b1b0-d745d6d5456d" \
  --scope "$AI_SERVICES_ID"

az role assignment create \
  --assignee "$IDENTITY_PRINCIPAL" \
  --role "53ca6127-db72-4b80-b1b0-d745d6d5456d" \
  --scope "$AI_PROJECT_ID"

# ℹ️ Note: Azure AI Administrator NOT needed (ML Services Hub/Project only)

Issue: "Context Manager Not Available" Error

Symptom:
- Container logs show: "context manager not available"
- Network connectivity failures to AI Services
- AzureAIAgentClient cannot connect

Root Cause: Missing service endpoint on ACI subnet

Diagnosis:

RG="ldfdev8-rg"
VNET_NAME="${RG%-rg}-vnet"

# Check if service endpoint exists
az network vnet subnet show \
  --resource-group "$RG" \
  --vnet-name "$VNET_NAME" \
  --name "aci-subnet" \
  --query "serviceEndpoints[?service=='Microsoft.CognitiveServices']" \
  -o table

# If output is EMPTY → This is the problem!

Fix:

# Service endpoints are configured in Foundation layer
# Redeploy Foundation to add them:
./infrastructure/scripts/deploy-foundation.sh dev "$RG"

# Note: Service endpoints are already defined in networking.bicep lines 660-665
# If they're missing, it means Foundation wasn't deployed with latest code

Issue: "Unauthorized" Error (ACR Image Pull)

Symptom: failed to pull image: unauthorized

Fix:

RG="ldfdev8-rg"
IDENTITY_NAME="${RG%-rg}-identity"

# 1. Verify identity exists
az identity show -n "$IDENTITY_NAME" -g "$RG"

# 2. Check role assignment
IDENTITY=$(az identity show -n "$IDENTITY_NAME" -g "$RG" --query principalId -o tsv)
az role assignment list --assignee "$IDENTITY" --all --query "[?roleDefinitionName=='AcrPull']" -o table

# 3. If missing, manually assign
ACR_NAME="${RG%-rg}acr"
ACR_ID=$(az acr show -n "$ACR_NAME" -g "$RG" --query id -o tsv)

az role assignment create \
  --assignee "$IDENTITY" \
  --role "AcrPull" \
  --scope "$ACR_ID"

# 4. Redeploy apps
./infrastructure/scripts/deploy-apps.sh dev "$RG"

Issue: "Validation Timeout"

Symptom: Validation of container app creation/update timed out

Cause: RBAC module not deployed first or dependsOn missing

Fix:

# Redeploy Layer 4 (ensures correct order)
./infrastructure/scripts/deploy-layer4.sh dev

Issue: System-Assigned Identity Still Used

Symptom: identity.type = "SystemAssigned"

Fix:

# Delete and recreate (cleanest)
az containerapp delete --name ldfdev-api-dev -g ldfdev-rg --yes
./infrastructure/scripts/deploy-layer4.sh dev --api

# Verify
az containerapp show -n ldfdev-api-dev -g ldfdev-rg --query "identity.type" -o tsv
# Should show: UserAssigned
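
The `identity.type` value can also report both identity kinds at once, so a scripted check should not test for an exact match on "UserAssigned" alone. The helper below is a sketch that classifies the value; `identity_mode` is a hypothetical name, and the list of values is an assumption based on the identity types Azure resources commonly report.

```shell
# Classify the identity.type value of a container app.
# Assumed values: None, SystemAssigned, UserAssigned, and the combined
# form "SystemAssigned,UserAssigned" (with or without a space after the comma).
identity_mode() {
  case "$1" in
    *SystemAssigned*UserAssigned*) echo mixed ;;
    UserAssigned) echo user-assigned ;;
    SystemAssigned) echo system-assigned ;;
    *) echo none ;;
  esac
}

# Example:
# identity_mode "$(az containerapp show -n ldfdev-api-dev -g ldfdev-rg \
#   --query identity.type -o tsv)"
```

A result of `system-assigned` corresponds to the symptom described in this section; `mixed` means the shared identity is attached but a system-assigned identity is still enabled alongside it.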

Migration from Old RBAC

If You Have Existing Deployments

Option A: Clean Slate (Recommended)

# Delete all container apps
for app in ldfdev-api-dev ldfdev-mcp-verification ldfdev-mcp-documents ldfdev-mcp-financial; do
  az containerapp delete --name "$app" -g ldfdev-rg --yes
done

# Redeploy with new architecture
./infrastructure/scripts/deploy-layer4.sh dev

Option B: In-Place Update

# Redeploy (Bicep will update identity)
./infrastructure/scripts/deploy-layer4.sh dev

# Clean up old role assignments if any
az role assignment list --scope <acr-id> -o table
az role assignment delete --ids <old-assignment-id>

Best Practices

✅ Do

Use user-assigned identity for shared permissions
Deploy RBAC before resources that need it
Use resource-level scope (not subscription)
Verify RBAC after deployment
Document identity purpose in tags

❌ Don't

Don't use system-assigned for multiple resources
Don't assign at subscription level unnecessarily
Don't skip dependsOn in Bicep
Don't use service principals (use managed identities)
Don't hardcode subscription IDs

Security Notes

Least Privilege

AcrPull only allows image pull (not push/delete)
Scoped to specific ACR resource
Identity purpose-limited to container apps

Audit

# View all role assignments
az role assignment list --assignee <principal-id> --all -o table

# View assignment history
az monitor activity-log list \
  --resource-group ldfdev-rg \
  --offset 7d \
  --query "[?contains(operationName.value, 'roleAssignments')]" \
  -o table

Regular Reviews

Review quarterly
Remove unused identities
Validate scope appropriateness
Check for over-privileged assignments

References

Documentation

ADR-047: Layer-Specific RBAC - Decision rationale
Security Architecture - Complete security design
Azure Container Apps Identity - Microsoft docs
RBAC Best Practices - Azure guidance

Related Files

infrastructure/bicep/modules/rbac-layer1-foundation.bicep - Foundation RBAC
infrastructure/bicep/modules/rbac-layer4-container-apps.bicep - Container Apps RBAC
infrastructure/scripts/deploy-layer4.sh - Deployment script

Summary: Complete RBAC Role Matrix (Native AI Services Projects)

Managed Identity Roles (8 total)

Role	Target Resource	Purpose
Cognitive Services OpenAI User	AI Services Account	Model inference access
Cognitive Services User	AI Services Account	General service access
Azure AI User ⭐	AI Services Account	Data plane operations
Cognitive Services OpenAI Contributor	AI Services Project	Project/model management
Azure AI User ⭐	AI Services Project	Project data plane
Storage Blob Data Contributor	Storage Account	Read/write artifacts
AcrPull	Azure Container Registry	Pull container images

System Identity Roles (2 additional)

System Identity	Role	Target	Purpose
AI Services Account	Storage Blob Data Contributor	Storage Account	Service artifacts
AI Services Project	Storage Blob Data Contributor	Storage Account	Project artifacts

Total RBAC Assignments: 10 (8 for Managed Identity + 2 for System Identities)

🔄 Architecture Evolution

Before (ldfdev1-13): ML Services Hub/Project
- Azure AI Administrator + Azure AI User on Hub
- Azure AI Administrator + Azure AI User on Project
- Total: 10 RBAC assignments

After (ldfdev14+): Native AI Services Projects
- Cognitive Services roles on AI Services Account
- Azure AI User on both Account + Project
- Total: 10 RBAC assignments (simplified, more reliable)

Last Updated: 2025-10-30
Status: Production Ready ✅
Architecture: Native AI Services Projects (Option A)
Owner: Infrastructure Team
