# AI Model Deployment Guide

## Overview

This guide explains how to deploy AI models to Azure AI Services using the fully automated, repeatable deployment workflow.

## Key Principle: Single Source of Truth

All configuration lives in the parameter file, which contains everything needed for deployment:

- Resource group name (in `parameters.resourceGroupName.value`)
- AI Services account name
- Model deployments (name, version, SKU, capacity)

No values are hardcoded in scripts or workflows, which makes deployments fully repeatable and environment-agnostic.

## Parameter File Structure

### Location

### Template
```jsonc
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourceGroupName": {
      "value": "ldfdev-rg"            // ← Resource group name (REQUIRED)
    },
    "aiServicesName": {
      "value": "ldfdev-ai"            // ← AI Services account name
    },
    "modelDeployments": {
      "value": [
        {
          "name": "gpt-4o",           // ← Deployment name (what your app uses)
          "model": {
            "format": "OpenAI",
            "name": "gpt-4o",         // ← Model name
            "version": "2024-08-06"   // ← Model version
          },
          "sku": {
            "name": "GlobalStandard", // ← SKU type
            "capacity": 10            // ← Tokens per minute (in thousands)
          }
        }
      ]
    }
  }
}
```
### Field Descriptions

| Field | Required | Description |
|---|---|---|
| `parameters.resourceGroupName.value` | ✅ Yes | Azure resource group name where AI Services exists |
| `parameters.aiServicesName.value` | ✅ Yes | Name of the Azure AI Services account |
| `parameters.modelDeployments.value` | ✅ Yes | Array of model deployments to create |
| `modelDeployments[].name` | ✅ Yes | Deployment name (used by your app in API calls) |
| `modelDeployments[].model.format` | ✅ Yes | Usually `"OpenAI"` |
| `modelDeployments[].model.name` | ✅ Yes | Model name (e.g., `gpt-4o`, `gpt-4-turbo`) |
| `modelDeployments[].model.version` | ✅ Yes | Model version (e.g., `2024-08-06`) |
| `modelDeployments[].sku.name` | ✅ Yes | SKU: `GlobalStandard`, `Standard`, or `GlobalProvisioned` |
| `modelDeployments[].sku.capacity` | ✅ Yes | Capacity in thousands of TPM (tokens per minute) |
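The required fields above can be checked before a deployment run. A minimal sketch in Python; the helper name and exact checks are illustrative, not part of the actual workflow:

```python
REQUIRED_DEPLOYMENT_KEYS = {"name", "model", "sku"}

def validate_parameters(params: dict) -> list:
    """Return a list of problems found in a parsed deployment parameter file."""
    problems = []
    p = params.get("parameters", {})
    # Top-level required values
    for field in ("resourceGroupName", "aiServicesName"):
        if not p.get(field, {}).get("value"):
            problems.append(f"missing parameters.{field}.value")
    # At least one deployment, each with name/model/sku
    deployments = p.get("modelDeployments", {}).get("value", [])
    if not deployments:
        problems.append("parameters.modelDeployments.value must be a non-empty array")
    for i, d in enumerate(deployments):
        missing = REQUIRED_DEPLOYMENT_KEYS - set(d.keys())
        if missing:
            problems.append(f"deployment {i}: missing {sorted(missing)}")
    return problems
```

Load the file with `json.load` (stripping any `//` annotation comments first, since strict JSON does not allow them) and fail the run if the returned list is non-empty.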
## Deployment Methods

### Method 1: GitHub Actions (Recommended)

1. **Push your changes** (if the parameter file was modified)
2. **Go to GitHub Actions**
   - Navigate to repository → Actions tab
3. **Select the "Deploy AI Models" workflow**
4. **Run the workflow**
   - Click "Run workflow"
   - Select environment: `dev`, `staging`, or `prod`
   - Leave the "models" field empty (uses the parameter file)
   - Click "Run workflow"
5. **Monitor the deployment**
   - Watch the workflow logs
   - Review the step summary for endpoint information
### Method 2: Local Script

From the project root:

Or from the infrastructure directory:

**Note:** The script automatically reads the resource group from the parameter file.
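One way the script could read those values, sketched in Python. The function and the parameter file path are illustrative, not the actual script:

```python
import json

def read_deployment_config(path: str):
    """Extract the values a deployment script needs from the parameter file."""
    with open(path) as f:
        params = json.load(f)["parameters"]
    resource_group = params["resourceGroupName"]["value"]
    account_name = params["aiServicesName"]["value"]
    deployments = params["modelDeployments"]["value"]
    return resource_group, account_name, deployments
```

Because every value comes from the file, the same script works unchanged across dev, staging, and prod parameter files.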
## After Deployment

### Retrieve Endpoint Information

The deployment outputs provide everything your app needs:

1. **AI Services Endpoint**
2. **Deployment Name** (from the parameter file)
3. **Full Chat Completions Endpoint**
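The three outputs fit together predictably. A small sketch showing how the full chat-completions URL is assembled from the account name, deployment name, and API version, following the URL pattern used in the curl test later in this guide:

```python
def chat_completions_url(account: str, deployment: str, api_version: str) -> str:
    """Build the full chat-completions endpoint from its parts."""
    # AI Services endpoint (deployment output 1)
    endpoint = f"https://{account}.cognitiveservices.azure.com/"
    # Deployment name (output 2) and API version complete the URL (output 3)
    return (
        f"{endpoint}openai/deployments/{deployment}"
        f"/chat/completions?api-version={api_version}"
    )
```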
### Configure Your Application

Set these environment variables in your Container Apps:

```bash
AZURE_OPENAI_ENDPOINT=https://ldfdev-ai.cognitiveservices.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_VERSION=2024-12-01-preview
```
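In the application, these variables can be read once at startup so that a misconfigured container fails fast. A minimal sketch; the helper name is illustrative:

```python
import os

REQUIRED_VARS = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_VERSION",
)

def load_openai_settings() -> dict:
    """Read the required Azure OpenAI settings, failing fast if any is unset."""
    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED_VARS}
```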
## Authentication

### Option 1: Managed Identity (Recommended)

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# azure_ad_token_provider expects a callable that returns a token,
# so wrap the credential with get_bearer_token_provider
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)

client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint="https://ldfdev-ai.cognitiveservices.azure.com/",
    azure_deployment="gpt-4o",
    azure_ad_token_provider=token_provider,
)
```
### Option 2: API Key

```bash
# Get the API key
az cognitiveservices account keys list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "key1" -o tsv
```
## Updating Models

### Adding a New Model

Edit the parameter file and add an entry to the `modelDeployments` array:

```json
{
  "name": "gpt-4-turbo",
  "model": {
    "format": "OpenAI",
    "name": "gpt-4-turbo",
    "version": "2024-04-09"
  },
  "sku": {
    "name": "GlobalStandard",
    "capacity": 15
  }
}
```

Then redeploy using GitHub Actions or the local script.
### Updating Capacity

Change the `capacity` value in the parameter file and redeploy. The deployment is idempotent: it updates the existing deployment in place.
### Removing a Model

Remove the model object from the `modelDeployments` array. **Note:** This does NOT delete the existing deployment in Azure. To delete it:

```bash
az cognitiveservices account deployment delete \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --deployment-name model-name
```
## Available Models

Check available models in your region:

```bash
az cognitiveservices account list-models \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "[].{Name:name, Version:version, SKUs:skus[].name}" \
  -o table
```
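With `-o json` instead of `-o table`, the same data can be filtered programmatically, for example to find versions of a model that support a given SKU. A hedged sketch; the record shape mirrors the `--query` projection above (`Name`, `Version`, `SKUs`), so verify it against your actual CLI output:

```python
def models_supporting_sku(models: list, model_name: str, sku: str) -> list:
    """Return versions of `model_name` whose SKU list includes `sku`."""
    return [
        m["Version"]
        for m in models
        if m["Name"] == model_name and sku in (m.get("SKUs") or [])
    ]
```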
### Common Models
| Model | Name | Latest Version | Use Case |
|---|---|---|---|
| GPT-4o | gpt-4o | 2024-08-06 | Latest, most capable |
| GPT-4 Turbo | gpt-4-turbo | 2024-04-09 | Fast, cost-effective |
| GPT-4 | gpt-4 | various | Previous generation |
| GPT-3.5 Turbo | gpt-35-turbo | various | Fast, economical |
## Verification

### 1. Check in Azure Portal

- Navigate to: Azure Portal → AI Services → `ldfdev-ai`
- Go to: Model deployments blade
- Verify: the model appears with "Succeeded" status
### 2. Test with Azure CLI

```bash
# Get the endpoint
az cognitiveservices account show \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "properties.endpoint" -o tsv

# List deployments
az cognitiveservices account deployment list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "[].{Name:name, Model:properties.model.name, Status:properties.provisioningState}" \
  -o table
```
### 3. Test with curl

```bash
# Get the API key
API_KEY=$(az cognitiveservices account keys list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "key1" -o tsv)

# Test chat completions
curl "https://ldfdev-ai.cognitiveservices.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $API_KEY" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 50
  }'
```
### 4. Test in Azure AI Foundry

- Visit: https://ai.azure.com
- Navigate to: your project (e.g., `ldfdev-hub-project`)
- Go to: Playground → Chat
- Select your deployment
- Test with sample prompts
## Troubleshooting

### Error: "Resource group not found"

**Cause:** The parameter file has the wrong `parameters.resourceGroupName.value`.

**Fix:** Update the parameter file with the correct resource group name.

### Error: "AI Services account not found"

**Cause:** The parameter file has the wrong `parameters.aiServicesName.value`.

**Fix:** Update the parameter file with the correct AI Services account name.

### Error: "Model not available"

**Cause:** The model/version is not available in your region.

**Fix:** Check available models with the `az cognitiveservices account list-models` command shown under Available Models.

### Error: "Quota exceeded"

**Cause:** The requested capacity exceeds your subscription quota.

**Fix:**
1. Request a quota increase in the Azure Portal, or
2. Reduce the capacity in the parameter file.
## Best Practices

### 1. Use Separate Parameter Files per Environment

- ✅ `dev-models.parameters.json`
- ✅ `staging-models.parameters.json`
- ✅ `prod-models.parameters.json`

### 2. Start with Lower Capacity in Dev

```jsonc
// Dev: lower capacity for testing
"capacity": 10   // 10K TPM

// Prod: higher capacity for production load
"capacity": 50   // 50K TPM
```

### 3. Use Latest Stable Model Versions

Check the Azure documentation for the latest stable versions; avoid preview versions in production.

### 4. Test Deployments in Dev First

Always test new models/versions in the dev environment before promoting them to production.

### 5. Document Model Changes

Add a comment in the parameter file explaining why a model was changed:
## Cost Optimization

### Capacity Planning

| Environment | Recommended Capacity | Cost (approximate) |
|---|---|---|
| Dev | 10K TPM | ~$20-50/month |
| Staging | 20K TPM | ~$40-100/month |
| Prod | 50K+ TPM | ~$100-500/month |

### Cost Monitoring

Monitor token usage in:

1. **Application Insights:** token metrics
2. **Azure Cost Management:** daily costs
3. **AI Foundry Portal:** usage dashboard