AI Model Deployment Guide

Overview

This guide explains how to deploy AI models to Azure AI Services using the fully automated, repeatable deployment workflow.

Key Principle: Single Source of Truth

All configuration is in the parameter file. The parameter file contains everything needed for deployment:

  • Resource group name (in parameters.resourceGroupName.value)
  • AI Services account name
  • Model deployments (name, version, SKU, capacity)

No hardcoded values in scripts or workflows. This makes deployments fully repeatable and environment-agnostic.
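As a minimal sketch of this principle, a script can pull every deployment input from the parameter file instead of hardcoding it. The helper below is illustrative (not one of the shipped scripts); the field names follow the template shown below:

```python
import json

def load_deployment_config(path):
    """Read all deployment inputs from a single parameter file."""
    with open(path) as f:
        params = json.load(f)["parameters"]
    return {
        "resource_group": params["resourceGroupName"]["value"],
        "ai_services": params["aiServicesName"]["value"],
        "deployments": params["modelDeployments"]["value"],
    }
```

Because every value comes from one file, pointing the same script at a different environment's parameter file is the only change needed.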


Parameter File Structure

Location

infrastructure/bicep/environments/{environment}-models.parameters.json

Template

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "resourceGroupName": {
      "value": "ldfdev-rg"  // ← Resource group name (REQUIRED)
    },
    "aiServicesName": {
      "value": "ldfdev-ai"  // ← AI Services account name
    },
    "modelDeployments": {
      "value": [
        {
          "name": "gpt-4o",  // ← Deployment name (what your app uses)
          "model": {
            "format": "OpenAI",
            "name": "gpt-4o",  // ← Model name
            "version": "2024-08-06"  // ← Model version
          },
          "sku": {
            "name": "GlobalStandard",  // ← SKU type
            "capacity": 10  // ← Tokens per minute (in thousands)
          }
        }
      ]
    }
  }
}

Field Descriptions

| Field | Required | Description |
|---|---|---|
| parameters.resourceGroupName.value | ✅ Yes | Azure resource group name where AI Services exists |
| parameters.aiServicesName.value | ✅ Yes | Name of the Azure AI Services account |
| parameters.modelDeployments.value | ✅ Yes | Array of model deployments to create |
| modelDeployments[].name | ✅ Yes | Deployment name (used by your app in API calls) |
| modelDeployments[].model.format | ✅ Yes | Usually "OpenAI" |
| modelDeployments[].model.name | ✅ Yes | Model name (e.g., gpt-4o, gpt-4-turbo) |
| modelDeployments[].model.version | ✅ Yes | Model version (e.g., 2024-08-06) |
| modelDeployments[].sku.name | ✅ Yes | SKU: GlobalStandard, Standard, GlobalProvisioned |
| modelDeployments[].sku.capacity | ✅ Yes | Capacity in thousands of TPM (tokens per minute) |
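Since every field above is required, it can be worth checking the modelDeployments array before a deployment runs. The helper below is an illustrative sketch, not part of the shipped scripts:

```python
def validate_model_deployments(deployments):
    """Return a list of missing-field errors for a modelDeployments array."""
    errors = []
    for i, d in enumerate(deployments):
        if "name" not in d:
            errors.append(f"deployment {i}: missing 'name'")
        model = d.get("model", {})
        for key in ("format", "name", "version"):
            if key not in model:
                errors.append(f"deployment {i}: missing model.{key}")
        sku = d.get("sku", {})
        for key in ("name", "capacity"):
            if key not in sku:
                errors.append(f"deployment {i}: missing sku.{key}")
    return errors
```

An empty return value means every required field is present; otherwise each error names the offending entry and field.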

Deployment Methods

Method 1: GitHub Actions Workflow

  1. Push your changes (if the parameter file was modified):

    git push origin your-branch

  2. Open the workflow: go to the repository → Actions tab and select the "Deploy AI Models" workflow.

  3. Run the workflow:
     • Click "Run workflow"
     • Select environment: dev, staging, or prod
     • Leave the "models" field empty (uses the parameter file)
     • Click "Run workflow"

  4. Monitor the deployment: watch the workflow logs and review the step summary for endpoint information.
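The workflow inputs described above could be declared along these lines in the workflow file (a hypothetical sketch for orientation; the actual workflow definition may differ):

```yaml
on:
  workflow_dispatch:
    inputs:
      environment:
        description: "Target environment"
        type: choice
        options: [dev, staging, prod]
        required: true
      models:
        description: "Optional override; leave empty to use the parameter file"
        required: false
```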

Method 2: Local Script

From project root:

./infrastructure/scripts/deploy-models.sh dev

Or from infrastructure directory:

cd infrastructure
./scripts/deploy-models.sh dev

Note: The script automatically reads the resource group from the parameter file.


After Deployment

Retrieve Endpoint Information

The deployment outputs provide everything your app needs:

  1. AI Services Endpoint:

    https://ldfdev-ai.cognitiveservices.azure.com/
    

  2. Deployment Name (from parameter file):

    gpt-4o
    

  3. Full Chat Completions Endpoint:

    POST https://ldfdev-ai.cognitiveservices.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview
    
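The full endpoint is simply composed from the base endpoint, the deployment name, and the API version. For illustration (a hypothetical helper, not part of the deployment outputs):

```python
def chat_completions_url(endpoint, deployment, api_version):
    """Compose the chat completions URL from its three parts."""
    base = endpoint.rstrip("/")  # tolerate a trailing slash on the endpoint
    return (f"{base}/openai/deployments/{deployment}"
            f"/chat/completions?api-version={api_version}")
```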

Configure Your Application

Set these environment variables in your Container Apps:

AZURE_OPENAI_ENDPOINT=https://ldfdev-ai.cognitiveservices.azure.com/
AZURE_OPENAI_DEPLOYMENT_NAME=gpt-4o
AZURE_OPENAI_API_VERSION=2024-12-01-preview

Authentication

Option 1: Managed Identity (Recommended)

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# The client expects a token provider callable, not a raw credential
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    api_version="2024-12-01-preview",
    azure_endpoint="https://ldfdev-ai.cognitiveservices.azure.com/",
    azure_deployment="gpt-4o",
    azure_ad_token_provider=token_provider
)

Option 2: API Key

# Get API key
az cognitiveservices account keys list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "key1" -o tsv
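With key auth, the key is sent in the api-key header rather than as a Bearer token. As an illustrative sketch (mirroring the curl test later in this guide), a request can be assembled like this and POSTed with any HTTP client:

```python
def build_chat_request(api_key, user_message):
    """Build headers and body for a key-authenticated chat completions call."""
    headers = {
        "Content-Type": "application/json",
        "api-key": api_key,  # key auth header; not "Authorization: Bearer ..."
    }
    body = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 50,
    }
    return headers, body
```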


Updating Models

Adding a New Model

Edit the parameter file and add to modelDeployments array:

{
  "name": "gpt-4-turbo",
  "model": {
    "format": "OpenAI",
    "name": "gpt-4-turbo",
    "version": "2024-04-09"
  },
  "sku": {
    "name": "GlobalStandard",
    "capacity": 15
  }
}

Then redeploy using GitHub Actions or local script.

Updating Capacity

Change the capacity value in the parameter file and redeploy. The deployment is idempotent - it will update the existing deployment.
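A capacity change is a one-field edit in the modelDeployments array. As an illustration (a hypothetical helper, assuming the array shape from the template above):

```python
def set_capacity(deployments, name, capacity):
    """Update sku.capacity for the named deployment, in place."""
    for d in deployments:
        if d["name"] == name:
            d["sku"]["capacity"] = capacity
            return True
    return False  # no deployment with that name
```

Because the deployment itself is idempotent, redeploying after this edit updates the existing deployment rather than creating a new one.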

Removing a Model

Remove the model object from the modelDeployments array. Note: This does NOT delete the deployment. To delete:

az cognitiveservices account deployment delete \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --deployment-name model-name

Available Models

Check available models in your region:

az cognitiveservices account list-models \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "[].{Name:name, Version:version, SKUs:skus[].name}" \
  -o table

Common Models

| Model | Name | Latest Version | Use Case |
|---|---|---|---|
| GPT-4o | gpt-4o | 2024-08-06 | Latest, most capable |
| GPT-4 Turbo | gpt-4-turbo | 2024-04-09 | Fast, cost-effective |
| GPT-4 | gpt-4 | various | Previous generation |
| GPT-3.5 Turbo | gpt-35-turbo | various | Fast, economical |

Verification

1. Check in Azure Portal

  1. Navigate to: Azure Portal → AI Services → ldfdev-ai
  2. Go to: Model deployments blade
  3. Verify: Model appears with "Succeeded" status

2. Test with Azure CLI

# Get endpoint
az cognitiveservices account show \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "properties.endpoint" -o tsv

# List deployments
az cognitiveservices account deployment list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "[].{Name:name, Model:properties.model.name, Status:properties.provisioningState}" \
  -o table

3. Test with curl

# Get API key
API_KEY=$(az cognitiveservices account keys list \
  --resource-group ldfdev-rg \
  --name ldfdev-ai \
  --query "key1" -o tsv)

# Test chat completions
curl https://ldfdev-ai.cognitiveservices.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-12-01-preview \
  -H "Content-Type: application/json" \
  -H "api-key: $API_KEY" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Hello!"}
    ],
    "max_tokens": 50
  }'

4. Test in Azure AI Foundry

  1. Visit: https://ai.azure.com
  2. Navigate to: Your project (e.g., ldfdev-hub-project)
  3. Go to: Playground → Chat
  4. Select your deployment
  5. Test with sample prompts

Troubleshooting

Error: "Resource group not found"

Cause: Parameter file has wrong parameters.resourceGroupName.value

Fix: Update parameter file with correct resource group name

Error: "AI Services account not found"

Cause: Parameter file has wrong parameters.aiServicesName.value

Fix: Update parameter file with correct AI Services account name

Error: "Model not available"

Cause: Model/version not available in your region

Fix: Check available models with:

az cognitiveservices account list-models --resource-group ldfdev-rg --name ldfdev-ai

Error: "Quota exceeded"

Cause: Requested capacity exceeds your subscription quota

Fix:

  1. Request a quota increase in the Azure Portal, or
  2. Reduce the capacity in the parameter file


Best Practices

1. Use Separate Parameter Files per Environment

  • dev-models.parameters.json
  • staging-models.parameters.json
  • prod-models.parameters.json

2. Start with Lower Capacity in Dev

// Dev: Lower capacity for testing
"capacity": 10  // 10K TPM

// Prod: Higher capacity for production load
"capacity": 50  // 50K TPM

3. Use Latest Stable Model Versions

Check Azure documentation for latest stable versions, not preview versions.

4. Test Deployments in Dev First

Always test new models/versions in dev environment before promoting to production.

5. Document Model Changes

Add a comment in parameter file explaining why model was changed:

// 2024-10-09: Upgraded to gpt-4o for better performance
"name": "gpt-4o"


Cost Optimization

Capacity Planning

| Environment | Recommended Capacity | Cost (approximate) |
|---|---|---|
| Dev | 10K TPM | ~$20-50/month |
| Staging | 20K TPM | ~$40-100/month |
| Prod | 50K+ TPM | ~$100-500/month |

Cost Monitoring

Monitor token usage in:

  1. Application Insights: Token metrics
  2. Azure Cost Management: Daily costs
  3. AI Foundry Portal: Usage dashboard


Reference