
ADR-025: Point-to-Site VPN Gateway for Development Environment Access

Status

Accepted - 2025-01-06

Context

After deploying Azure infrastructure with Zero Trust security (private endpoints only), developers need secure access to private Azure resources like AI Foundry Studio, Key Vault, and Storage Accounts. The "Error loading Azure AI hub - unauthorized network location" issue highlighted the need for a secure remote access solution.

Problem Statement

With private endpoint-only deployments:

  • AI Foundry Studio (https://ai.azure.com) is inaccessible from the public internet
  • Azure Portal cannot access private resources
  • Developers cannot test or configure AI models
  • There is no way to manage Key Vault, Storage, or other private services

Access Options Evaluated

| Option | Pros | Cons | Cost/Month | Complexity |
|---|---|---|---|---|
| Temporarily enable public access | Free, instant | Weakens security, manual toggle | $0 | Low |
| IP allowlist | Simple, targeted | Dynamic IPs, limited scalability | $0 | Low |
| Point-to-Site VPN | Full network access, secure | Expensive, 45-min deploy time | ~$144 | Medium |
| Site-to-Site VPN | Production-ready | Requires on-prem VPN device | ~$140-360 | High |
| Azure Bastion + Jump Box | No client needed, browser-based | Extra VM costs, less convenient for dev | ~$155 | Medium |
| ExpressRoute | Enterprise-grade, dedicated link | Extremely expensive, overkill for dev | ~$1,500+ | Very High |

Decision

Implement Point-to-Site (P2S) VPN Gateway with Azure AD authentication as an OPTIONAL module for development and staging environments only.

Key Decisions:

  1. Environment Restrictions:
       • Allowed: dev, staging
       • Blocked: prod (use Azure Bastion instead)
       • Enforced via Bicep validation and deployment script checks

  2. Default Configuration:
       • dev.parameters.json: deployVpnGateway: true (enabled by default)
       • prod.parameters.json: VPN parameters removed entirely
       • Optional deployment via --stage vpn flag

  3. Authentication Method:
       • Azure AD authentication (no certificate management)
       • Tunnel protocol: OpenVPN (Azure AD authentication for P2S works only with OpenVPN; IKEv2 needs certificate or RADIUS authentication)
       • MFA enabled through Azure AD

  4. Network Configuration:
       • VPN Gateway SKU: VpnGw1 (smallest SKU above Basic, sufficient for dev; the Basic SKU does not support OpenVPN or Azure AD authentication)
       • VPN client address pool: 172.16.0.0/24 (non-overlapping with VNet)
       • Gateway Subnet: 10.0.4.0/27 (dedicated subnet required)

  5. Deployment Strategy:
       • New stage: vpn (can be deployed separately)
       • Depends on: foundation stage (requires VNet)
       • Deploy time: 30-45 minutes (VPN Gateway provisioning)
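
The environment restriction and the stage dependency described above can be expressed as a small pre-deployment check. The following is a minimal sketch of that logic, not the actual code in deploy.sh or the Bicep module; the names `ALLOWED_VPN_ENVIRONMENTS`, `VPN_STAGE_DEPENDENCIES`, and `validate_vpn_stage` are hypothetical.

```python
# Illustrative sketch only: all names here are assumptions, not taken from deploy.sh.

# Environments in which this ADR permits the VPN Gateway.
ALLOWED_VPN_ENVIRONMENTS = frozenset({"dev", "staging"})

# Stages that must already be deployed before the vpn stage (it needs the VNet).
VPN_STAGE_DEPENDENCIES = ("foundation",)


def validate_vpn_stage(environment, completed_stages):
    """Return a list of error messages; an empty list means the vpn stage may run."""
    errors = []
    if environment not in ALLOWED_VPN_ENVIRONMENTS:
        errors.append(
            f"VPN Gateway may only be deployed to dev or staging, not '{environment}'"
        )
    missing = [stage for stage in VPN_STAGE_DEPENDENCIES if stage not in completed_stages]
    if missing:
        errors.append("VPN stage requires these stages first: " + ", ".join(missing))
    return errors


if __name__ == "__main__":
    # A prod attempt is rejected even though the foundation stage exists.
    for message in validate_vpn_stage("prod", {"foundation"}):
        print("ERROR:", message)
```

Returning a list of messages rather than raising on the first failure lets a deployment script report every violated rule in one pass.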

Rationale

Why Point-to-Site VPN?

  1. Full Network Access: Developers can access ALL private resources as if on the Azure VNet
  2. Secure: Encrypted tunnel, Azure AD authentication, no public endpoints exposed
  3. Developer Experience: Works with local tools (VS Code, Azure CLI, SDKs)
  4. Multi-User: Supports up to 250 concurrent connections (team growth)
  5. Industry Standard: Common pattern for dev/staging environments

Why NOT for Production?

  1. Cost: ~$144/month, billed around the clock even when no one is connected (wasteful for prod)
  2. Attack Surface: VPN endpoint is internet-facing (minimal but exists)
  3. Compliance: Bastion provides better audit logs and compliance
  4. Just-In-Time Access: Bastion supports JIT VM access, VPN doesn't

Why Azure AD Authentication?

  • No Certificate Management: No need to generate, distribute, or rotate certificates
  • MFA Support: Leverage existing Azure AD MFA policies
  • User Management: Automatic access control through Azure RBAC
  • Audit: Azure AD logs all authentication attempts

Consequences

Positive

  1. Fast Access: Developers can access AI Foundry Studio as soon as the 30-45 minute gateway deployment completes
  2. Secure: Zero Trust architecture maintained (no public endpoints)
  3. Scalable: Easy to add more developers (just grant Azure AD access)
  4. Flexible: Can teardown VPN when not needed to save costs
  5. Optional: Teams can choose to skip VPN and use alternatives
  6. Well-Documented: Comprehensive setup guide for developers

Negative

  1. ⚠️ Cost: ~$144/month per environment (only deploy in dev)
  2. ⚠️ Deployment Time: 30-45 minutes to provision VPN Gateway
  3. ⚠️ Client Setup: Developers must install Azure VPN Client
  4. ⚠️ Maintenance: VPN Gateway requires periodic updates

Mitigations

  • Cost: Only deploy in dev, teardown when not actively developing
  • Deployment Time: Deploy VPN as separate stage (async from other resources)
  • Client Setup: Provide step-by-step setup guide (docs/deployment/infra/vpn-setup-guide.md)
  • Maintenance: Use Azure-managed VPN Gateway (automatic updates)

Implementation

Architecture

Developer Laptop
Azure VPN Client (OpenVPN, Azure AD authentication)
Azure VPN Gateway (VpnGw1)
VNet (10.0.0.0/16)
    ├─ Container Apps Subnet (10.0.0.0/23)
    ├─ APIM Subnet (10.0.2.0/24)
    ├─ Private Endpoints Subnet (10.0.3.0/24)
    └─ Gateway Subnet (10.0.4.0/27) ← New
Private Endpoints
    ├─ AI Services
    ├─ Key Vault
    ├─ Storage Account
    └─ AI Search
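
The address plan above relies on two properties: every subnet sits inside the VNet without overlapping a sibling, and the VPN client pool (172.16.0.0/24) does not overlap the VNet. A minimal sketch of how that could be checked with Python's standard `ipaddress` module follows; the subnet keys and the function name `check_address_plan` are illustrative assumptions, and the CIDR values are the ones listed in this ADR.

```python
import ipaddress
from itertools import combinations

# CIDR values come from this ADR; the dictionary keys are illustrative labels.
VNET = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "container-apps": ipaddress.ip_network("10.0.0.0/23"),
    "apim": ipaddress.ip_network("10.0.2.0/24"),
    "private-endpoints": ipaddress.ip_network("10.0.3.0/24"),
    "GatewaySubnet": ipaddress.ip_network("10.0.4.0/27"),
}
VPN_CLIENT_POOL = ipaddress.ip_network("172.16.0.0/24")


def check_address_plan(vnet, subnets, client_pool):
    """Return a list of problems found in the address plan (empty if none)."""
    problems = []
    # Every subnet must be fully contained in the VNet address space.
    for name, net in subnets.items():
        if not net.subnet_of(vnet):
            problems.append(f"{name} ({net}) is outside the VNet {vnet}")
    # No two subnets may share addresses.
    for (name_a, net_a), (name_b, net_b) in combinations(sorted(subnets.items()), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    # The P2S client pool must not collide with the VNet.
    if client_pool.overlaps(vnet):
        problems.append(f"VPN client pool {client_pool} overlaps the VNet {vnet}")
    return problems


if __name__ == "__main__":
    print(check_address_plan(VNET, SUBNETS, VPN_CLIENT_POOL) or "address plan is consistent")
```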

Files Created/Modified

New Files:

  • infrastructure/bicep/modules/vpn-gateway.bicep - VPN Gateway module
  • docs/deployment/infra/vpn-setup-guide.md - Developer setup guide

Modified Files:

  • infrastructure/bicep/main-avm.bicep - Added VPN stage and validation
  • infrastructure/bicep/modules/networking.bicep - Added conditional GatewaySubnet
  • infrastructure/bicep/environments/dev.parameters.json - Enabled VPN
  • infrastructure/bicep/environments/prod.parameters.json - Removed VPN params
  • infrastructure/scripts/deploy.sh - Added VPN stage support

Deployment Commands

# Deploy infrastructure with VPN (dev only)
./infrastructure/scripts/deploy.sh dev loan-defenders-dev-rg --stage all

# Deploy VPN separately (after infrastructure exists)
./infrastructure/scripts/deploy.sh dev loan-defenders-dev-rg --stage vpn

# Teardown VPN when not needed (saves ~$144/month)
# (No separate teardown for VPN - use full environment teardown)
./infrastructure/scripts/teardown.sh dev loan-defenders-dev-rg --confirm

Security Considerations

  1. Azure AD Authentication: Only authenticated users can connect
  2. MFA Enforcement: Leverage Azure AD Conditional Access policies
  3. IP Rotation: Client IPs are assigned dynamically from the VPN client address pool on each connection
  4. Encrypted Tunnel: TLS encryption for all traffic
  5. No Direct Internet: VPN clients cannot be routing endpoints

Cost Management

Monthly Cost Breakdown (VpnGw1):

  • VPN Gateway: ~$140/month
  • Public IP (Static): ~$3.60/month
  • Total: ~$144/month

Cost Optimization Strategies:

  1. Only deploy in dev environment
  2. Teardown when not actively developing (rebuild in 45 min when needed)
  3. Use scheduled teardown/rebuild (e.g., weekdays only)
  4. Re-evaluate the SKU if requirements change (VpnGw1 is already the smallest SKU that supports Azure AD authentication; the cheaper Basic SKU does not)
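
To make the effect of scheduled teardown concrete, the arithmetic can be sketched as below. This is a rough estimate under stated assumptions: the two monthly figures are the approximate values from this ADR, charges are assumed to accrue in proportion to the time the gateway and its public IP exist, and data transfer costs are ignored. The function name `monthly_cost` is hypothetical.

```python
# Approximate monthly figures from this ADR (USD); actual pricing varies by region.
GATEWAY_MONTHLY_USD = 140.00
PUBLIC_IP_MONTHLY_USD = 3.60


def monthly_cost(fraction_deployed=1.0):
    """Estimated monthly cost if the gateway exists for the given fraction of the month.

    Assumption: charges accrue pro rata while the resources are deployed.
    """
    return round((GATEWAY_MONTHLY_USD + PUBLIC_IP_MONTHLY_USD) * fraction_deployed, 2)


if __name__ == "__main__":
    always_on = monthly_cost()           # gateway kept all month
    weekdays_only = monthly_cost(5 / 7)  # torn down every weekend
    print(f"always on:     ${always_on:.2f}")
    print(f"weekdays only: ${weekdays_only:.2f}")
    print(f"saving:        ${always_on - weekdays_only:.2f}")
```

Under these assumptions a weekdays-only schedule comes to roughly $103 per month instead of roughly $144. Each rebuild still costs 30-45 minutes of provisioning time.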

Alternatives Considered

Alternative 1: Temporary Public Access

Decision: Rejected for production, acceptable for quick dev testing

  • Pros: Free, instant
  • Cons: Security risk, manual toggle required
  • Verdict: Good for 5-minute testing, not sustainable for development

Alternative 2: Azure Bastion + Jump Box

Decision: Recommended for production, overkill for dev

  • Pros: No client software, better audit logs
  • Cons: Extra VM costs, browser-based only (can't use local tools)
  • Verdict: Best for production, less convenient for active development

Alternative 3: IP Allowlist

Decision: Rejected (not scalable)

  • Pros: Simple, targeted
  • Cons: Dynamic home IPs, doesn't scale to teams
  • Verdict: Not viable for multi-developer teams

Production Recommendation

For production environments, use Azure Bastion instead of VPN Gateway:

# Deploy Bastion (future implementation)
./infrastructure/scripts/deploy.sh prod loan-defenders-prod-rg --stage bastion

Bastion Advantages for Production:

  • Browser-based (no client software)
  • Just-In-Time (JIT) VM access
  • Superior audit logging
  • Compliance-friendly
  • No internet-facing VPN endpoint

Validation

Acceptance Criteria

  • VPN Gateway deploys successfully in dev environment
  • VPN Gateway deployment fails in prod environment (validation enforced)
  • Developers can connect using Azure VPN Client
  • Developers can access AI Foundry Studio after connecting
  • VPN can be deployed as separate stage (--stage vpn)
  • Comprehensive setup documentation provided
  • Cost optimization strategies documented

Testing Plan

  1. Deploy dev infrastructure with VPN enabled
  2. Wait for VPN Gateway to provision (30-45 minutes)
  3. Download VPN client configuration
  4. Install Azure VPN Client
  5. Connect to VPN
  6. Access AI Foundry Studio at https://ai.azure.com
  7. Verify access to other private resources (Key Vault, Storage)
  8. Test teardown and rebuild process
  9. Attempt to deploy VPN in prod (should fail with validation error)
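
Step 7 of the plan can be partly automated by checking that a private resource's hostname resolves to an address in the Private Endpoints subnet (10.0.3.0/24) once the VPN is connected. The sketch below is illustrative only. It assumes the client's DNS is configured to resolve private endpoint names, which a P2S connection does not guarantee by itself. The function names and the example hostname are hypothetical.

```python
import ipaddress
import socket

# Private Endpoints subnet from the architecture section of this ADR.
PRIVATE_ENDPOINT_SUBNET = ipaddress.ip_network("10.0.3.0/24")


def resolves_privately(addresses, subnet=PRIVATE_ENDPOINT_SUBNET):
    """True if at least one address was returned and all of them lie in the subnet."""
    return bool(addresses) and all(ipaddress.ip_address(a) in subnet for a in addresses)


def resolve_ipv4(hostname):
    """Resolve a hostname to its sorted, de-duplicated IPv4 addresses."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def check_private_access(hostname):
    """Report whether the hostname currently resolves to the private endpoint subnet."""
    addresses = resolve_ipv4(hostname)
    status = "private" if resolves_privately(addresses) else "NOT private"
    return f"{hostname} -> {addresses} ({status})"

# Example (hypothetical vault name), run while connected to the VPN:
#   print(check_private_access("my-dev-vault.vault.azure.net"))
```

A public address in the output usually means the name is still being resolved by public DNS rather than through the VNet's private DNS zones.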

References

Decision Makers

  • Author: AI-augmented development (Claude Code)
  • Stakeholder: Solo developer / Small teams
  • Date: 2025-01-06

Notes

  • VPN Gateway is the fastest path to productive development with Zero Trust
  • For cost-sensitive projects, consider using temporary public access during development
  • For compliance-critical projects, deploy VPN even in dev for consistent security posture
  • VPN Gateway can be torn down and rebuilt in < 1 hour when needed

Outstanding Issues

This section tracks remaining improvements identified during code review but not yet implemented:

Fixed in PR #106 Review Updates (2025-01-06)

The following critical and major issues were addressed:

  • CRI-01: Added NSG rules for Gateway Subnet (UDP 500, 4500, 1194, GatewayManager service tag)
  • MAJ-02: Added error handling and version pinning for PowerShell module installation
  • MAJ-03: Added VPN Gateway dependency validation in deployment script
  • MAJ-04: Added VPN connection check to teardown workflow to prevent disruption

Remaining Improvements (Future PRs)

Medium Priority:

  • MAJ-01: Gateway Subnet sizing insufficient
      • Current: /27 (32 addresses, 27 usable after Azure reserves 5 per subnet)
      • Recommended: /26 (64 addresses, 59 usable)
      • Reason: Future Active-Active VPN Gateway and ExpressRoute coexistence
      • Tracking: Create GitHub issue for subnet resize
      • Impact: Low (current size sufficient for dev, resize needed before production-level features)
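
When sizing the Gateway Subnet, note that the conventional count subtracts only the network and broadcast addresses (30 for a /27, 62 for a /26), whereas Azure reserves five addresses in every subnet. A short sketch of the Azure-specific arithmetic, using a hypothetical helper `usable_in_azure`:

```python
import ipaddress

# Azure reserves the first four addresses and the last address of every subnet.
AZURE_RESERVED_PER_SUBNET = 5


def usable_in_azure(cidr):
    """Number of addresses in the subnet that Azure can assign to resources."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_PER_SUBNET


if __name__ == "__main__":
    for cidr in ("10.0.4.0/27", "10.0.4.0/26"):
        print(cidr, "->", usable_in_azure(cidr), "assignable addresses")
```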

Low Priority:

  • MIN-01: Hardcoded Tenant ID in vpn-gateway.bicep:52
      • Current: aadTenantId string = tenant().tenantId
      • Status: Already using tenant().tenantId, no fix needed

  • MIN-02: Missing VPN Client Config Output
      • Add Bicep output with VPN client package download command
      • Makes developer setup easier

  • MIN-03: Incomplete Cost Breakdown
      • Add detailed monthly/annual cost projections
      • Include data transfer costs

  • MIN-04: No IP Overlap Validation
      • Document reserved IP ranges in network architecture
      • Add validation for VPN client pool vs VNet overlap

  • MIN-05: No Deployment Progress Feedback
      • Add progress monitoring for 30-45 min VPN Gateway provisioning
      • Consider Azure Monitor alerts for deployment completion

  • MIN-06: Missing Health Check Endpoint
      • Add Azure Monitor availability alert for VPN Gateway
      • Automatic notification if gateway goes down
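
For MIN-05, progress feedback could take the form of a simple polling loop around the gateway's provisioning state. The sketch below is one possible shape, not the planned implementation. `wait_for_gateway` and `az_gateway_state` are hypothetical names, the resource group and gateway name are placeholders supplied by the caller, and the Azure CLI is assumed to be installed and logged in.

```python
import subprocess
import time


def az_gateway_state(resource_group, gateway_name):
    """Read the gateway's provisioningState via the Azure CLI (must be installed)."""
    result = subprocess.run(
        ["az", "network", "vnet-gateway", "show",
         "--resource-group", resource_group,
         "--name", gateway_name,
         "--query", "provisioningState",
         "--output", "tsv"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def wait_for_gateway(get_state, poll_seconds=60, timeout_seconds=3600,
                     sleep=time.sleep, report=print):
    """Poll get_state() until the gateway succeeds or fails.

    Returns True on "Succeeded", False on "Failed", and raises TimeoutError
    if neither state is reached within timeout_seconds.
    """
    waited = 0
    while True:
        state = get_state()
        report(f"[{waited // 60:>3} min] provisioningState={state}")
        if state == "Succeeded":
            return True
        if state == "Failed":
            return False
        if waited >= timeout_seconds:
            raise TimeoutError(f"gateway still '{state}' after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds

# Example with placeholder names:
#   wait_for_gateway(lambda: az_gateway_state("loan-defenders-dev-rg", "<gateway-name>"))
```

Passing the state reader, the sleep function, and the reporter as parameters keeps the loop testable without calling Azure.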

Not Implemented (By Design)

  • CRI-02: Audit logging for VPN connections
      • Status: Not implemented in this PR
      • Reason: Azure VPN Gateway P2S connections are already logged in Azure AD sign-in logs
      • Location: Azure Portal → Azure Active Directory → Sign-ins
      • Data Captured: User identity, connection time, source IP, MFA status
      • Future Enhancement: Consider adding VPN Gateway diagnostic settings for additional telemetry (IKEDiagnosticLog, P2SDiagnosticLog)
      • Tracking: Create GitHub issue for diagnostic settings implementation

Implementation Tracking

To track these improvements:

  1. Create GitHub issues for each outstanding item (MAJ-01, MIN-02 through MIN-06, CRI-02)
  2. Label with enhancement, infrastructure, vpn-gateway
  3. Assign to appropriate milestone (Phase 2 or later)
  4. Update this ADR when issues are resolved

Last Updated: 2025-01-06 (PR #106 review fixes)