ADR-025: Point-to-Site VPN Gateway for Development Environment Access
Status
✅ Accepted - 2025-01-06
Context
After deploying Azure infrastructure with Zero Trust security (private endpoints only), developers need secure access to private Azure resources like AI Foundry Studio, Key Vault, and Storage Accounts. The "Error loading Azure AI hub - unauthorized network location" issue highlighted the need for a secure remote access solution.
Problem Statement
With private endpoint-only deployments:
- AI Foundry Studio (https://ai.azure.com) is inaccessible from the public internet
- Azure Portal cannot access private resources
- Developers cannot test or configure AI models
- There is no way to manage Key Vault, Storage, or other private services
Access Options Evaluated
| Option | Pros | Cons | Cost/Month | Complexity |
|---|---|---|---|---|
| Temporarily enable public access | Free, instant | Weakens security, manual toggle | $0 | Low |
| IP allowlist | Simple, targeted | Dynamic IPs, limited scalability | $0 | Low |
| Point-to-Site VPN | Full network access, secure | Expensive, 45-min deploy time | ~$144 | Medium |
| Site-to-Site VPN | Production-ready | Requires on-prem VPN device | ~$140-360 | High |
| Azure Bastion + Jump Box | No client needed, browser-based | Extra VM costs, less convenient for dev | ~$155 | Medium |
| ExpressRoute | Enterprise-grade, dedicated link | Extremely expensive, overkill for dev | ~$1,500+ | Very High |
Decision
Implement Point-to-Site (P2S) VPN Gateway with Azure AD authentication as an OPTIONAL module for development and staging environments only.
Key Decisions:
- Environment Restrictions:
  - ✅ Allowed: dev, staging
  - ❌ Blocked: prod (use Azure Bastion instead)
  - Enforced via Bicep validation and deployment script checks
- Default Configuration:
  - dev.parameters.json: deployVpnGateway: true (enabled by default)
  - prod.parameters.json: VPN parameters removed entirely
  - Optional deployment via --stage vpn flag
- Authentication Method:
  - Azure AD authentication (no certificate management)
  - Uses the OpenVPN tunnel type (Azure AD authentication for P2S is not available over IKEv2)
  - MFA enabled through Azure AD
- Network Configuration:
  - VPN Gateway SKU: VpnGw1 (entry-level Generation 1 SKU, sufficient for dev; not the legacy "Basic" SKU)
  - VPN client address pool: 172.16.0.0/24 (non-overlapping with VNet)
  - Gateway Subnet: 10.0.4.0/27 (dedicated subnet required)
- Deployment Strategy:
  - New stage: vpn (can be deployed separately)
  - Depends on: foundation stage (requires VNet)
  - Deploy time: 30-45 minutes (VPN Gateway provisioning)
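The environment restriction could be enforced in the deployment script along the following lines. This is a minimal sketch, not the actual contents of deploy.sh: the function names vpn_allowed_for_env and check_stage are hypothetical, and only the dev/staging/prod rule comes from this ADR.

```shell
#!/bin/sh
# Hypothetical guard for the VPN stage; names are illustrative and are not
# taken from infrastructure/scripts/deploy.sh.

# Returns 0 when the environment may host a P2S VPN Gateway, 1 otherwise.
vpn_allowed_for_env() {
    case "$1" in
        dev|staging) return 0 ;;
        *)           return 1 ;;   # prod and any unrecognized name are blocked
    esac
}

# Returns non-zero (with a message on stderr) when the requested stage is
# not permitted in the target environment.
check_stage() {
    env_name="$1"
    stage="$2"
    if [ "$stage" = "vpn" ] && ! vpn_allowed_for_env "$env_name"; then
        echo "ERROR: VPN Gateway is not permitted in '$env_name'; use Azure Bastion instead." >&2
        return 1
    fi
    return 0
}
```

A script would typically call check_stage with its environment and stage arguments before doing any work and exit when it returns non-zero. Blocking unknown environment names by default keeps a typo from bypassing the prod restriction.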
Rationale
Why Point-to-Site VPN?
- Full Network Access: Developers can access ALL private resources as if on the Azure VNet
- Secure: Encrypted tunnel, Azure AD authentication, no public endpoints exposed
- Developer Experience: Works with local tools (VS Code, Azure CLI, SDKs)
- Multi-User: Supports up to 250 concurrent connections (team growth)
- Industry Standard: Common pattern for dev/staging environments
Why NOT for Production?
- Cost: ~$144/month is billed around the clock, even when no clients are connected (wasteful for prod)
- Attack Surface: VPN endpoint is internet-facing (minimal but exists)
- Compliance: Bastion provides better audit logs and compliance
- Just-In-Time Access: Bastion supports JIT VM access, VPN doesn't
Why Azure AD Authentication?
- No Certificate Management: No need to generate, distribute, or rotate certificates
- MFA Support: Leverage existing Azure AD MFA policies
- User Management: Automatic access control through Azure RBAC
- Audit: Azure AD logs all authentication attempts
Consequences
Positive
- ✅ Immediate Access: Developers can access AI Foundry Studio as soon as the 30-45 minute deployment completes
- ✅ Secure: Zero Trust architecture maintained (no public endpoints)
- ✅ Scalable: Easy to add more developers (just grant Azure AD access)
- ✅ Flexible: Can tear down the VPN when not needed to save costs
- ✅ Optional: Teams can choose to skip VPN and use alternatives
- ✅ Well-Documented: Comprehensive setup guide for developers
Negative
- ⚠️ Cost: ~$144/month per environment (only deploy in dev)
- ⚠️ Deployment Time: 30-45 minutes to provision VPN Gateway
- ⚠️ Client Setup: Developers must install Azure VPN Client
- ⚠️ Maintenance: VPN Gateway requires periodic updates
Mitigations
- Cost: Only deploy in dev, teardown when not actively developing
- Deployment Time: Deploy VPN as separate stage (async from other resources)
- Client Setup: Provide step-by-step setup guide (docs/deployment/infra/vpn-setup-guide.md)
- Maintenance: Use Azure-managed VPN Gateway (automatic updates)
Implementation
Architecture
Developer Laptop
↓
Azure VPN Client (OpenVPN, Azure AD authentication)
↓
Azure VPN Gateway (VpnGw1)
↓
VNet (10.0.0.0/16)
├─ Container Apps Subnet (10.0.0.0/23)
├─ APIM Subnet (10.0.2.0/24)
├─ Private Endpoints Subnet (10.0.3.0/24)
└─ Gateway Subnet (10.0.4.0/27) ← New
↓
Private Endpoints
├─ AI Services
├─ Key Vault
├─ Storage Account
└─ AI Search
Files Created/Modified
New Files:
- infrastructure/bicep/modules/vpn-gateway.bicep - VPN Gateway module
- docs/deployment/infra/vpn-setup-guide.md - Developer setup guide
Modified Files:
- infrastructure/bicep/main-avm.bicep - Added VPN stage and validation
- infrastructure/bicep/modules/networking.bicep - Added conditional GatewaySubnet
- infrastructure/bicep/environments/dev.parameters.json - Enabled VPN
- infrastructure/bicep/environments/prod.parameters.json - Removed VPN params
- infrastructure/scripts/deploy.sh - Added VPN stage support
Deployment Commands
# Deploy infrastructure with VPN (dev only)
./infrastructure/scripts/deploy.sh dev loan-defenders-dev-rg --stage all
# Deploy VPN separately (after infrastructure exists)
./infrastructure/scripts/deploy.sh dev loan-defenders-dev-rg --stage vpn
# Teardown VPN when not needed (saves ~$144/month)
# (No separate teardown for VPN - use full environment teardown)
./infrastructure/scripts/teardown.sh dev loan-defenders-dev-rg --confirm
Security Considerations
- Azure AD Authentication: Only authenticated users can connect
- MFA Enforcement: Leverage Azure AD Conditional Access policies
- Dynamic Client IPs: Each connection is assigned an address from the VPN client address pool
- Encrypted Tunnel: TLS encryption for all traffic
- No Direct Internet: VPN clients cannot be routing endpoints
Cost Management
Monthly Cost Breakdown (VpnGw1):
- VPN Gateway: ~$140/month
- Public IP (Static): ~$3.60/month
- Total: ~$144/month
Cost Optimization Strategies:
1. Only deploy in dev environment
2. Teardown when not actively developing (rebuild in 45 min when needed)
3. Use scheduled teardown/rebuild (e.g., weekdays only)
4. Consider smaller VPN Gateway SKU if fewer users
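To put a rough number on the scheduled teardown strategy, the sketch below prorates the monthly figures from the breakdown. It assumes cost scales linearly with the share of days the gateway and its public IP exist, and it ignores data transfer and the other resources removed by a full environment teardown. The function names are illustrative.

```shell
#!/bin/sh
# Monthly figures from the cost breakdown, held in cents so that plain
# integer arithmetic is enough.
GATEWAY_CENTS=14000    # ~$140/month
PUBLIC_IP_CENTS=360    # ~$3.60/month

# Prints the estimated monthly cost in cents when the gateway exists for
# $1 out of every $2 days (for example 5 of 7 for a weekdays-only schedule).
# Assumption: cost is proportional to the time the resources are deployed.
prorated_monthly_cents() {
    days_up="$1"
    days_total="$2"
    echo $(( (GATEWAY_CENTS + PUBLIC_IP_CENTS) * days_up / days_total ))
}

# Formats an integer number of cents as dollars, e.g. 14360 -> $143.60.
format_dollars() {
    printf '$%d.%02d\n' $(( $1 / 100 )) $(( $1 % 100 ))
}

format_dollars "$(prorated_monthly_cents 7 7)"   # always on: $143.60
format_dollars "$(prorated_monthly_cents 5 7)"   # weekdays only: $102.57
```

Under these assumptions a weekdays-only schedule saves roughly $41 per month, which has to be weighed against the 30-45 minute rebuild on each redeploy.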
Alternatives Considered
Alternative 1: Temporary Public Access
Decision: Rejected for production, acceptable for quick dev testing
- Pros: Free, instant
- Cons: Security risk, manual toggle required
- Verdict: Good for 5-minute testing, not sustainable for development
Alternative 2: Azure Bastion + Jump Box
Decision: Recommended for production, overkill for dev
- Pros: No client software, better audit logs
- Cons: Extra VM costs, browser-based only (can't use local tools)
- Verdict: Best for production, less convenient for active development
Alternative 3: IP Allowlist
Decision: Rejected (not scalable)
- Pros: Simple, targeted
- Cons: Dynamic home IPs, doesn't scale to teams
- Verdict: Not viable for multi-developer teams
Production Recommendation
For production environments, use Azure Bastion instead of VPN Gateway:
# Deploy Bastion (future implementation)
./infrastructure/scripts/deploy.sh prod loan-defenders-prod-rg --stage bastion
Bastion Advantages for Production:
- Browser-based (no client software)
- Just-In-Time (JIT) VM access
- Superior audit logging
- Compliance-friendly
- No internet-facing VPN endpoint
Validation
Acceptance Criteria
- VPN Gateway deploys successfully in dev environment
- VPN Gateway deployment fails in prod environment (validation enforced)
- Developers can connect using Azure VPN Client
- Developers can access AI Foundry Studio after connecting
- VPN can be deployed as separate stage (--stage vpn)
- Comprehensive setup documentation provided
- Cost optimization strategies documented
Testing Plan
- Deploy dev infrastructure with VPN enabled
- Wait for VPN Gateway to provision (30-45 minutes)
- Download VPN client configuration
- Install Azure VPN Client
- Connect to VPN
- Access AI Foundry Studio at https://ai.azure.com
- Verify access to other private resources (Key Vault, Storage)
- Test teardown and rebuild process
- Attempt to deploy VPN in prod (should fail with validation error)
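The provisioning wait in the testing plan can be scripted instead of watched by hand. The sketch below polls a state command until the gateway reports Succeeded or Failed. The helper names are hypothetical and the gateway name is a placeholder; the query itself uses the standard az network vnet-gateway show command.

```shell
#!/bin/sh
# Polls a command until it prints "Succeeded"; gives up on "Failed" or after
# a fixed number of polls.
#   $1: command (or function name) that prints the current provisioning state
#   $2: seconds to sleep between polls
#   $3: maximum number of polls
# Returns 0 on success, 1 on failure, 2 on timeout.
wait_for_gateway() {
    state_cmd="$1"
    interval="$2"
    max_polls="$3"
    polls=0
    while [ "$polls" -lt "$max_polls" ]; do
        state="$($state_cmd)"
        echo "Gateway provisioning state: ${state:-unknown}"
        case "$state" in
            Succeeded) return 0 ;;
            Failed)    return 1 ;;
        esac
        polls=$(( polls + 1 ))
        sleep "$interval"
    done
    echo "Timed out after $max_polls polls" >&2
    return 2
}

# Hypothetical state query; replace the placeholder with the real gateway name.
gateway_state() {
    az network vnet-gateway show \
        --resource-group loan-defenders-dev-rg \
        --name "<vpn-gateway-name>" \
        --query provisioningState --output tsv
}
```

Calling wait_for_gateway gateway_state 60 60 would check once a minute for up to an hour, which covers the 30-45 minute provisioning window with some margin. Passing the state command as an argument keeps the loop testable without an Azure subscription.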
References
- Azure VPN Gateway Documentation
- Point-to-Site VPN Overview
- Azure AD Authentication for P2S
- VPN Gateway Pricing
- Azure Bastion Documentation
Related ADRs
- ADR-014: Zero Trust Security Architecture - Private endpoints requirement
- ADR-016: Network Security - VNet architecture
- ADR-024: AI Foundry Integration - Private AI Services access
Decision Makers
- Author: AI-augmented development (Claude Code)
- Stakeholder: Solo developer / Small teams
- Date: 2025-01-06
Notes
- VPN Gateway is the fastest path to productive development with Zero Trust
- For cost-sensitive projects, consider using temporary public access during development
- For compliance-critical projects, deploy VPN even in dev for consistent security posture
- VPN Gateway can be torn down and rebuilt in < 1 hour when needed
Outstanding Issues
This section tracks remaining improvements identified during code review but not yet implemented:
Fixed in PR #106 Review Updates (2025-01-06)
The following critical and major issues were addressed:
- ✅ CRI-01: Added NSG rules for Gateway Subnet (UDP 500, 4500, 1194, GatewayManager service tag)
- ✅ MAJ-02: Added error handling and version pinning for PowerShell module installation
- ✅ MAJ-03: Added VPN Gateway dependency validation in deployment script
- ✅ MAJ-04: Added VPN connection check to teardown workflow to prevent disruption
Remaining Improvements (Future PRs)
Medium Priority:
- MAJ-01: Gateway Subnet sizing insufficient
  - Current: /27 (32 addresses, 27 usable after Azure reserves 5 per subnet)
  - Recommended: /26 (64 addresses, 59 usable)
  - Reason: Future Active-Active VPN Gateway and ExpressRoute coexistence
  - Tracking: Create GitHub issue for subnet resize
  - Impact: Low (current size sufficient for dev, resize needed before production-level features)
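For reference, the usable address count in an Azure subnet is the block size minus the five addresses Azure reserves in every subnet, which differs from the conventional "minus two" rule. A small sketch of that arithmetic (the function name is illustrative):

```shell
#!/bin/sh
# Prints the number of addresses Azure leaves assignable in a subnet with the
# given prefix length: 2^(32 - prefix) minus the 5 addresses Azure reserves.
azure_usable_ips() {
    prefix="$1"
    total=$(( 1 << (32 - prefix) ))
    echo $(( total - 5 ))
}

azure_usable_ips 27   # prints 27
azure_usable_ips 26   # prints 59
```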
Low Priority:
- MIN-01: Hardcoded Tenant ID in vpn-gateway.bicep:52
  - Current: aadTenantId string = tenant().tenantId
  - Status: Already using tenant().tenantId, no fix needed
- MIN-02: Missing VPN Client Config Output
  - Add Bicep output with VPN client package download command
  - Makes developer setup easier
- MIN-03: Incomplete Cost Breakdown
  - Add detailed monthly/annual cost projections
  - Include data transfer costs
- MIN-04: No IP Overlap Validation
  - Document reserved IP ranges in network architecture
  - Add validation for VPN client pool vs VNet overlap
- MIN-05: No Deployment Progress Feedback
  - Add progress monitoring for 30-45 min VPN Gateway provisioning
  - Consider Azure Monitor alerts for deployment completion
- MIN-06: Missing Health Check Endpoint
  - Add Azure Monitor availability alert for VPN Gateway
  - Automatic notification if gateway goes down
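One possible shape for the MIN-04 overlap validation is a pre-deployment check in the deployment script. The sketch below is an assumption about how that could look, not an existing part of deploy.sh. It handles IPv4 CIDR blocks only and expects a shell with 64-bit arithmetic, which is the common case.

```shell
#!/bin/sh
# Converts a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs="$IFS"
    IFS=.
    set -- $1          # split the address into its four octets
    IFS="$old_ifs"
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Prints "first last" (as integers) for a CIDR block such as 10.0.0.0/16.
cidr_range() {
    base="$(ip_to_int "${1%/*}")"
    prefix="${1#*/}"
    size=$(( 1 << (32 - prefix) ))
    first=$(( base - base % size ))   # align to the start of the block
    echo "$first $(( first + size - 1 ))"
}

# Returns 0 when the two CIDR blocks share at least one address.
cidrs_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")
    # $1/$2 = first/last of block A, $3/$4 = first/last of block B
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# VPN client pool vs VNet address space, using the values from this ADR.
if cidrs_overlap 172.16.0.0/24 10.0.0.0/16; then
    echo "overlap"
else
    echo "no overlap"
fi
```

With the values from this ADR the check prints "no overlap". The same function can confirm that the Gateway Subnet sits inside the VNet, since 10.0.4.0/27 and 10.0.0.0/16 do overlap.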
Not Implemented (By Design)
- CRI-02: Audit logging for VPN connections
  - Status: Not implemented in this PR
  - Reason: Azure VPN Gateway P2S connections are already logged in Azure AD sign-in logs
  - Location: Azure Portal → Azure Active Directory → Sign-ins
  - Data Captured: User identity, connection time, source IP, MFA status
  - Future Enhancement: Consider adding VPN Gateway diagnostic settings for additional telemetry (IKEDiagnosticLog, P2SDiagnosticLog)
  - Tracking: Create GitHub issue for diagnostic settings implementation
Implementation Tracking
To track these improvements:
1. Create GitHub issues for each outstanding item (MAJ-01, MIN-02 through MIN-06, CRI-02)
2. Label with enhancement, infrastructure, vpn-gateway
3. Assign to appropriate milestone (Phase 2 or later)
4. Update this ADR when issues are resolved
Last Updated: 2025-01-06 (PR #106 review fixes)