Debugging Private Network Issues When Public Queries Are Blocked

⚡ TL;DR

When Azure Private Link blocks public queries to your Log Analytics workspace, Azure SRE Agent can still investigate issues—by delegating queries to an MCP server running inside your VNet. The MCP server queries Log Analytics from within the private network, and SRE Agent coordinates the investigation remotely.

🔐 The Private Network Security Challenge

Modern cloud architecture often requires strict network isolation:

🛡️ Security teams say: "No public access to Log Analytics. Private Link with query access blocked from outside the VNet."

🔧 SRE teams say: "How is our AI agent supposed to query logs if it can't reach Log Analytics?"

This creates a real problem. When you configure Azure Monitor Private Link Scope (AMPLS) with queryAccessMode: PrivateOnly and disable public query access on your Log Analytics workspace, all external queries are blocked—including those from Azure SRE Agent.

Try querying from outside the VNet and you'll see:

❌ InsufficientAccessError: The query was blocked due to private link configuration. 
   Access is denied because this request was not made through a private endpoint.

The question: How can an AI agent investigate issues in a workspace it can't query?

🌉 The Solution: MCP Server Inside the VNet

The answer is the Model Context Protocol (MCP). Instead of SRE Agent querying Log Analytics directly, we deploy an MCP server inside the VNet that can query Log Analytics on the agent's behalf.


📐 Architecture Diagram (Text Version)

┌─────────────────────────────────────────────────────────────────────────────┐
│                   Private VNet (VNet-Integrated Environment)                │
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                     Azure Container Apps Environment                    ││
│  │                         (VNet-Integrated)                               ││
│  │                                                                          ││
│  │  ┌─────────────────────────────────────────────────────────────────┐    ││
│  │  │              Log Analytics MCP Server                           │    ││
│  │  │     (log-analytics-mcp-vnet.*.azurecontainerapps.io)           │    ││
│  │  │                                                                  │    ││
│  │  │  Tools:                                                          │    ││
│  │  │  ├── query_logs        - Execute any KQL query                  │    ││
│  │  │  ├── list_tables       - Discover available tables              │    ││
│  │  │  ├── get_workspace_info - Data volume and usage stats          │    ││
│  │  │  ├── analyze_errors    - Pre-built error pattern analysis      │    ││
│  │  │  └── check_vm_health   - Heartbeat, CPU, memory, error count   │    ││
│  │  │                                                                  │    ││
│  │  │  Authentication: Managed Identity (Log Analytics Reader)        │    ││
│  │  └───────────────────────────┬─────────────────────────────────────┘    ││
│  │                              │                                          ││
│  └──────────────────────────────┼──────────────────────────────────────────┘│
│                                 │ Internal query via Private Endpoint       │
│                                 ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────────────┐│
│  │                    Log Analytics Workspace                              ││
│  │                                                                          ││
│  │  Settings:                                                               ││
│  │  • publicNetworkAccessForQuery: Disabled                                ││
│  │  • AMPLS queryAccessMode: PrivateOnly                                   ││
│  │                                                                          ││
│  │  Tables: Syslog, Perf, Heartbeat, ContainerLog, etc.                   ││
│  └─────────────────────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────────────────────┘
                                 │
                                 │ Streamable HTTP (MCP Protocol)
                                 ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Azure SRE Agent                                     │
│                    (Outside the VNet)                                       │
│                                                                             │
│  "Investigate the errors in my private Log Analytics workspace"            │
│                                                                             │
│  ✓ Calls MCP server tools over Streamable HTTP                             │
│  ✓ MCP server queries Log Analytics internally                              │
│  ✓ Results returned to agent for analysis                                   │
│  ✓ No direct VNet access required                                           │
└─────────────────────────────────────────────────────────────────────────────┘

💡 Key Insight

The MCP server acts as a trusted bridge:

Capability	Description
🏠 Runs inside the VNet	Can query the Private Link-protected workspace
🌐 Exposes HTTP endpoint	SRE Agent can call it from anywhere
🔑 Uses Managed Identity	No secrets needed for authentication
🗣️ Natural language to KQL	SRE Agent turns your questions into KQL; the MCP server runs the queries

🚨 The 3 AM Incident

Let's walk through a real scenario.

📱 The Alert

It's 3:17 AM. Your phone buzzes:

⚠️ CRITICAL: Payment Service - High Error Rate
📍 Affected: app-vm (10.0.1.10)
📊 Error rate: 47% (threshold: 5%)


In the old world, you'd be scrambling for VPN credentials. But you have Azure SRE Agent.

🔍 The Investigation

You open SRE Agent from your phone (or laptop—no VPN needed) and type:

"I got an alert about high error rates on app-vm in my private network. The VM has no public IP, but logs should be in Log Analytics. Can you investigate?"

Within seconds, SRE Agent responds:


🔎 SRE Agent Investigation Output

I'll query Log Analytics for recent errors from app-vm. Let me check 
the Syslog and performance data...

Found 47 error entries in the last 15 minutes from app-vm. 
Here's what I'm seeing:

📊 Error Pattern Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• 23 × "Connection to database timed out after 30000ms"
• 12 × "Failed to process transaction: Connection refused"  
• 8 × "Circuit breaker OPEN for database connection pool"
• 4 × "OutOfMemoryError in transaction handler thread"

🔍 Root Cause Indicators:
The errors suggest a cascade failure:
1. Database connectivity issues started first
2. Connection pool exhausted, circuit breaker opened
3. Memory pressure built up from queued requests
4. Application became unresponsive

Let me check the performance metrics and correlate with db-vm...

🔗 Cross-VM Correlation

Without any additional prompting, SRE Agent digs deeper:


📈 Performance Analysis Output

📈 Performance Analysis (Last 30 minutes):

app-vm:
├── Memory: 94% used (up from 45% baseline) ⚠️
├── CPU: 78% (elevated)
└── First error: 03:02 AM

db-vm:
├── Disk I/O: 450 MB/s writes (unusual spike) ⚠️
├── MySQL errors: "Too many connections", "Lock wait timeout"
└── First error: 02:58 AM

web-vm:
├── nginx: "upstream timed out", "502 Bad Gateway"
└── First error: 03:05 AM

🔗 Correlation Timeline:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
02:58 - db-vm: High disk I/O, MySQL connection issues begin
03:02 - app-vm: Database timeouts start, memory climbing
03:05 - web-vm: Upstream timeouts, 502 errors to users
03:17 - Alert triggered (47% error rate)

💡 Root Cause:
The issue originated on db-vm with abnormal disk I/O causing 
MySQL connection failures. This cascaded to app-vm (connection 
timeouts, memory pressure) and finally to web-vm (502 errors).

Recommendation: Investigate what caused the disk I/O spike on 
db-vm. Check for runaway queries, backup jobs, or disk issues.
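The correlation step above comes down to ordering each VM's first error and treating the earliest one as the place to start looking. Here is a minimal Python sketch of that ordering, using the first-error times from the output above. The function name `order_by_first_error` is hypothetical, and the sketch assumes all timestamps fall on the same day. The earliest error is a lead to investigate, not proof of root cause.

```python
from datetime import datetime

# First-error times per VM, copied from the analysis output above.
first_errors = {
    "app-vm": "03:02",
    "db-vm": "02:58",
    "web-vm": "03:05",
}


def order_by_first_error(first_seen: dict[str, str]) -> list[tuple[str, datetime]]:
    """Return (vm, time) pairs sorted from earliest to latest first error."""
    # Assumption: every timestamp is HH:MM on the same calendar day.
    parsed = {vm: datetime.strptime(t, "%H:%M") for vm, t in first_seen.items()}
    return sorted(parsed.items(), key=lambda item: item[1])


timeline = order_by_first_error(first_errors)
origin_vm, origin_time = timeline[0]

for vm, t in timeline:
    # Minutes between the earliest error and this VM's first error.
    lag = int((t - origin_time).total_seconds() // 60)
    print(f"{t:%H:%M}  {vm:<7} +{lag} min")

print(f"Likely origin: {origin_vm}")
```

With the times above, the ordering is db-vm, then app-vm, then web-vm, which matches the cascade SRE Agent described.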

✅ The Resolution

Based on SRE Agent's analysis, you know exactly where to focus. The issue isn't the app-vm memory (a symptom)—it's the db-vm disk I/O (the root cause).

You can even ask SRE Agent for specific KQL queries to run:

"Give me a KQL query to find the exact MySQL statements that caused the lock wait timeouts"

Syslog
| where TimeGenerated > ago(1h)
| where Computer == "db-vm"
| where SyslogMessage contains "Lock wait" or SyslogMessage contains "Deadlock"
| project TimeGenerated, SyslogMessage
| order by TimeGenerated desc
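If you want to reuse this query for other VMs or search terms, a small helper can assemble it. The sketch below is illustrative only: `build_syslog_query` is a hypothetical name, and the escaping assumes standard KQL double-quoted string literals. It reproduces the query above when called with the same values.

```python
import re


def build_syslog_query(computer: str, terms: list[str], lookback: str = "1h") -> str:
    """Build a time-bounded Syslog query that matches any of the given terms."""
    if not terms:
        raise ValueError("at least one search term is required")
    # Accept only simple timespan literals such as 15m, 1h, or 7d.
    if not re.fullmatch(r"\d+(d|h|m|s)", lookback):
        raise ValueError(f"unsupported lookback: {lookback!r}")

    def quote(value: str) -> str:
        # Escape backslashes and double quotes for a KQL string literal.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    term_filter = " or ".join(f"SyslogMessage contains {quote(t)}" for t in terms)
    return "\n".join([
        "Syslog",
        f"| where TimeGenerated > ago({lookback})",
        f"| where Computer == {quote(computer)}",
        f"| where {term_filter}",
        "| project TimeGenerated, SyslogMessage",
        "| order by TimeGenerated desc",
    ])


print(build_syslog_query("db-vm", ["Lock wait", "Deadlock"]))
```

Keeping the time filter first means every generated query stays time-bounded, which keeps result sets small.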


⏱️ Time from alert to root cause identification: Under 5 minutes.

🚫 No VPN. 🚫 No Bastion. 🚫 No SSH. ✅ Just answers.

🛡️ Why This Matters

Security Benefits

Concern	How This Architecture Addresses It
🔒 Data in transit	Log queries travel over Private Link on the Azure backbone; agent-to-MCP traffic is HTTPS over TLS
🌐 Public exposure	VMs have zero public IPs
🔑 Access control	RBAC on Log Analytics, no direct VM access needed
📝 Audit trail	Container Apps logs capture MCP requests; workspace query auditing (the LAQueryLogs table), if enabled, records each KQL query
✅ Compliance	Log data stays in the workspace; only query results are returned to the agent

Operational Benefits

🐢 Old Way	⚡ New Way
VPN + Bastion + SSH	Open browser, ask a question
grep across 10 VMs	Single query, all VMs at once
Requires network expertise	Natural language queries
30+ minutes to understand	Under 5 minutes to root cause

🔐 Security Deep Dive: MCP Bridge vs. Pure Private Link

For security-minded readers, let's be clear about the architecture, the tradeoffs, and how this pattern compares to pure Private Link solutions.

Understanding the Two Patterns

🔗 Pure Private Link (Multitenant SaaS Pattern)

The Azure Private Link in multitenant solutions pattern is designed for SaaS providers exposing services to tenants:

Aspect	How It Works
Direction	Tenant VNet → SaaS Provider (inbound to provider)
Purpose	Allow tenants to access a SaaS service privately from their VNet
Network Path	Tenant VNet → Private Endpoint → Provider's Load Balancer/PaaS
Traffic	Entirely on Azure backbone—no public internet
Use Case	Enterprise customers accessing SaaS from their private networks

┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│   Tenant A      │         │   Tenant B      │         │   Your SaaS     │
│     VNet        │         │     VNet        │         │   Service       │
│                 │         │                 │         │                 │
│ ┌─────────────┐ │         │ ┌─────────────┐ │         │                 │
│ │  Private    │─┼─────────┼─┤  Private    │─┼─────────┤►               │
│ │  Endpoint   │ │   Azure │ │  Endpoint   │ │  Azure  │                 │
│ └─────────────┘ │  Backbone│ └─────────────┘ │ Backbone│                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘

✅ Entire path is private—no public internet exposure

🌉 MCP Bridge Pattern (Our Approach)

Our pattern is designed for a different problem: enabling an external AI agent to query resources that are (correctly!) protected by Private Link:

Aspect	How It Works
Direction	External AI Agent → MCP Server (in VNet) → Private Resource
Purpose	Allow external AI to query customer's own private resources
Network Path	Public HTTPS → MCP Server → Private Endpoint → Log Analytics
Traffic	Public on ingress, private on the data plane
Use Case	AI-assisted investigation of Private Link-protected resources

┌─────────────────────────────────────────────────────────────────────┐
│                        Customer's VNet                               │
│                                                                      │
│  ┌─────────────────┐  Private   ┌─────────────────┐                 │
│  │  MCP Server     │  Endpoint  │  Log Analytics  │                 │
│  │  (Container App)│───────────►│  (Private Only) │                 │
│  └────────▲────────┘            └─────────────────┘                 │
│           │                                                          │
└───────────┼──────────────────────────────────────────────────────────┘
            │ Public HTTPS
            │ (API Key + TLS)
    ┌───────┴───────┐
    │  SRE Agent    │
    │  (external)   │
    └───────────────┘

⚠️ Public ingress point exists, but Log Analytics remains fully private

Key Difference: The Public Ingress Point

Let's be explicit about the tradeoff:

Aspect	Pure Private Link	MCP Bridge Pattern
Ingress from client	Private (VNet/VPN/ExpressRoute)	Public HTTPS
Data plane access	Private	Private (via Private Endpoint)
Works with external AI	❌ No—client must be in VNet	✅ Yes—that's the purpose
Zero public exposure	✅ Yes	⚠️ MCP server has public endpoint

Why the public endpoint?
SRE Agent runs as a cloud service—it's not deployed in your VNet. To enable AI-assisted investigation, we need a bridge between the external agent and the private resources. The MCP server is that bridge.

What's Protected, What's Exposed

Component	Access Level	Why
Log Analytics Workspace	🔒 Private Only	`queryAccessMode: PrivateOnly` enforced—no public queries
MCP Server Ingress	🌐 Public HTTPS	Required for SRE Agent to reach it
MCP Server → Log Analytics	🔒 Private	Uses Private Endpoint inside VNet
Query Results	🔒 Encrypted	Returned over TLS, processed by MCP server

💡 Key insight: Log Analytics is never exposed publicly. The MCP server acts as a controlled gateway that exposes only its five MCP tools. Because query_logs accepts arbitrary KQL, the Log Analytics Reader role is what bounds the data a caller can read.

How We Secure the Public Ingress

The MCP server's public endpoint is protected by multiple layers:

Control	Implementation	Purpose
🔑 API Key	`X-API-Key` header required on all requests	Only authorized clients can call the MCP server
🔐 TLS	HTTPS encryption for all traffic	Data in transit is encrypted
🪪 Managed Identity	MCP server uses Azure MI for Log Analytics	No secrets in code or config
🛡️ RBAC	MCP server has only `Log Analytics Reader` role	Minimal permissions
📦 Scoped Tools	MCP exposes only the five listed tools	No shell or management operations; query_logs still accepts any KQL the role can read
📝 Audit Logging	All queries logged in Container Apps	Full audit trail
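To make the first and fifth controls concrete, here is a minimal sketch of how a server might check the `X-API-Key` header and reject tools outside its allowlist. This is not the demo server's actual code: the `authorize` function and its return shape are hypothetical, and only the header name and tool names come from this article.

```python
import hmac

# Tool names taken from the architecture diagram; anything else is rejected.
ALLOWED_TOOLS = frozenset({
    "query_logs",
    "list_tables",
    "get_workspace_info",
    "analyze_errors",
    "check_vm_health",
})


def authorize(headers: dict[str, str], tool_name: str, expected_key: str) -> tuple[bool, str]:
    """Return (allowed, reason) for one incoming tools/call request."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    supplied = {k.lower(): v for k, v in headers.items()}.get("x-api-key", "")
    # compare_digest avoids leaking key contents through timing differences.
    if not expected_key or not hmac.compare_digest(supplied.encode(), expected_key.encode()):
        return False, "invalid or missing API key"
    if tool_name not in ALLOWED_TOOLS:
        return False, f"unknown tool: {tool_name}"
    return True, "ok"


print(authorize({"X-API-Key": "s3cret"}, "list_tables", "s3cret"))
print(authorize({"X-API-Key": "wrong"}, "list_tables", "s3cret"))
print(authorize({"X-API-Key": "s3cret"}, "run_shell", "s3cret"))
```

Checking the key before the tool name means an unauthenticated caller learns nothing about which tools exist.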

Production Security Hardening

For production deployments, add these additional controls:

Security Layer	How to Implement	Impact
🪪 Entra ID Authentication	Enable Container Apps Easy Auth with Entra ID	Only authenticated users/apps can call MCP
🌐 IP Allowlisting	Restrict ingress to known SRE Agent IP ranges	Limits who can reach the endpoint
🔒 Key Vault for API Keys	Store MCP_API_KEY in Azure Key Vault	Secrets rotation, no plaintext keys
📊 Azure Monitor Alerts	Alert on unusual query patterns	Detect potential abuse
🚦 Rate Limiting	Cap replicas with Container Apps scaling limits; throttle per client in the MCP server or a gateway	Reduce denial-of-service impact
🔍 Diagnostic Logging	Enable Container Apps system logs	Forensic investigation capability

📝 Adding Entra ID Authentication

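# Configure the Microsoft (Entra ID) identity provider on the Container App
az containerapp auth microsoft update \
  --name log-analytics-mcp-vnet \
  --resource-group vnet-test-rg \
  --client-id "your-app-registration-client-id" \
  --client-secret "your-app-registration-secret" \
  --issuer "https://login.microsoftonline.com/your-tenant-id/v2.0" \
  --yes

# Enable Easy Auth and reject unauthenticated requests
az containerapp auth update \
  --name log-analytics-mcp-vnet \
  --resource-group vnet-test-rg \
  --enabled true \
  --unauthenticated-client-action Return401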

📝 IP Restriction Configuration

# Restrict ingress to specific IP ranges
az containerapp ingress access-restriction set \
  --name log-analytics-mcp-vnet \
  --resource-group vnet-test-rg \
  --action Allow \
  --ip-address 52.x.x.x/32 \
  --rule-name "SREAgentIP" \
  --description "Allow SRE Agent service IPs"
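To sanity-check an allowlist before applying it, you can test sample addresses against your CIDR ranges locally. The sketch below uses Python's standard `ipaddress` module. The `is_allowed` helper is hypothetical, and the ranges are documentation addresses (RFC 5737) standing in for the real SRE Agent IPs, which this article does not list.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR ranges."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


# Placeholder ranges from the RFC 5737 documentation block, not real SRE Agent IPs.
allowlist = ["203.0.113.10/32", "203.0.113.128/25"]

print(is_allowed("203.0.113.10", allowlist))   # True: exact /32 match
print(is_allowed("203.0.113.200", allowlist))  # True: inside the /25
print(is_allowed("198.51.100.7", allowlist))   # False: outside both ranges
```

A `/32` entry matches exactly one address, which is what the `52.x.x.x/32` rule above expresses.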

When to Use Each Pattern

Use Pure Private Link When:

Scenario	Example
All clients are in Azure VNets	Internal microservices
Clients connect via VPN/ExpressRoute	Enterprise users on corporate network
Zero public exposure is mandated	Compliance requirement for no public endpoints
You're a SaaS provider serving enterprise tenants	B2B API where all partners have Azure VNets
You control both ends	Your app calling your database

Use MCP Bridge When:

Scenario	Example
External AI agent needs to query private resources	SRE Agent investigating private Log Analytics
Can't deploy the AI client inside the VNet	Cloud-hosted AI services
Need natural language interface to private data	"Show me errors from my private VMs"
Customer wants AI assistance but has Private Link enabled	AI for security-conscious enterprise
You want investigation without VPN/Bastion access	3 AM incident response from phone

Decision Tree

                    ┌─────────────────────────────────────┐
                    │ Does the client run inside an       │
                    │ Azure VNet (or via VPN/ExpressRoute)?│
                    └────────────────┬────────────────────┘
                                     │
                    ┌────────────────┴────────────────┐
                    ▼                                 ▼
                   YES                               NO
                    │                                 │
                    ▼                                 ▼
         ┌──────────────────┐              ┌──────────────────┐
         │ Use Private Link │              │ Is the client an │
         │ (end-to-end      │              │ AI agent/tool?   │
         │  private)        │              └────────┬─────────┘
         └──────────────────┘                       │
                                      ┌─────────────┴─────────────┐
                                      ▼                           ▼
                                     YES                          NO
                                      │                           │
                                      ▼                           ▼
                           ┌──────────────────┐        ┌──────────────────┐
                           │ Use MCP Bridge   │        │ Consider:        │
                           │ (proxy pattern)  │        │ - API Gateway    │
                           └──────────────────┘        │ - Azure Front    │
                                                       │   Door + WAF     │
                                                       │ - App Service    │
                                                       │   with auth      │
                                                       └──────────────────┘
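The same decision tree can be written as a small function, which is handy if you want to encode the choice in a design checklist or review tool. The names `Pattern` and `choose_pattern` are hypothetical; the branches mirror the diagram above.

```python
from enum import Enum


class Pattern(Enum):
    PRIVATE_LINK = "Use Private Link (end-to-end private)"
    MCP_BRIDGE = "Use MCP Bridge (proxy pattern)"
    OTHER = "Consider API Gateway, Azure Front Door + WAF, or App Service with auth"


def choose_pattern(client_in_vnet: bool, client_is_ai_agent: bool) -> Pattern:
    """Walk the decision tree: network location first, then client type."""
    # "In VNet" also covers clients connected via VPN or ExpressRoute.
    if client_in_vnet:
        return Pattern.PRIVATE_LINK
    if client_is_ai_agent:
        return Pattern.MCP_BRIDGE
    return Pattern.OTHER


# SRE Agent runs outside the VNet and is an AI agent.
print(choose_pattern(client_in_vnet=False, client_is_ai_agent=True).value)
```

Note that the network question comes first: an AI agent that can run inside the VNet should still use Private Link end to end.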

Risk Assessment

Risk	Likelihood	Impact	Mitigation
API key exposure	Low	High	Store in Key Vault, rotate regularly, use Entra ID instead
Unauthorized MCP access	Low	Medium	IP allowlisting, Entra ID auth, rate limiting
Overly permissive queries	Low	Medium	Tools are read-only and bounded by the Log Analytics Reader role; query_logs accepts arbitrary KQL, so scope the role to a single workspace
Data exfiltration via MCP	Very Low	High	Audit logging, query result size limits, alerting
MCP server compromise	Very Low	High	Container scanning, regular updates, network policies

Bottom Line: Is This Pattern Secure?

Yes, with proper configuration.

Security Question	Answer
Is Log Analytics exposed publicly?	No—it remains Private Link-protected
Can anyone query via the MCP server?	No—API key/Entra ID required
Are queries audited?	Yes: Container Apps logs, plus LAQueryLogs when workspace query auditing is enabled
Is this as secure as pure Private Link?	Almost—public ingress exists but is well-protected
Is this secure enough for production?	Yes—with the hardening steps above

🔐 The pattern is secure for its intended purpose: enabling an external AI agent to query private resources without exposing those resources publicly.

🧪 Proving It Works: Blocked vs. Allowed

Let's prove that Private Link actually blocks public queries and that the MCP server inside the VNet can still reach the workspace through its private endpoint.

❌ Test 1: Query from Outside the VNet (BLOCKED)

First, call an MCP server deployed outside the VNet. Its query reaches Log Analytics over the public endpoint, which simulates what happens when SRE Agent tries to query directly:

📝 cURL command

# MCP server running OUTSIDE the VNet
curl -X POST "https://log-analytics-mcp.happydune-ad37d82a.eastus2.azurecontainerapps.io/mcp/" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "list_tables",
      "arguments": {}
    },
    "id": 1
  }'

Result: BLOCKED ❌

{
  "jsonrpc": "2.0",
  "id": 1,
  "error": {
    "code": -32603,
    "message": "InsufficientAccessError: The query was blocked due to private link configuration. Access is denied because this request was not made through a private endpoint..."
  }
}

✅ Test 2: Query from Inside the VNet (SUCCESS)

Now try the same query using the MCP server running inside the VNet:

📝 cURL command

# MCP server running INSIDE the VNet
curl -X POST "https://log-analytics-mcp-vnet.redbay-525c1802.eastus.azurecontainerapps.io/mcp/" \
  -H "Content-Type: application/json" \
  -H "Accept: application/json, text/event-stream" \
  -H "X-API-Key: your-api-key" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "list_tables",
      "arguments": {}
    },
    "id": 1
  }'

Result: SUCCESS ✅

{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      {
        "type": "text",
        "text": "Found 15 tables in the workspace:\n\n1. AzureDiagnostics\n2. ContainerAppConsoleLogs\n3. ContainerAppSystemLogs\n..."
      }
    ]
  }
}

💡 This proves the architecture works: The VNet MCP server can access the Private Link-protected workspace while external access is blocked.
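If you want to script this check, the sketch below builds the same JSON-RPC `tools/call` body and classifies a response as blocked or allowed. It is a rough illustration, not part of the demo: `build_tool_call` and `classify_response` are hypothetical helpers, the sample bodies are shortened versions of the two results above, and the code assumes the server replies with a plain JSON body (a `text/event-stream` reply would need its `data:` line extracted first).

```python
import json
from typing import Optional


def build_tool_call(name: str, arguments: Optional[dict] = None, request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 tools/call body like the one sent with cURL above."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
        "id": request_id,
    })


def classify_response(body: str) -> str:
    """Return 'blocked', 'error', or 'ok' for a JSON-RPC response body."""
    message = json.loads(body)
    error = message.get("error")
    if error is None:
        return "ok"
    # Heuristic: match on the error text shown in Test 1.
    if "private link" in error.get("message", "").lower():
        return "blocked"
    return "error"


# Shortened copies of the Test 1 and Test 2 responses.
blocked = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "error": {"code": -32603,
              "message": "InsufficientAccessError: The query was blocked due to private link configuration."},
})
allowed = json.dumps({
    "jsonrpc": "2.0", "id": 1,
    "result": {"content": [{"type": "text", "text": "Found 15 tables in the workspace"}]},
})

print(build_tool_call("list_tables"))
print(classify_response(blocked))
print(classify_response(allowed))
```

Matching on the message text is fragile, so treat the `blocked` label as a hint and keep the raw error in your logs.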

🚀 Try It Yourself

Want to experience this pattern firsthand? Here's how to deploy the full demo.

📋 Prerequisites

✅ Azure subscription with Contributor access
✅ Azure CLI installed
✅ Docker (for building the MCP server image)

1️⃣ Deploy the VNet-Integrated Environment

📝 Azure CLI commands

# Create resource group
az group create --name vnet-test-rg --location eastus

# Create VNet
az network vnet create \
  --resource-group vnet-test-rg \
  --name vnet-test-vnet \
  --address-prefix 10.0.0.0/16 \
  --subnet-name infrastructure \
  --subnet-prefix 10.0.0.0/23

# Create VNet-integrated Container Apps environment
az containerapp env create \
  --name vnet-test-env \
  --resource-group vnet-test-rg \
  --location eastus \
  --infrastructure-subnet-resource-id "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.Network/virtualNetworks/vnet-test-vnet/subnets/infrastructure" \
  --internal-only false

# Create Log Analytics workspace
az monitor log-analytics workspace create \
  --resource-group vnet-test-rg \
  --workspace-name workspace-vnettest \
  --location eastus

2️⃣ Configure Private Link (Block Public Access)

📝 Azure CLI commands

# Create Azure Monitor Private Link Scope
az monitor private-link-scope create \
  --name vnet-test-ampls \
  --resource-group vnet-test-rg

# Link workspace to AMPLS
az monitor private-link-scope scoped-resource create \
  --name log-analytics-link \
  --resource-group vnet-test-rg \
  --scope-name vnet-test-ampls \
  --linked-resource "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.OperationalInsights/workspaces/workspace-vnettest"

# Set AMPLS to Private Only (block public queries)
az monitor private-link-scope update \
  --name vnet-test-ampls \
  --resource-group vnet-test-rg \
  --query-access PrivateOnly

# Disable public query access on workspace
az monitor log-analytics workspace update \
  --resource-group vnet-test-rg \
  --workspace-name workspace-vnettest \
  --set properties.publicNetworkAccessForQuery=Disabled

💡 Note: A private endpoint for the AMPLS, together with the Azure Monitor private DNS zones, must also exist in the VNet. The commands above do not create one, and without it the MCP server cannot reach the workspace once public queries are disabled.

3️⃣ Deploy the MCP Server

📝 Azure CLI commands

# Create an Azure Container Registry to host the MCP server image
az acr create \
  --resource-group vnet-test-rg \
  --name yourregistryname \
  --sku Basic

# Build and push the MCP server image
az acr build \
  --registry yourregistryname \
  --image log-analytics-mcp:latest \
  --file Dockerfile .

# Deploy Container App with Managed Identity
az containerapp create \
  --name log-analytics-mcp-vnet \
  --resource-group vnet-test-rg \
  --environment vnet-test-env \
  --image yourregistryname.azurecr.io/log-analytics-mcp:latest \
  --target-port 8000 \
  --ingress external \
  --env-vars "LOG_ANALYTICS_WORKSPACE_ID=your-workspace-guid" "MCP_API_KEY=your-api-key" \
  --system-assigned \
  --registry-server yourregistryname.azurecr.io \
  --registry-identity system

# Grant Log Analytics Reader role to the Container App
az role assignment create \
  --assignee-object-id $(az containerapp show --name log-analytics-mcp-vnet --resource-group vnet-test-rg --query identity.principalId -o tsv) \
  --role "Log Analytics Reader" \
  --scope "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.OperationalInsights/workspaces/workspace-vnettest"

4️⃣ Configure SRE Agent

Add the MCP server to your SRE Agent subagent configuration:

name: PrivateVNetObservabilityAgent
description: Investigate issues in private VNet resources via MCP
system_prompt: |
  You investigate issues in private VNet environments where Log Analytics 
  queries are blocked by Private Link. Use the MCP tools to query logs.

mcp_tools:
  - LogAnalyticsMCP

tool_definitions:
  LogAnalyticsMCP:
    type: mcp
    description: Query Log Analytics in private VNet
    transport: streamable_http
    url: https://log-analytics-mcp-vnet.redbay-525c1802.eastus.azurecontainerapps.io/mcp/
    headers:
      X-API-Key: "${MCP_API_KEY}"

🤖 Configuring the Subagent in SRE Agent Portal

To use this MCP server with Azure SRE Agent, you need to configure both an MCP Connection and a Subagent. Here's the complete setup.

Step 1: Configure the MCP Connection

In the Azure SRE Agent portal, navigate to Settings → MCP Connections and add:

Field	Value
Name	`LogAnalyticsMCP`
URL	`https://log-analytics-mcp-vnet.{your-env}.{region}.azurecontainerapps.io/mcp/`
Transport	HTTP (Streamable)
Authentication	API Key
Header Name	`X-API-Key`
Header Value	Your MCP server API key
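A common slip at this step is pasting the URL with the `{your-env}` or `{region}` placeholders still in it. The sketch below is a quick local check using Python's standard `urllib.parse`. The `check_mcp_url` helper is hypothetical, and the `/mcp/` path requirement is an assumption based on the URLs used in this article.

```python
from urllib.parse import urlparse


def check_mcp_url(url: str) -> list[str]:
    """Return a list of problems found in an MCP connection URL (empty if none)."""
    problems = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL should use https")
    if not parsed.hostname:
        problems.append("URL has no host")
    if "{" in url or "}" in url:
        problems.append("URL still contains an unfilled {placeholder}")
    # Assumption: the server is mounted at /mcp/, as in this article's examples.
    if not parsed.path.rstrip("/").endswith("/mcp"):
        problems.append("path should end with /mcp/")
    return problems


template = "https://log-analytics-mcp-vnet.{your-env}.{region}.azurecontainerapps.io/mcp/"
filled = "https://log-analytics-mcp-vnet.redbay-525c1802.eastus.azurecontainerapps.io/mcp/"

print(check_mcp_url(template))
print(check_mcp_url(filled))
```

An empty list means the URL passed these basic checks; it does not confirm the server is reachable.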

Step 2: Create the Subagent

In the Subagent Builder, create a new subagent with the following configuration:

Subagent Name

PrivateVNetObservabilityAgent

Description (Handoff Description)

This is critical for proper routing. The description tells the meta-agent when to hand off to this subagent:

Hand off to this agent when the user needs to:
- Debug resources in private VNets that can't be accessed directly
- Investigate network connectivity issues in isolated environments
- Query logs from VMs or Container Apps behind Private Link
- Analyze errors from resources without public network access
This agent uses Log Analytics MCP to access logs via Private Link.

System Prompt

You are a specialized Site Reliability Engineer focused on debugging issues with 
resources deployed in private Azure Virtual Networks.

## Your Expertise
- Troubleshooting connectivity issues in private VNets
- Analyzing logs from resources that aren't directly accessible via public internet
- Identifying network configuration problems
- Diagnosing DNS resolution failures in private networks
- Debugging Azure Container Apps deployed in private VNet environments

## Available MCP Tools
You have access to the Log Analytics MCP server which queries logs from resources
connected to Log Analytics workspaces via Private Link:

1. **query_logs** - Execute KQL queries against the Log Analytics workspace
   - Parameters: 
     - query (string, required): The KQL query to execute
     - timespan (string, optional): Time range like "PT1H", "P1D", "P7D"
   - Example: query_logs(query="ContainerAppConsoleLogs_CL | take 10", timespan="PT24H")

2. **list_tables** - Discover available log tables in the workspace
   - No parameters required
   - Returns table names and their schema

3. **get_workspace_info** - Get workspace configuration details
   - No parameters required
   - Returns workspace ID, location, retention settings

4. **analyze_errors** - Quick error trend analysis
   - Parameters: hours (integer, optional, default: 24)
   - Returns error counts and patterns

5. **check_vm_health** - Check VM heartbeat and connectivity status
   - No parameters required
   - Returns VM health status from Heartbeat table

## Troubleshooting Workflow
1. First understand the problem scope (which resources, what symptoms)
2. Use **list_tables** to discover available log sources
3. Check resource health:
   - For VMs: use **check_vm_health** or query Heartbeat table
   - For Container Apps: query ContainerAppSystemLogs_CL
4. Search for errors using **query_logs** with KQL
5. Use **analyze_errors** for quick error pattern detection
6. Correlate errors with recent deployments or changes
7. Provide actionable remediation steps

## Common Log Tables
| Table | Description |
|-------|-------------|
| Heartbeat | VM connectivity status |
| ContainerAppSystemLogs_CL | Container App system events (start, stop, scale) |
| ContainerAppConsoleLogs_CL | Container App console output (stdout/stderr) |
| AppExceptions | Application exceptions |
| AppTraces | Application traces |
| AzureDiagnostics | Azure resource diagnostics |

## Sample KQL Queries

**Recent Container App events:**
ContainerAppSystemLogs_CL 
| where TimeGenerated > ago(1h)
| project TimeGenerated, ContainerAppName_s, Type_s, Reason_s, Log_s
| order by TimeGenerated desc

**Container App errors:**
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(24h)
| where Log_s contains "error" or Log_s contains "exception"
| project TimeGenerated, ContainerAppName_s, Log_s

**VM connectivity check:**
Heartbeat
| where TimeGenerated > ago(1h)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend Status = iff(LastHeartbeat < ago(5m), "Disconnected", "Connected")

## Best Practices
- Always start with a time-bounded query (use timespan parameter)
- Use summarize and project to keep results manageable
- Correlate errors across multiple tables when investigating
- Provide specific, actionable recommendations
- When no data is found, suggest checking if the resource is sending logs

MCP Tools

Select the MCP connection you created:

LogAnalyticsMCP

Handoffs

Enable handoff back to the main agent:

meta_agent

Settings

Setting	Value
Temperature	0.3
Allow Parallel Tool Calls	✅ Enabled
Max Reflection Count	2

Complete YAML Configuration

For reference, here's the complete YAML that you can import into the Subagent Builder:

📋 Full Subagent YAML

name: PrivateVNetObservabilityAgent

system_prompt: |
  You are a specialized Site Reliability Engineer focused on debugging issues with 
  resources deployed in private Azure Virtual Networks.
  
  ## Your Expertise
  - Troubleshooting connectivity issues in private VNets
  - Analyzing logs from resources that aren't directly accessible via public internet
  - Identifying network configuration problems
  - Diagnosing DNS resolution failures in private networks
  - Debugging Azure Container Apps deployed in private VNet environments
  
  ## Available MCP Tools
  You have access to the Log Analytics MCP server which queries logs from resources
  connected to Log Analytics workspaces via Private Link:
  
  1. **query_logs** - Execute KQL queries against the Log Analytics workspace
     - Parameters: 
       - query (string, required): The KQL query to execute
       - timespan (string, optional): Time range like "PT1H", "P1D", "P7D"
     - Example: query_logs(query="ContainerAppConsoleLogs_CL | take 10", timespan="PT24H")
  
  2. **list_tables** - Discover available log tables in the workspace
     - No parameters required
     - Returns table names and their schema
  
  3. **get_workspace_info** - Get workspace configuration details
     - No parameters required
     - Returns workspace ID, location, retention settings
  
  4. **analyze_errors** - Quick error trend analysis
     - Parameters: hours (integer, optional, default: 24)
     - Returns error counts and patterns
  
  5. **check_vm_health** - Check VM heartbeat and connectivity status
     - No parameters required
     - Returns VM health status from Heartbeat table
  
  ## Troubleshooting Workflow
  1. First understand the problem scope (which resources, what symptoms)
  2. Use **list_tables** to discover available log sources
  3. Check resource health:
     - For VMs: use **check_vm_health** or query Heartbeat table
     - For Container Apps: query ContainerAppSystemLogs_CL
  4. Search for errors using **query_logs** with KQL
  5. Use **analyze_errors** for quick error pattern detection
  6. Correlate errors with recent deployments or changes
  7. Provide actionable remediation steps
  
  ## Common Log Tables
  | Table | Description |
  |-------|-------------|
  | Heartbeat | VM connectivity status |
  | ContainerAppSystemLogs_CL | Container App system events (start, stop, scale) |
  | ContainerAppConsoleLogs_CL | Container App console output (stdout/stderr) |
  | AppExceptions | Application exceptions |
  | AppTraces | Application traces |
  | AzureDiagnostics | Azure resource diagnostics |
  
  ## Sample KQL Queries
  
  **Recent Container App events:**
  ContainerAppSystemLogs_CL 
  | where TimeGenerated > ago(1h)
  | project TimeGenerated, ContainerAppName_s, Type_s, Reason_s, Log_s
  | order by TimeGenerated desc
  
  **Container App errors:**
  ContainerAppConsoleLogs_CL
  | where TimeGenerated > ago(24h)
  | where Log_s contains "error" or Log_s contains "exception"
  | project TimeGenerated, ContainerAppName_s, Log_s
  
  **VM connectivity check:**
  Heartbeat
  | where TimeGenerated > ago(1h)
  | summarize LastHeartbeat = max(TimeGenerated) by Computer
  | extend Status = iff(LastHeartbeat < ago(5m), "Disconnected", "Connected")
  
  ## Best Practices
  - Always start with a time-bounded query (use the timespan parameter)
  - Use summarize and project to keep results manageable
  - Correlate errors across multiple tables when investigating
  - Provide specific, actionable recommendations
  - When no data is found, suggest checking if the resource is sending logs

handoff_description: |
  Hand off to this agent when the user needs to:
  - Debug resources in private VNets that can't be accessed directly
  - Investigate network connectivity issues in isolated environments
  - Query logs from VMs or Container Apps behind Private Link
  - Analyze errors from resources without public network access
  This agent uses Log Analytics MCP to access logs via Private Link.

# MCP tools from the LogAnalyticsMCP connection
mcp_tools:
  - LogAnalyticsMCP

# Allow handoff back to meta_agent for general queries
handoffs:
  - meta_agent

# Agent settings
temperature: 0.3
allow_parallel_tool_calls: true
max_reflection_count: 2

5️⃣ Test the Integration

Ask SRE Agent to investigate:

💬 "Show me the tables available in my private Log Analytics workspace"

💬 "Query the last 10 container logs from my application"

💬 "Check if there are any errors in the system logs"

🔧 The Architecture in Detail

Why This Works

The key is that data ingestion and query access use different network paths:

Path	Direction	Network
📥 Azure Monitor Agent → Log Analytics	Ingest	Can use Private Endpoint OR public
❌ External Query → Log Analytics	Query	BLOCKED by Private Link
✅ VNet MCP → Log Analytics	Query	ALLOWED via Private Endpoint
🌐 SRE Agent → MCP Server	Query	Public HTTPS (MCP protocol)

💡 The MCP server acts as a query proxy that runs inside the trusted network boundary.
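To make the proxy idea concrete, here is a minimal sketch of the query step as it might look inside the VNet. `LogsQueryClient.query_workspace` and `DefaultAzureCredential` are real Azure SDK names; `run_query`, `make_private_client`, and the `LogsClient` protocol are hypothetical helpers invented for illustration and are not taken from the actual server's code.

```python
from datetime import timedelta
from typing import Any, Protocol


class LogsClient(Protocol):
    """Anything exposing query_workspace, e.g. azure.monitor.query.LogsQueryClient."""

    def query_workspace(self, workspace_id: str, query: str, *, timespan: Any) -> Any: ...


def run_query(client: LogsClient, workspace_id: str, query: str, hours: int = 1) -> list[dict]:
    """Run a time-bounded KQL query and flatten the result tables into row dicts."""
    response = client.query_workspace(workspace_id, query, timespan=timedelta(hours=hours))
    # A full result exposes .tables; a partial result exposes .partial_data instead.
    tables = getattr(response, "tables", None)
    if tables is None:
        tables = getattr(response, "partial_data", [])
    rows = []
    for table in tables:
        for row in table.rows:
            rows.append({name: row[i] for i, name in enumerate(table.columns)})
    return rows


def make_private_client():
    """Hypothetical factory: needs the Azure SDK installed and only succeeds
    where the workspace is reachable, i.e. from inside the VNet."""
    from azure.identity import DefaultAzureCredential
    from azure.monitor.query import LogsQueryClient

    return LogsQueryClient(DefaultAzureCredential())
```

Passing the client in as a parameter keeps the flattening logic testable without Azure access. The same call made from outside the VNet would fail with the private link error shown earlier.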

MCP Server Configuration

The Log Analytics MCP Server is open-source and uses:

Component	Technology
🐍 Runtime	Python 3.10+ with `mcp[cli]` framework
📊 SDK	`azure-monitor-query` for Log Analytics
🌐 Transport	Streamable HTTP for remote access
🔑 Auth	Managed Identity for Azure authentication

📦 GitHub Repository: github.com/BandaruDheeraj/log-analytics-mcp-server
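For a feel of what travels over the Streamable HTTP transport, the sketch below builds (but does not send) a single JSON-RPC `tools/call` message using only the standard library. The host name and key are placeholders, and `build_tool_call` is a hypothetical helper. The `/mcp/` path and `X-API-Key` header follow the values used in this post. A real MCP session starts with an `initialize` exchange and may carry a session header, which this sketch leaves out.

```python
import json
import urllib.request


def build_tool_call(base_url: str, api_key: str, tool: str,
                    arguments: dict, request_id: int = 1) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 tools/call request for an MCP server (not sent here)."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/mcp/",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Streamable HTTP clients accept either a JSON body or an SSE stream.
            "Accept": "application/json, text/event-stream",
            "X-API-Key": api_key,
        },
    )


request = build_tool_call(
    "https://mcp.example.internal",  # placeholder host, not a real endpoint
    "example-key",                   # placeholder key
    "query_logs",
    {"query": "ContainerAppConsoleLogs_CL | take 10", "timespan": "PT1H"},
)
```

In practice SRE Agent builds these messages for you. The sketch is only meant to show that a tool call is an ordinary HTTPS POST, which is why it can cross the network boundary.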

Available MCP Tools

Tool	Description
🔍 `query_logs`	Execute any KQL query against the workspace
📋 `list_tables`	Discover available tables and their schemas
📊 `get_workspace_info`	Get workspace metadata and data volume stats
⚠️ `analyze_errors`	Pre-built error pattern analysis
💓 `check_vm_health`	Heartbeat, CPU, memory, error count summary
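One detail worth sketching: `query_logs` takes its timespan as an ISO 8601 duration string such as "PT1H" or "P7D", while the `azure-monitor-query` SDK expects a `timedelta` (or a start/end tuple). The helper below is one possible way to bridge the two. `parse_timespan` is a hypothetical name, it only handles days, hours, and minutes, and the actual server may convert timespans differently.

```python
import re
from datetime import timedelta

# Matches simple durations such as "PT1H", "P1D", "P7D", or "P1DT12H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?)?$"
)


def parse_timespan(value: str) -> timedelta:
    """Convert a simple ISO 8601 duration (days, hours, minutes) to a timedelta."""
    match = _DURATION.match(value)
    if not match or not any(match.groupdict().values()):
        raise ValueError(f"unsupported timespan: {value!r}")
    parts = {name: int(number) for name, number in match.groupdict().items() if number}
    return timedelta(**parts)
```

Rejecting anything that does not parse, instead of falling back to an unbounded query, keeps every query time-bounded as the subagent's best practices require.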

🔒 Security Considerations

When deploying an MCP server as a query proxy, consider:

Concern	Mitigation
🔑 API Authentication	Use API key in header (`X-API-Key`)
🪪 Managed Identity	Container App uses system-assigned MI for Azure auth
🏠 Network Isolation	MCP server runs in VNet-integrated environment
📝 Query Auditing	All queries logged in Container Apps system logs
🛡️ RBAC	Grant only `Log Analytics Reader` role to MI
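As an illustration of the API key mitigation, here is a minimal sketch of a header check. `is_authorized` is a hypothetical helper rather than code from the actual server. It assumes the expected key is loaded from configuration elsewhere, and it uses a constant-time comparison so response timing does not reveal how much of a guessed key was correct.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Return True only when the X-API-Key header matches the expected key."""
    if not expected_key:
        return False  # refuse every request if no key is configured
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get("x-api-key", "")
    # compare_digest takes the same time regardless of where the inputs differ.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

Because the endpoint is reachable over public HTTPS, a check like this sits in front of every tool call. The Managed Identity and RBAC rows above then limit what a caller with a valid key can actually read.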

🔄 Beyond Log Analytics

This MCP proxy pattern extends to any Azure service with Private Link:

Service	Private Link	MCP Server Potential
Log Analytics	✅	Query logs via KQL
Application Insights	✅	Query app telemetry
Azure Data Explorer	✅	Query Kusto clusters
Azure SQL	✅	Query databases
Cosmos DB	✅	Query collections
Storage Accounts	✅	Access blob/table data

💡 The pattern: Deploy an MCP server inside the private network, grant it access to the resource, and expose it to SRE Agent via Streamable HTTP.

🎯 Key Takeaways

🔐 Private Link Blocks Queries

When you set publicNetworkAccessForQuery: Disabled and queryAccessMode: PrivateOnly, Log Analytics rejects any query that does not arrive through a private endpoint. This is enforced access control, not just encryption of data in transit.

🌉 MCP Servers as Bridges

MCP servers running inside your VNet can act as trusted query proxies. They authenticate with Managed Identity, query the protected resource, and return results over public HTTPS.

🚀 Streamable HTTP for Remote Access

The Streamable HTTP transport (/mcp/ endpoint) lets clients call an MCP server over the network instead of only through local STDIO. This is essential for cloud-hosted MCP servers.

⚡ Zero VPN, Zero Bastion

You don't need a VPN or a bastion host to debug issues in private networks. SRE Agent coordinates with the MCP server, which does the actual querying from inside the trusted boundary.

🔄 Reusable Pattern

This isn't just for Log Analytics. Any Private Link-protected resource can be made accessible to AI agents through a VNet-deployed MCP server.

🏁 Get Started

Ready to bring this pattern to your environment?

Step	Action
1️⃣	Deploy a VNet-integrated Container Apps environment
2️⃣	Set up Azure Monitor Private Link Scope with `queryAccessMode: PrivateOnly`
3️⃣	Disable public query access on your Log Analytics workspace
4️⃣	Deploy the Log Analytics MCP server with Managed Identity
5️⃣	Configure SRE Agent to use the MCP server via Streamable HTTP
6️⃣	Start investigating without ever needing VPN access

📚 Resources

Resource	Link
📦 Log Analytics MCP Server	github.com/BandaruDheeraj/log-analytics-mcp-server
🔗 Azure Monitor Private Link	Documentation
🐳 Container Apps VNet Integration	Documentation
🌐 MCP Streamable HTTP Transport	Specification
📖 Sample Environment	README

📝 This blog post is based on a real working demo. The MCP servers referenced (log-analytics-mcp-vnet and log-analytics-mcp) are live and demonstrate the blocked vs. allowed query behavior.

💬 Questions or feedback? Join the discussion on the Azure SRE Agent community.