Debugging Private Networks When Public Queries Are Blocked: Azure SRE Agent + MCP
Debugging Private Network Issues When Public Queries Are Blocked
⚡ TL;DR
When Azure Private Link blocks public queries to your Log Analytics workspace, Azure SRE Agent can still investigate issues by delegating queries to an MCP server running inside your VNet. The MCP server queries Log Analytics from within the private network, and SRE Agent coordinates the investigation remotely.
The Private Network Security Challenge
Modern cloud architecture often requires strict network isolation:
🛡️ Security teams say: "No public access to Log Analytics. Private Link with query access blocked from outside the VNet."
🔧 SRE teams say: "How is our AI agent supposed to query logs if it can't reach Log Analytics?"
This creates a real problem. When you configure Azure Monitor Private Link Scope (AMPLS) with queryAccessMode: PrivateOnly and disable public query access on your Log Analytics workspace, all external queries are blocked, including those from Azure SRE Agent.
Try querying from outside the VNet and you'll see:
❌ InsufficientAccessError: The query was blocked due to private link configuration.
Access is denied because this request was not made through a private endpoint.
The question: How can an AI agent investigate issues in a workspace it can't query?
The Solution: MCP Server Inside the VNet
The answer is the Model Context Protocol (MCP). Instead of SRE Agent querying Log Analytics directly, we deploy an MCP server inside the VNet that can query Log Analytics on the agent's behalf.

Architecture Diagram (Text Version)
┌────────────────────────────────────────────────────────────────────┐
│ Private VNet (VNet-Integrated Environment)                         │
│                                                                    │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Azure Container Apps Environment (VNet-Integrated)             │ │
│ │                                                                │ │
│ │ ┌────────────────────────────────────────────────────────────┐ │ │
│ │ │ Log Analytics MCP Server                                   │ │ │
│ │ │ (log-analytics-mcp-vnet.*.azurecontainerapps.io)           │ │ │
│ │ │                                                            │ │ │
│ │ │ Tools:                                                     │ │ │
│ │ │ ├── query_logs         Execute any KQL query               │ │ │
│ │ │ ├── list_tables        Discover available tables           │ │ │
│ │ │ ├── get_workspace_info Data volume and usage stats         │ │ │
│ │ │ ├── analyze_errors     Pre-built error pattern analysis    │ │ │
│ │ │ └── check_vm_health    Heartbeat, CPU, memory, error count │ │ │
│ │ │                                                            │ │ │
│ │ │ Authentication: Managed Identity (Log Analytics Reader)    │ │ │
│ │ └───────────────────┬────────────────────────────────────────┘ │ │
│ │                     │                                          │ │
│ └─────────────────────┼──────────────────────────────────────────┘ │
│                       │ Internal query via Private Endpoint        │
│                       ▼                                            │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Log Analytics Workspace                                        │ │
│ │                                                                │ │
│ │ Settings:                                                      │ │
│ │   • publicNetworkAccessForQuery: Disabled                      │ │
│ │   • AMPLS queryAccessMode: PrivateOnly                         │ │
│ │                                                                │ │
│ │ Tables: Syslog, Perf, Heartbeat, ContainerLog, etc.            │ │
│ └────────────────────────────────────────────────────────────────┘ │
└───────────────────────┬────────────────────────────────────────────┘
                        │
                        │ Streamable HTTP (MCP Protocol)
                        ▼
┌────────────────────────────────────────────────────────────────────┐
│ Azure SRE Agent                                                    │
│ (Outside the VNet)                                                 │
│                                                                    │
│ "Investigate the errors in my private Log Analytics workspace"     │
│                                                                    │
│ ✓ Calls MCP server tools over Streamable HTTP                      │
│ ✓ MCP server queries Log Analytics internally                      │
│ ✓ Results returned to agent for analysis                           │
│ ✓ No direct VNet access required                                   │
└────────────────────────────────────────────────────────────────────┘
💡 Key Insight
The MCP server acts as a trusted bridge:
| Capability | Description |
|---|---|
| Runs inside the VNet | Can query the Private Link-protected workspace |
| Exposes HTTP endpoint | SRE Agent can call it from anywhere |
| Uses Managed Identity | No secrets needed for authentication |
| Natural language to KQL | Transforms your questions into queries |
🚨 The 3 AM Incident
Let's walk through a real scenario.
📱 The Alert
It's 3:17 AM. Your phone buzzes:
⚠️ CRITICAL: Payment Service - High Error Rate
Affected: app-vm (10.0.1.10)
Error rate: 47% (threshold: 5%)

In the old world, you'd be scrambling for VPN credentials. But you have Azure SRE Agent.
The Investigation
You open SRE Agent from your phone (or laptop, no VPN needed) and type:
"I got an alert about high error rates on app-vm in my private network. The VM has no public IP, but logs should be in Log Analytics. Can you investigate?"
Within seconds, SRE Agent responds:

SRE Agent Investigation Output:
I'll query Log Analytics for recent errors from app-vm. Let me check
the Syslog and performance data...
Found 47 error entries in the last 15 minutes from app-vm.
Here's what I'm seeing:
Error Pattern Analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• 23 × "Connection to database timed out after 30000ms"
• 12 × "Failed to process transaction: Connection refused"
• 8 × "Circuit breaker OPEN for database connection pool"
• 4 × "OutOfMemoryError in transaction handler thread"
Root Cause Indicators:
The errors suggest a cascade failure:
1. Database connectivity issues started first
2. Connection pool exhausted, circuit breaker opened
3. Memory pressure built up from queued requests
4. Application became unresponsive
Let me check the performance metrics and correlate with db-vm...
Cross-VM Correlation
Without any additional prompting, SRE Agent digs deeper:

Performance Analysis Output:
Performance Analysis (Last 30 minutes):
app-vm:
├── Memory: 94% used (up from 45% baseline) ⚠️
├── CPU: 78% (elevated)
└── First error: 03:02 AM
db-vm:
├── Disk I/O: 450 MB/s writes (unusual spike) ⚠️
├── MySQL errors: "Too many connections", "Lock wait timeout"
└── First error: 02:58 AM
web-vm:
├── nginx: "upstream timed out", "502 Bad Gateway"
└── First error: 03:05 AM
Correlation Timeline:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
02:58 - db-vm: High disk I/O, MySQL connection issues begin
03:02 - app-vm: Database timeouts start, memory climbing
03:05 - web-vm: Upstream timeouts, 502 errors to users
03:17 - Alert triggered (47% error rate)
💡 Root Cause:
The issue originated on db-vm with abnormal disk I/O causing
MySQL connection failures. This cascaded to app-vm (connection
timeouts, memory pressure) and finally to web-vm (502 errors).
Recommendation: Investigate what caused the disk I/O spike on
db-vm. Check for runaway queries, backup jobs, or disk issues.
✅ The Resolution
Based on SRE Agent's analysis, you know exactly where to focus. The issue isn't the app-vm memory (a symptom); it's the db-vm disk I/O (the root cause).
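The reasoning behind the correlation timeline, that the component with the earliest first error is the most likely origin of a cascade, can be sketched in a few lines of Python. The function name `order_by_first_error` and the input format are illustrative assumptions, not part of SRE Agent; the timestamps are the first-error times from the output above and are assumed to fall on the same day.

```python
from datetime import datetime


def order_by_first_error(first_errors: dict[str, str]) -> list[str]:
    """Return host names sorted by the time of their first error (HH:MM).

    Assumption: all timestamps fall on the same day, so comparing the
    time of day is enough to order them.
    """
    parsed = {host: datetime.strptime(ts, "%H:%M") for host, ts in first_errors.items()}
    return sorted(parsed, key=parsed.__getitem__)


# First-error times reported in the performance analysis above.
first_errors = {"app-vm": "03:02", "db-vm": "02:58", "web-vm": "03:05"}
timeline = order_by_first_error(first_errors)
print(timeline)     # ['db-vm', 'app-vm', 'web-vm']
print(timeline[0])  # db-vm
```

The earliest failure is only a candidate root cause. It tells you where to look first, which is why the recommendation is to investigate the disk I/O spike on db-vm rather than to treat the ordering as proof.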
You can even ask SRE Agent for specific KQL queries to run:
"Give me a KQL query to find the exact MySQL statements that caused the lock wait timeouts"
Syslog
| where TimeGenerated > ago(1h)
| where Computer == "db-vm"
| where SyslogMessage contains "Lock wait" or SyslogMessage contains "Deadlock"
| project TimeGenerated, SyslogMessage
| order by TimeGenerated desc

⏱️ Time from alert to root cause identification: under 5 minutes.
🚫 No VPN. 🚫 No Bastion. 🚫 No SSH. ✅ Just answers.
🛡️ Why This Matters
Security Benefits
| Concern | How This Architecture Addresses It |
|---|---|
| Data in transit | Private Link keeps Log Analytics traffic on the Azure backbone |
| Public exposure | VMs have zero public IPs |
| Access control | RBAC on Log Analytics, no direct VM access needed |
| Audit trail | MCP server requests are logged by Container Apps; workspace queries can be audited with Log Analytics query auditing (LAQueryLogs) |
| Compliance | Data never leaves your Azure region |
Operational Benefits
| 🐢 Old Way | ⚡ New Way |
|---|---|
| VPN + Bastion + SSH | Open browser, ask a question |
| grep across 10 VMs | Single query, all VMs at once |
| Requires network expertise | Natural language queries |
| 30+ minutes to understand | Under 5 minutes to root cause |
Security Deep Dive: MCP Bridge vs. Pure Private Link
For security-minded readers, let's be clear about the architecture, the tradeoffs, and how this pattern compares to pure Private Link solutions.
Understanding the Two Patterns
Pure Private Link (Multitenant SaaS Pattern)
The Azure Private Link in multitenant solutions pattern is designed for SaaS providers exposing services to tenants:
| Aspect | How It Works |
|---|---|
| Direction | Tenant VNet → SaaS Provider (inbound to provider) |
| Purpose | Allow tenants to access a SaaS service privately from their VNet |
| Network Path | Tenant VNet → Private Endpoint → Provider's Load Balancer/PaaS |
| Traffic | Entirely on the Azure backbone, with no public internet |
| Use Case | Enterprise customers accessing SaaS from their private networks |
┌─────────────────┐         ┌─────────────────┐         ┌─────────────────┐
│    Tenant A     │         │    Tenant B     │         │    Your SaaS    │
│      VNet       │         │      VNet       │         │     Service     │
│                 │         │                 │         │                 │
│ ┌─────────────┐ │         │ ┌─────────────┐ │         │                 │
│ │   Private   ├─┼─────────┼─┤   Private   ├─┼─────────┤►                │
│ │  Endpoint   │ │  Azure  │ │  Endpoint   │ │  Azure  │                 │
│ └─────────────┘ │ Backbone│ └─────────────┘ │ Backbone│                 │
└─────────────────┘         └─────────────────┘         └─────────────────┘
✅ Entire path is private, with no public internet exposure
MCP Bridge Pattern (Our Approach)
Our pattern is designed for a different problem: enabling an external AI agent to query resources that are (correctly!) protected by Private Link:
| Aspect | How It Works |
|---|---|
| Direction | External AI Agent → MCP Server (in VNet) → Private Resource |
| Purpose | Allow external AI to query customer's own private resources |
| Network Path | Public HTTPS → MCP Server → Private Endpoint → Log Analytics |
| Traffic | Public on ingress, private on the data plane |
| Use Case | AI-assisted investigation of Private Link-protected resources |
┌──────────────────────────────────────────────────────────────────┐
│                         Customer's VNet                          │
│                                                                  │
│  ┌─────────────────┐    Private     ┌─────────────────┐          │
│  │   MCP Server    │    Endpoint    │  Log Analytics  │          │
│  │ (Container App) ├───────────────►│ (Private Only)  │          │
│  └────────▲────────┘                └─────────────────┘          │
│           │                                                      │
└───────────┼──────────────────────────────────────────────────────┘
            │ Public HTTPS
            │ (API Key + TLS)
    ┌───────┴───────┐
    │   SRE Agent   │
    │  (external)   │
    └───────────────┘
⚠️ Public ingress point exists, but Log Analytics remains fully private
Key Difference: The Public Ingress Point
Let's be explicit about the tradeoff:
| Aspect | Pure Private Link | MCP Bridge Pattern |
|---|---|---|
| Ingress from client | Private (VNet/VPN/ExpressRoute) | Public HTTPS |
| Data plane access | Private | Private (via Private Endpoint) |
| Works with external AI | ❌ No, the client must be in the VNet | ✅ Yes, that's the purpose |
| Zero public exposure | ✅ Yes | ⚠️ MCP server has public endpoint |
Why the public endpoint?
SRE Agent runs as a cloud service; it is not deployed in your VNet. To enable AI-assisted investigation, we need a bridge between the external agent and the private resources. The MCP server is that bridge.
What's Protected, What's Exposed
| Component | Access Level | Why |
|---|---|---|
| Log Analytics Workspace | Private Only | queryAccessMode: PrivateOnly enforced; no public queries |
| MCP Server Ingress | Public HTTPS | Required for SRE Agent to reach it |
| MCP Server → Log Analytics | Private | Uses Private Endpoint inside VNet |
| Query Results | Encrypted | Returned over TLS, processed by MCP server |
💡 Key insight: Log Analytics is never exposed publicly. The MCP server acts as a controlled gateway that only exposes specific operations (the MCP tools), not raw Log Analytics access.
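One way to picture "only exposes specific operations" is an allowlist dispatcher: a request can reach only the tools that were explicitly registered, and anything else is rejected before it touches Log Analytics. The sketch below is an illustration under stated assumptions, not the server's actual code; `ToolRegistry` and the placeholder handler are hypothetical names.

```python
from typing import Any, Callable, Dict

ToolHandler = Callable[[Dict[str, Any]], Any]


class ToolRegistry:
    """Dispatch only to explicitly registered tools (an allowlist)."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolHandler] = {}

    def register(self, name: str, handler: ToolHandler) -> None:
        self._tools[name] = handler

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        handler = self._tools.get(name)
        if handler is None:
            # Anything not registered is rejected rather than forwarded.
            raise PermissionError(f"Tool not exposed: {name}")
        return handler(arguments)


registry = ToolRegistry()
# Placeholder handler; a real one would query Log Analytics.
registry.register("list_tables", lambda args: ["Heartbeat", "Syslog"])

print(registry.call("list_tables", {}))  # ['Heartbeat', 'Syslog']
try:
    registry.call("delete_workspace", {})
except PermissionError as exc:
    print(exc)  # Tool not exposed: delete_workspace
```

Note that query_logs accepts arbitrary KQL, so the allowlist limits which operations exist, while the read-only Log Analytics Reader role limits what any query can do.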
How We Secure the Public Ingress
The MCP server's public endpoint is protected by multiple layers:
| Control | Implementation | Purpose |
|---|---|---|
| API Key | X-API-Key header required on all requests | Only authorized clients can call the MCP server |
| TLS | HTTPS encryption for all traffic | Data in transit is encrypted |
| Managed Identity | MCP server uses Azure MI for Log Analytics | No secrets in code or config |
| RBAC | MCP server has only Log Analytics Reader role | Minimal permissions |
| Scoped Tools | MCP exposes only specific operations | Can't run arbitrary commands |
| Audit Logging | All queries logged in Container Apps | Full audit trail |
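The API key control can be sketched as a small header check. How the real server validates the key is not shown in this article, so treat the function below as an assumption-laden illustration: `is_authorized` is a hypothetical name, and the constant-time comparison via `hmac.compare_digest` is a common hardening choice rather than a documented behavior of the MCP server.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the X-API-Key header against the configured key."""
    if not expected_key:
        # Fail closed when no key is configured.
        return False
    # HTTP header names are case-insensitive.
    normalized = {name.lower(): value for name, value in headers.items()}
    supplied = normalized.get("x-api-key", "")
    # compare_digest avoids leaking where the first mismatch occurs via timing.
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))


print(is_authorized({"X-API-Key": "s3cret"}, "s3cret"))  # True
print(is_authorized({"x-api-key": "wrong"}, "s3cret"))   # False
print(is_authorized({}, "s3cret"))                       # False
```

Failing closed when the key is unset matters here because a missing environment variable would otherwise turn the public endpoint into an open one.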
Production Security Hardening
For production deployments, add these additional controls:
| Security Layer | How to Implement | Impact |
|---|---|---|
| Entra ID Authentication | Enable Container Apps Easy Auth with Entra ID | Only authenticated users/apps can call MCP |
| IP Allowlisting | Restrict ingress to known SRE Agent IP ranges | Limits who can reach the endpoint |
| Key Vault for API Keys | Store MCP_API_KEY in Azure Key Vault | Secrets rotation, no plaintext keys |
| Azure Monitor Alerts | Alert on unusual query patterns | Detect potential abuse |
| Rate Limiting | Configure Container Apps scaling limits | Prevent denial of service |
| Diagnostic Logging | Enable Container Apps system logs | Forensic investigation capability |
Adding Entra ID Authentication:
# Register the Microsoft (Entra ID) identity provider on the Container App
az containerapp auth microsoft update \
--name log-analytics-mcp-vnet \
--resource-group vnet-test-rg \
--client-id "your-app-registration-client-id" \
--client-secret "your-app-registration-secret" \
--issuer "https://login.microsoftonline.com/your-tenant-id/v2.0" \
--yes
# Enable Easy Auth and reject unauthenticated requests
az containerapp auth update \
--name log-analytics-mcp-vnet \
--resource-group vnet-test-rg \
--enabled true \
--unauthenticated-client-action Return401
IP Restriction Configuration:
# Restrict ingress to specific IP ranges
az containerapp ingress access-restriction set \
--name log-analytics-mcp-vnet \
--resource-group vnet-test-rg \
--action Allow \
--ip-address 52.x.x.x/32 \
--rule-name "SREAgentIP" \
--description "Allow SRE Agent service IPs"
When to Use Each Pattern
Use Pure Private Link When:
| Scenario | Example |
|---|---|
| All clients are in Azure VNets | Internal microservices |
| Clients connect via VPN/ExpressRoute | Enterprise users on corporate network |
| Zero public exposure is mandated | Compliance requirement for no public endpoints |
| You're a SaaS provider serving enterprise tenants | B2B API where all partners have Azure VNets |
| You control both ends | Your app calling your database |
Use MCP Bridge When:
| Scenario | Example |
|---|---|
| External AI agent needs to query private resources | SRE Agent investigating private Log Analytics |
| Can't deploy the AI client inside the VNet | Cloud-hosted AI services |
| Need natural language interface to private data | "Show me errors from my private VMs" |
| Customer wants AI assistance but has Private Link enabled | AI for security-conscious enterprise |
| You want investigation without VPN/Bastion access | 3 AM incident response from phone |
Decision Tree
        ┌─────────────────────────────────────────┐
        │ Does the client run inside an           │
        │ Azure VNet (or via VPN/ExpressRoute)?   │
        └────────────────────┬────────────────────┘
                             │
            ┌────────────────┴────────────────┐
            ▼                                 ▼
           YES                               NO
            │                                 │
            ▼                                 ▼
  ┌──────────────────┐              ┌──────────────────┐
  │ Use Private Link │              │ Is the client an │
  │ (end-to-end      │              │ AI agent/tool?   │
  │ private)         │              └─────────┬────────┘
  └──────────────────┘                        │
                                 ┌────────────┴────────────┐
                                 ▼                         ▼
                                YES                       NO
                                 │                         │
                                 ▼                         ▼
                       ┌──────────────────┐      ┌──────────────────┐
                       │ Use MCP Bridge   │      │ Consider:        │
                       │ (proxy pattern)  │      │ - API Gateway    │
                       └──────────────────┘      │ - Azure Front    │
                                                 │   Door + WAF     │
                                                 │ - App Service    │
                                                 │   with auth      │
                                                 └──────────────────┘
Risk Assessment
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| API key exposure | Low | High | Store in Key Vault, rotate regularly, use Entra ID instead |
| Unauthorized MCP access | Low | Medium | IP allowlisting, Entra ID auth, rate limiting |
| Overly permissive queries | Low | Medium | MCP tools are scoped and the identity holds only the read-only Log Analytics Reader role; note that query_logs still accepts arbitrary KQL |
| Data exfiltration via MCP | Very Low | High | Audit logging, query result size limits, alerting |
| MCP server compromise | Very Low | High | Container scanning, regular updates, network policies |
Bottom Line: Is This Pattern Secure?
Yes, with proper configuration.
| Security Question | Answer |
|---|---|
| Is Log Analytics exposed publicly? | No. It remains Private Link-protected |
| Can anyone query via the MCP server? | No. An API key or Entra ID authentication is required |
| Are queries audited? | Yes. Container Apps logs record MCP requests, and Log Analytics query auditing (LAQueryLogs) can record workspace queries |
| Is this as secure as pure Private Link? | Almost. Public ingress exists but is well protected |
| Is this secure enough for production? | Yes, with the hardening steps above |
The pattern is secure for its intended purpose: enabling an external AI agent to query private resources without exposing those resources publicly.
🧪 Proving It Works: Blocked vs. Allowed
Let's prove that Private Link actually blocks queries and that the VNet MCP server can bypass this block.
❌ Test 1: Query from Outside the VNet (BLOCKED)
First, try querying Log Analytics from a server outside the VNet. This simulates what happens when SRE Agent tries to query directly:
cURL command:
# MCP server running OUTSIDE the VNet
curl -X POST "https://log-analytics-mcp.happydune-ad37d82a.eastus2.azurecontainerapps.io/mcp/" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "X-API-Key: your-api-key" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "list_tables",
"arguments": {}
},
"id": 1
}'
Result: BLOCKED ❌
{
"jsonrpc": "2.0",
"id": 1,
"error": {
"code": -32603,
"message": "InsufficientAccessError: The query was blocked due to private link configuration. Access is denied because this request was not made through a private endpoint..."
}
}
✅ Test 2: Query from Inside the VNet (SUCCESS)
Now try the same query using the MCP server running inside the VNet:
cURL command:
# MCP server running INSIDE the VNet
curl -X POST "https://log-analytics-mcp-vnet.redbay-525c1802.eastus.azurecontainerapps.io/mcp/" \
-H "Content-Type: application/json" \
-H "Accept: application/json, text/event-stream" \
-H "X-API-Key: your-api-key" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "list_tables",
"arguments": {}
},
"id": 1
}'
Result: SUCCESS ✅
{
"jsonrpc": "2.0",
"id": 1,
"result": {
"content": [
{
"type": "text",
"text": "Found 15 tables in the workspace:\n\n1. AzureDiagnostics\n2. ContainerAppConsoleLogs\n3. ContainerAppSystemLogs\n..."
}
]
}
}
💡 This proves the architecture works: The VNet MCP server can access the Private Link-protected workspace while external access is blocked.
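The same two tests can be scripted. The sketch below builds the JSON-RPC `tools/call` body used by the cURL commands above and classifies a reply as blocked or allowed. The helper names (`build_tools_call`, `classify_response`, `call_mcp`) are hypothetical, and `call_mcp` assumes the server answers with a plain JSON body as in the results shown above; a server that replies with a `text/event-stream` body or requires an MCP initialization handshake first would need extra handling.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_tools_call(tool_name: str, arguments: Optional[Dict[str, Any]] = None,
                     request_id: int = 1) -> Dict[str, Any]:
    """Build the JSON-RPC 2.0 body used by the cURL tests above."""
    return {
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
        "id": request_id,
    }


def classify_response(response: Dict[str, Any]) -> str:
    """Return 'ok', 'blocked' (Private Link rejection), or 'error'."""
    error = response.get("error")
    if error is None:
        return "ok"
    message = str(error.get("message", ""))
    if "InsufficientAccessError" in message:
        return "blocked"
    return "error"


def call_mcp(url: str, api_key: str, payload: Dict[str, Any],
             timeout: float = 30.0) -> Dict[str, Any]:
    """POST the payload; assumes the server replies with a plain JSON body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
            "X-API-Key": api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Sample replies shaped like the two results above (no network call is made here).
blocked = {"jsonrpc": "2.0", "id": 1, "error": {
    "code": -32603,
    "message": "InsufficientAccessError: The query was blocked due to private link configuration."}}
allowed = {"jsonrpc": "2.0", "id": 1, "result": {
    "content": [{"type": "text", "text": "Found 15 tables in the workspace"}]}}

print(build_tools_call("list_tables")["params"])  # {'name': 'list_tables', 'arguments': {}}
print(classify_response(blocked))                 # blocked
print(classify_response(allowed))                 # ok
```

To run it against a live deployment, pass your own endpoint URL and key to `call_mcp(url, api_key, build_tools_call("list_tables"))` and feed the reply to `classify_response`.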
Try It Yourself
Want to experience this pattern firsthand? Here's how to deploy the full demo.
Prerequisites
- ✅ Azure subscription with Contributor access, plus permission to create role assignments (Owner or User Access Administrator) for the role assignment in step 3
- ✅ Azure CLI installed
- ✅ Docker (for building the MCP server image)
1️⃣ Deploy the VNet-Integrated Environment
Azure CLI commands:
# Create resource group
az group create --name vnet-test-rg --location eastus
# Create VNet
az network vnet create \
--resource-group vnet-test-rg \
--name vnet-test-vnet \
--address-prefix 10.0.0.0/16 \
--subnet-name infrastructure \
--subnet-prefix 10.0.0.0/23
# Create VNet-integrated Container Apps environment
az containerapp env create \
--name vnet-test-env \
--resource-group vnet-test-rg \
--location eastus \
--infrastructure-subnet-resource-id "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.Network/virtualNetworks/vnet-test-vnet/subnets/infrastructure" \
--internal-only false
# Create Log Analytics workspace
az monitor log-analytics workspace create \
--resource-group vnet-test-rg \
--workspace-name workspace-vnettest \
--location eastus
2️⃣ Configure Private Link (Block Public Access)
Azure CLI commands:
# Create Azure Monitor Private Link Scope
az monitor private-link-scope create \
--name vnet-test-ampls \
--resource-group vnet-test-rg
# Link workspace to AMPLS
az monitor private-link-scope scoped-resource create \
--name log-analytics-link \
--resource-group vnet-test-rg \
--scope-name vnet-test-ampls \
--linked-resource "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.OperationalInsights/workspaces/workspace-vnettest"
# Set AMPLS to Private Only (block public queries)
az monitor private-link-scope update \
--name vnet-test-ampls \
--resource-group vnet-test-rg \
--query-access PrivateOnly
# Disable public query access on workspace
az monitor log-analytics workspace update \
--resource-group vnet-test-rg \
--workspace-name workspace-vnettest \
--set properties.publicNetworkAccessForQuery=Disabled
3️⃣ Deploy the MCP Server
Azure CLI commands:
# Create Azure Container Registry (VNet environments can't pull from public registries)
az acr create \
--resource-group vnet-test-rg \
--name yourregistryname \
--sku Basic
# Build and push the MCP server image
az acr build \
--registry yourregistryname \
--image log-analytics-mcp:latest \
--file Dockerfile .
# Deploy Container App with Managed Identity
az containerapp create \
--name log-analytics-mcp-vnet \
--resource-group vnet-test-rg \
--environment vnet-test-env \
--image yourregistryname.azurecr.io/log-analytics-mcp:latest \
--target-port 8000 \
--ingress external \
--env-vars "LOG_ANALYTICS_WORKSPACE_ID=your-workspace-guid" "MCP_API_KEY=your-api-key" \
--system-assigned \
--registry-server yourregistryname.azurecr.io
# Grant Log Analytics Reader role to the Container App
az role assignment create \
--assignee-object-id $(az containerapp show --name log-analytics-mcp-vnet --resource-group vnet-test-rg --query identity.principalId -o tsv) \
--role "Log Analytics Reader" \
--scope "/subscriptions/{sub}/resourceGroups/vnet-test-rg/providers/Microsoft.OperationalInsights/workspaces/workspace-vnettest"
4️⃣ Configure SRE Agent
Add the MCP server to your SRE Agent subagent configuration:
name: PrivateVNetObservabilityAgent
description: Investigate issues in private VNet resources via MCP
system_prompt: |
  You investigate issues in private VNet environments where Log Analytics
  queries are blocked by Private Link. Use the MCP tools to query logs.
mcp_tools:
  - LogAnalyticsMCP
tool_definitions:
  LogAnalyticsMCP:
    type: mcp
    description: Query Log Analytics in private VNet
    transport: streamable_http
    url: https://log-analytics-mcp-vnet.redbay-525c1802.eastus.azurecontainerapps.io/mcp/
    headers:
      X-API-Key: "${MCP_API_KEY}"
🤖 Configuring the Subagent in SRE Agent Portal
To use this MCP server with Azure SRE Agent, you need to configure both an MCP Connection and a Subagent. Here's the complete setup.
Step 1: Configure the MCP Connection
In the Azure SRE Agent portal, navigate to Settings → MCP Connections and add:
| Field | Value |
|---|---|
| Name | LogAnalyticsMCP |
| URL | https://log-analytics-mcp-vnet.{your-env}.{region}.azurecontainerapps.io/mcp/ |
| Transport | HTTP (Streamable) |
| Authentication | API Key |
| Header Name | X-API-Key |
| Header Value | Your MCP server API key |
Step 2: Create the Subagent
In the Subagent Builder, create a new subagent with the following configuration:
Subagent Name
PrivateVNetObservabilityAgent
Description (Handoff Description)
This is critical for proper routing. The description tells the meta-agent when to hand off to this subagent:
Hand off to this agent when the user needs to:
- Debug resources in private VNets that can't be accessed directly
- Investigate network connectivity issues in isolated environments
- Query logs from VMs or Container Apps behind Private Link
- Analyze errors from resources without public network access
This agent uses Log Analytics MCP to access logs via Private Link.
System Prompt
You are a specialized Site Reliability Engineer focused on debugging issues with
resources deployed in private Azure Virtual Networks.
## Your Expertise
- Troubleshooting connectivity issues in private VNets
- Analyzing logs from resources that aren't directly accessible via public internet
- Identifying network configuration problems
- Diagnosing DNS resolution failures in private networks
- Debugging Azure Container Apps deployed in private VNet environments
## Available MCP Tools
You have access to the Log Analytics MCP server which queries logs from resources
connected to Log Analytics workspaces via Private Link:
1. **query_logs** - Execute KQL queries against the Log Analytics workspace
- Parameters:
- query (string, required): The KQL query to execute
- timespan (string, optional): Time range like "PT1H", "P1D", "P7D"
- Example: query_logs(query="ContainerAppConsoleLogs_CL | take 10", timespan="PT24H")
2. **list_tables** - Discover available log tables in the workspace
- No parameters required
- Returns table names and their schema
3. **get_workspace_info** - Get workspace configuration details
- No parameters required
- Returns workspace ID, location, retention settings
4. **analyze_errors** - Quick error trend analysis
- Parameters: hours (integer, optional, default: 24)
- Returns error counts and patterns
5. **check_vm_health** - Check VM heartbeat and connectivity status
- No parameters required
- Returns VM health status from Heartbeat table
## Troubleshooting Workflow
1. First understand the problem scope (which resources, what symptoms)
2. Use **list_tables** to discover available log sources
3. Check resource health:
- For VMs: use **check_vm_health** or query Heartbeat table
- For Container Apps: query ContainerAppSystemLogs_CL
4. Search for errors using **query_logs** with KQL
5. Use **analyze_errors** for quick error pattern detection
6. Correlate errors with recent deployments or changes
7. Provide actionable remediation steps
## Common Log Tables
| Table | Description |
|-------|-------------|
| Heartbeat | VM connectivity status |
| ContainerAppSystemLogs_CL | Container App system events (start, stop, scale) |
| ContainerAppConsoleLogs_CL | Container App console output (stdout/stderr) |
| AppExceptions | Application exceptions |
| AppTraces | Application traces |
| AzureDiagnostics | Azure resource diagnostics |
## Sample KQL Queries
**Recent Container App events:**
ContainerAppSystemLogs_CL
| where TimeGenerated > ago(1h)
| project TimeGenerated, ContainerAppName_s, Type_s, Reason_s, Log_s
| order by TimeGenerated desc
**Container App errors:**
ContainerAppConsoleLogs_CL
| where TimeGenerated > ago(24h)
| where Log_s contains "error" or Log_s contains "exception"
| project TimeGenerated, ContainerAppName_s, Log_s
**VM connectivity check:**
Heartbeat
| where TimeGenerated > ago(1h)
| summarize LastHeartbeat = max(TimeGenerated) by Computer
| extend Status = iff(LastHeartbeat < ago(5m), "Disconnected", "Connected")
## Best Practices
- Always start with a time-bounded query (use timespan parameter)
- Use summarize and project to keep results manageable
- Correlate errors across multiple tables when investigating
- Provide specific, actionable recommendations
- When no data is found, suggest checking if the resource is sending logs
MCP Tools
Select the MCP connection you created:
LogAnalyticsMCP
Handoffs
Enable handoff back to the main agent:
meta_agent
Settings
| Setting | Value |
|---|---|
| Temperature | 0.3 |
| Allow Parallel Tool Calls | ✅ Enabled |
| Max Reflection Count | 2 |
Complete YAML Configuration
For reference, here's the complete YAML that you can import into the Subagent Builder:
Full Subagent YAML:
name: PrivateVNetObservabilityAgent
system_prompt: |
  You are a specialized Site Reliability Engineer focused on debugging issues with
  resources deployed in private Azure Virtual Networks.
  ## Your Expertise
  - Troubleshooting connectivity issues in private VNets
  - Analyzing logs from resources that aren't directly accessible via public internet
  - Identifying network configuration problems
  - Diagnosing DNS resolution failures in private networks
  - Debugging Azure Container Apps deployed in private VNet environments
  ## Available MCP Tools
  You have access to the Log Analytics MCP server which queries logs from resources
  connected to Log Analytics workspaces via Private Link:
  1. **query_logs** - Execute KQL queries against the Log Analytics workspace
     - Parameters:
       - query (string, required): The KQL query to execute
       - timespan (string, optional): Time range like "PT1H", "P1D", "P7D"
     - Example: query_logs(query="ContainerAppConsoleLogs_CL | take 10", timespan="PT24H")
  2. **list_tables** - Discover available log tables in the workspace
     - No parameters required
     - Returns table names and their schema
  3. **get_workspace_info** - Get workspace configuration details
     - No parameters required
     - Returns workspace ID, location, retention settings
  4. **analyze_errors** - Quick error trend analysis
     - Parameters: hours (integer, optional, default: 24)
     - Returns error counts and patterns
  5. **check_vm_health** - Check VM heartbeat and connectivity status
     - No parameters required
     - Returns VM health status from Heartbeat table
  ## Troubleshooting Workflow
  1. First understand the problem scope (which resources, what symptoms)
  2. Use **list_tables** to discover available log sources
  3. Check resource health:
     - For VMs: use **check_vm_health** or query Heartbeat table
     - For Container Apps: query ContainerAppSystemLogs_CL
  4. Search for errors using **query_logs** with KQL
  5. Use **analyze_errors** for quick error pattern detection
  6. Correlate errors with recent deployments or changes
  7. Provide actionable remediation steps
  ## Common Log Tables
  | Table | Description |
  |-------|-------------|
  | Heartbeat | VM connectivity status |
  | ContainerAppSystemLogs_CL | Container App system events (start, stop, scale) |
  | ContainerAppConsoleLogs_CL | Container App console output (stdout/stderr) |
  | AppExceptions | Application exceptions |
  | AppTraces | Application traces |
  | AzureDiagnostics | Azure resource diagnostics |
  ## Sample KQL Queries
  **Recent Container App events:**
  ContainerAppSystemLogs_CL
  | where TimeGenerated > ago(1h)
  | project TimeGenerated, ContainerAppName_s, Type_s, Reason_s, Log_s
  | order by TimeGenerated desc
  **Container App errors:**
  ContainerAppConsoleLogs_CL
  | where TimeGenerated > ago(24h)
  | where Log_s contains "error" or Log_s contains "exception"
  | project TimeGenerated, ContainerAppName_s, Log_s
  **VM connectivity check:**
  Heartbeat
  | where TimeGenerated > ago(1h)
  | summarize LastHeartbeat = max(TimeGenerated) by Computer
  | extend Status = iff(LastHeartbeat < ago(5m), "Disconnected", "Connected")
  ## Best Practices
  - Always start with a time-bounded query (use timespan parameter)
  - Use summarize and project to keep results manageable
  - Correlate errors across multiple tables when investigating
  - Provide specific, actionable recommendations
  - When no data is found, suggest checking if the resource is sending logs
handoff_description: |
  Hand off to this agent when the user needs to:
  - Debug resources in private VNets that can't be accessed directly
  - Investigate network connectivity issues in isolated environments
  - Query logs from VMs or Container Apps behind Private Link
  - Analyze errors from resources without public network access
  This agent uses Log Analytics MCP to access logs via Private Link.
# MCP tools from the LogAnalyticsMCP connection
mcp_tools:
  - LogAnalyticsMCP
# Allow handoff back to meta_agent for general queries
handoffs:
  - meta_agent
# Agent settings
temperature: 0.3
allow_parallel_tool_calls: true
max_reflection_count: 2
5️⃣ Test the Integration
Ask SRE Agent to investigate:
💬 "Show me the tables available in my private Log Analytics workspace"
💬 "Query the last 10 container logs from my application"
💬 "Check if there are any errors in the system logs"
🔧 The Architecture in Detail
Why This Works
The key is that data ingestion and query access use different network paths:
| Path | Direction | Network |
|---|---|---|
| Azure Monitor Agent → Log Analytics | Ingest | Private Endpoint or public endpoint |
| ❌ External query → Log Analytics | Query | Blocked by Private Link |
| ✅ VNet MCP server → Log Analytics | Query | Allowed via Private Endpoint |
| SRE Agent → MCP server | Query | Public HTTPS (MCP protocol) |
💡 The MCP server acts as a query proxy that runs inside the trusted network boundary.
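The allowed path in the table (VNet MCP server → Log Analytics) can be sketched in a few lines. This is a minimal illustration, not the repository's actual code: it assumes the `azure-identity` and `azure-monitor-query` packages, a system-assigned managed identity, and a workspace ID you provide. `make_client` and `run_kql` are hypothetical helper names.

```python
from datetime import timedelta
from typing import Any, Dict, List


def make_client() -> Any:
    """Build a Log Analytics client that authenticates with the managed identity."""
    # Lazy imports: these require azure-identity and azure-monitor-query.
    from azure.identity import ManagedIdentityCredential
    from azure.monitor.query import LogsQueryClient

    return LogsQueryClient(ManagedIdentityCredential())


def run_kql(client: Any, workspace_id: str, query: str, hours: int = 1) -> List[Dict[str, Any]]:
    """Run a time-bounded KQL query and flatten the first result table into dicts."""
    result = client.query_workspace(workspace_id, query, timespan=timedelta(hours=hours))
    # Full results expose `tables`; partial results expose `partial_data` instead.
    tables = getattr(result, "tables", None) or getattr(result, "partial_data", None) or []
    if not tables:
        return []
    table = tables[0]
    return [dict(zip(table.columns, row)) for row in table.rows]
```

Because the client is passed in, the same function succeeds when it runs inside the VNet and fails with the private link error when it runs outside it. The only difference is the network path the request takes.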
MCP Server Configuration
The Log Analytics MCP Server is open-source and uses:
| Component | Technology |
|---|---|
| Runtime | Python 3.10+ with mcp[cli] framework |
| SDK | azure-monitor-query for Log Analytics |
| Transport | Streamable HTTP for remote access |
| Auth | Managed Identity for Azure authentication |
📦 GitHub Repository: github.com/BandaruDheeraj/log-analytics-mcp-server
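To make the components above concrete, here is a minimal sketch of how a tool could be registered and served over Streamable HTTP. It assumes the `FastMCP` class from the MCP Python SDK. The `QueryRunner` hook, `make_query_logs`, and `build_server` are hypothetical names for illustration and may not match how the repository structures its code.

```python
from typing import Any, Callable, Dict, List

# Hypothetical hook: any callable that runs KQL over the last `hours` hours and
# returns rows as dictionaries (for example, a wrapper around azure-monitor-query).
QueryRunner = Callable[[str, int], List[Dict[str, Any]]]


def make_query_logs(runner: QueryRunner) -> Callable[..., List[Dict[str, Any]]]:
    """Create the query_logs tool function bound to a specific query runner."""

    def query_logs(query: str, hours: int = 1) -> List[Dict[str, Any]]:
        """Execute a KQL query against the workspace over the last `hours` hours."""
        if not query.strip():
            raise ValueError("query must not be empty")
        if hours < 1:
            raise ValueError("hours must be at least 1")
        return runner(query, hours)

    return query_logs


def build_server(runner: QueryRunner) -> Any:
    """Register the tool on a FastMCP server (requires the `mcp` package)."""
    from mcp.server.fastmcp import FastMCP  # lazy import

    server = FastMCP("log-analytics-mcp")
    server.tool()(make_query_logs(runner))
    return server


def main() -> None:
    # Placeholder runner that returns no rows; replace it with a real Log Analytics call.
    build_server(lambda query, hours: []).run(transport="streamable-http")
```

Calling `main()` starts the server. Keeping the runner injectable means the tool logic can be exercised without Azure credentials, and the managed identity only matters once the real runner is plugged in.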
Available MCP Tools
| Tool | Description |
|---|---|
| query_logs | Execute any KQL query against the workspace |
| list_tables | Discover available tables and their schemas |
| get_workspace_info | Get workspace metadata and data volume stats |
| analyze_errors | Pre-built error pattern analysis |
| check_vm_health | Heartbeat, CPU, memory, error count summary |
Security Considerations
When deploying an MCP server as a query proxy, consider:
| Concern | Mitigation |
|---|---|
| API Authentication | Require an API key in the X-API-Key header |
| Managed Identity | Container App uses a system-assigned managed identity for Azure auth |
| Network Isolation | MCP server runs in a VNet-integrated environment |
| Query Auditing | All queries are logged in Container Apps system logs |
| RBAC | Grant only the Log Analytics Reader role to the managed identity |
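The API key check in the first row can be sketched with the standard library alone. This is an illustrative helper, not the repository's implementation: `is_authorized` is a hypothetical name, and how the expected key is stored (for example, as a Container Apps secret) is left as an assumption.

```python
import hmac
from typing import Mapping


def is_authorized(headers: Mapping[str, str], expected_key: str) -> bool:
    """Check the X-API-Key header in constant time; fail closed if no key is configured."""
    if not expected_key:
        return False
    # HTTP header names are case-insensitive, so match on the lowercased name.
    supplied = next(
        (value for name, value in headers.items() if name.lower() == "x-api-key"),
        "",
    )
    return hmac.compare_digest(supplied.encode("utf-8"), expected_key.encode("utf-8"))
```

Using `hmac.compare_digest` instead of `==` avoids leaking how many leading characters of the key matched through response timing.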
Beyond Log Analytics
This MCP proxy pattern extends to any Azure service that supports Private Link:
| Service | Private Link | MCP Server Potential |
|---|---|---|
| Log Analytics | ✅ | Query logs via KQL |
| Application Insights | ✅ | Query app telemetry |
| Azure Data Explorer | ✅ | Query Kusto clusters |
| Azure SQL | ✅ | Query databases |
| Cosmos DB | ✅ | Query collections |
| Storage Accounts | ✅ | Access blob/table data |
💡 The pattern: Deploy an MCP server inside the private network, grant it access to the resource, and expose it to SRE Agent via Streamable HTTP.
Key Takeaways
Private Link Blocks Queries
When you set publicNetworkAccessForQuery: Disabled on the workspace and queryAccessMode: PrivateOnly on the AMPLS, Log Analytics rejects queries that originate outside the VNet. This is enforced network isolation, not just encryption in transit.
MCP Servers as Bridges
MCP servers running inside your VNet can act as trusted query proxies. They authenticate with Managed Identity, query the protected resource, and return results over public HTTPS.
Streamable HTTP for Remote Access
The Streamable HTTP transport (/mcp/ endpoint) lets MCP servers be called from anywhere, not only over local STDIO. This is essential for cloud-hosted MCP servers.
Zero VPN, Zero Bastion
You don't need VPN access to debug issues in private networks. SRE Agent coordinates with the MCP server, which does the actual querying from inside the trusted boundary.
Reusable Pattern
This isn't just for Log Analytics. Any Private Link-protected resource can be made accessible to AI agents through a VNet-deployed MCP server.
Get Started
Ready to bring this pattern to your environment?
| Step | Action |
|---|---|
| 1️⃣ | Deploy a VNet-integrated Container Apps environment |
| 2️⃣ | Set up Azure Monitor Private Link Scope with queryAccessMode: PrivateOnly |
| 3️⃣ | Disable public query access on your Log Analytics workspace |
| 4️⃣ | Deploy the Log Analytics MCP server with Managed Identity |
| 5️⃣ | Configure SRE Agent to use the MCP server via Streamable HTTP |
| 6️⃣ | Start investigating without ever needing VPN access |
Resources
| Resource | Link |
|---|---|
| Log Analytics MCP Server | GitHub |
| Azure Monitor Private Link | Documentation |
| Container Apps VNet Integration | Documentation |
| MCP Streamable HTTP Transport | Specification |
| Sample Environment | README |
This blog post is based on a real working demo. The MCP servers referenced (log-analytics-mcp-vnet and log-analytics-mcp) are live and demonstrate the blocked vs. allowed query behavior.
💬 Questions or feedback? Join the discussion on the Azure SRE Agent community.