Is Azure Down? Microsoft Azure Status Guide & Diagnostics 2026
Is Microsoft Azure Down Right Now?
Microsoft Azure is the second-largest cloud platform globally, powering enterprise workloads across App Service, AKS, Azure SQL, Event Hubs, Cosmos DB, and hundreds of other services. Azure outages can cascade across dependent services and affect production workloads across regions.
Here's how to check Azure status in seconds and diagnose whether you're facing a platform issue or a configuration problem.
Step 1: Check Azure's Status Pages
Azure has two status interfaces — you need both:
1. Azure Status Page (all customers)
azure.status.microsoft — shows active incidents and a global heatmap by service and region. Updated continuously during incidents.
2. Azure Service Health (your specific resources)
Azure Portal → Service Health — shows incidents affecting your specific subscriptions, regions, and services. This is more accurate than the public status page for your workloads because not every incident affects every customer.
Set up Service Health alerts
```bash
# Create a Service Health activity-log alert via Azure CLI
az monitor activity-log alert create \
  --name "azure-service-health-alert" \
  --resource-group my-rg \
  --scope "/subscriptions/$(az account show --query id -o tsv)" \
  --condition category=ServiceHealth \
  --action-group my-action-group
```
RSS feeds:
- azure.status.microsoft/en-us/status/feed/ — general status
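The status feed is plain RSS 2.0, so any feed parser works. A minimal sketch using only the standard library, run here against a sample payload (the item below is invented for illustration; to poll the live feed, fetch the URL above with `urllib.request.urlopen` and pass the bytes to the same function):

```python
import xml.etree.ElementTree as ET

# Sample payload shaped like an RSS 2.0 status feed; the item title and
# date are invented for illustration, not a real incident.
SAMPLE_FEED = """<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
  <channel>
    <title>Azure Status</title>
    <item>
      <title>RESOLVED: Example service degradation</title>
      <pubDate>Mon, 09 Mar 2026 10:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""

def latest_incidents(feed_xml):
    """Return (title, pubDate) tuples for each item in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        (item.findtext("title"), item.findtext("pubDate"))
        for item in root.iter("item")
    ]

incidents = latest_incidents(SAMPLE_FEED)
```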
Azure Service Architecture: What Can Break
| Service | What it does | Common failure modes |
|---|---|---|
| App Service / Web Apps | Managed web hosting (Linux/Windows) | Deployment failures, SSL binding errors, CORS issues, custom domain validation |
| Azure Kubernetes Service (AKS) | Managed Kubernetes | Control plane API unavailable, node pool upgrade stuck, CNI plugin failures |
| Azure Functions | Serverless compute | Cold start timeouts, trigger binding errors, deployment package too large |
| Azure SQL Database | Managed SQL Server (PaaS) | Connection pool exhaustion, DTU/vCore throttling, failover to geo-replica |
| Cosmos DB | Multi-model NoSQL database | RU/s throughput exceeded (429), regional failover, consistency level issues |
| Event Hubs / Service Bus | Managed messaging | Throughput unit exhaustion, namespace failover, consumer group lock conflicts |
| Azure Blob Storage | Object storage | Storage account throttling, geo-redundancy failover, access tier transitions |
| Azure Active Directory / Entra ID | Identity and authentication | Token endpoint delays, MFA push failures, conditional access evaluation errors |
| Azure OpenAI Service | Hosted OpenAI models | Token rate limits (429), multi-region capacity exhaustion (see Q1 2026 incident) |
| Azure Container Registry (ACR) | Private container image registry | Pull failures during deployments, geo-replication lag |
Is It Azure or Your App? Diagnostic Table
| Symptom | Most likely cause | How to verify |
|---|---|---|
| App Service returns 503 | App crash, scaling limits, or App Service plan throttled | Check App Service → Diagnose and Solve Problems → Availability |
| AKS kubectl commands fail | Control plane API unavailable (Azure incident) OR RBAC/credentials issue | az aks get-credentials and retry; check Azure status for AKS in your region |
| Azure Functions not triggering | Storage account issue (Functions uses Storage internally), trigger binding error | Check Functions → Monitor for invocation errors; verify AzureWebJobsStorage connection |
| Azure SQL: "connection timeout" | DTU/vCore exhaustion, max connections reached, or geo-failover in progress | Check Azure SQL → Metrics → DTU percentage; look for high wait stats |
| Cosmos DB: 429 Too Many Requests | RU/s limit reached (not an outage — provisioned throughput exceeded) | Check Cosmos DB → Insights → Rate of requests throttled; increase RU/s or use autoscale |
| Event Hubs: consumers not receiving messages | Throughput unit limit, consumer group offset issue, or namespace geo-failover | Check Event Hubs → Metrics → Throttled Requests; verify consumer group checkpoint |
| Azure AD: login failures for users | Entra ID / Azure AD incident (can be global) | Check azure.status.microsoft for Azure Active Directory / Entra |
| Azure OpenAI: 503 errors | Regional capacity exhaustion or quota exceeded | Check your quota in Azure Portal → Azure OpenAI → Quotas; try different region |
| Deployment fails with "ResourceNotFound" | ARM template issue, resource provider not registered, or quota limit | Check Azure Portal → Activity Log for deployment errors |
| Works in West Europe, fails in East US | Region-specific Azure incident | Check status page for the specific region; deploy to a secondary region |
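The triage logic in the table can be captured in a small lookup helper. A hedged sketch: the hint strings are condensed from the rows above, the service keys are arbitrary labels, and any (service, status) pair not listed falls through to a generic "check the status page" answer:

```python
# Hints condensed from the diagnostic table above; purely illustrative.
# A (service, None) entry acts as a catch-all for that service.
TRIAGE_HINTS = {
    ("app_service", 503): "App crash or plan throttling -- check Diagnose and Solve Problems",
    ("cosmos_db", 429): "Provisioned RU/s exceeded (not an outage) -- raise RU/s or use autoscale",
    ("azure_openai", 503): "Regional capacity or quota -- check Quotas, try another region",
    ("azure_sql", None): "DTU/vCore exhaustion or geo-failover -- check DTU percentage metric",
}

def triage(service, status_code=None):
    """Map a (service, HTTP status) pair to a first diagnostic step."""
    hint = TRIAGE_HINTS.get((service, status_code))
    if hint is None:
        hint = TRIAGE_HINTS.get((service, None))
    return hint or "No specific hint -- check azure.status.microsoft and Service Health"

print(triage("cosmos_db", 429))
```

The point of the fallback chain is the same as the table's: exhaust resource-specific explanations before concluding it is a platform incident.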
Q1 2026 Azure Incidents (Public Record)
Microsoft publishes detailed Post-Incident Reviews (PIRs) at azure.status.microsoft/en-us/status/history/. Notable Q1 2026 incidents from public reporting:
| Date | Service | Duration | Description | Source |
|---|---|---|---|---|
| March 9–10, 2026 | Azure OpenAI Service | ~20 hours | Multi-region degradation affecting Azure OpenAI 5.2 models across Australia East, Sweden Central, Central US, East US 2, Korea Central, Norway East, UK South. HTTP 400/429 errors. Root cause: unexpected resource exhaustion triggered by a configuration change. | azure.status.microsoft |
| February 2026 | Azure Virtual Machines | Multiple hours | Virtual machine outage with cascading effects on dependent Azure services. Reported by The Register | azure.status.microsoft |
All data sourced from Microsoft's public status history and published incident reports. For the full list of Q1 2026 incidents with root cause analyses, check azure.status.microsoft/en-us/status/history/ directly.
Azure Resilience Patterns
1. Availability Zones vs. Regions
Most Azure regions offer 3 Availability Zones (physically separate datacenters within a region). Zone-redundant deployments survive a single AZ failure. However, they don't protect against:
- Region-wide control plane issues (API management, ARM)
- Global services like Azure AD / Entra ID
- Configuration changes applied globally (like the March 2026 Azure OpenAI incident)
2. Azure Paired Regions
Azure pairs regions for geo-redundant replication (e.g., East US ↔ West US, North Europe ↔ West Europe). Critical services:
- Azure SQL geo-replication uses paired regions by default
- Geo-redundant storage (GRS) replicates to paired region
- Azure Site Recovery uses paired regions for DR
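Because pairings are fixed per region, they are worth encoding in failover tooling rather than hardcoding at call sites. A sketch with a deliberately partial pairing map (only the pairs named above plus one well-known pair; consult Microsoft's cross-region replication docs for the full list):

```python
# Partial map of Azure paired regions: the pairs named in this section
# plus UK South <-> UK West. Not exhaustive -- see Microsoft's docs.
REGION_PAIRS = {
    "eastus": "westus",
    "northeurope": "westeurope",
    "uksouth": "ukwest",
}
# Pairing is symmetric, so add the reverse direction for each entry.
REGION_PAIRS.update({v: k for k, v in list(REGION_PAIRS.items())})

def failover_region(region):
    """Return the paired region, or None if unknown to this partial map."""
    return REGION_PAIRS.get(region)

print(failover_region("westeurope"))  # northeurope
```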
3. Retry logic for transient errors
```python
import time
import random

def azure_retry(func, max_retries=5, base_delay=1.0):
    """
    Exponential backoff with jitter for Azure SDK calls.
    Handles 429 (throttle), 503 (unavailable), and transient 5xx errors.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            error_code = getattr(e, 'error_code', None) or str(e)
            # Don't retry on permanent errors
            if '404' in str(error_code) or '403' in str(error_code):
                raise
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            # Honor Retry-After header if present (Azure 429s include it)
            retry_after = getattr(e, 'retry_after', None)
            if retry_after:
                delay = max(delay, float(retry_after))
            time.sleep(delay)
    raise Exception("Max retries exceeded")
```
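A standalone demonstration of the same pattern against a stubbed flaky call. The `ThrottleError` class and `flaky_read` function are fakes invented for illustration, and the retry helper is repeated in compact form (with tiny delays) so the snippet runs on its own:

```python
import time
import random

def azure_retry(func, max_retries=5, base_delay=0.01):
    # Compact copy of the helper above; base_delay shrunk for a fast demo.
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            code = getattr(e, 'error_code', None) or str(e)
            if '404' in str(code) or '403' in str(code):
                raise
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            retry_after = getattr(e, 'retry_after', None)
            if retry_after:
                delay = max(delay, float(retry_after))
            time.sleep(delay)

class ThrottleError(Exception):
    """Fake 429 carrying a retry_after hint, like Azure SDK throttle errors."""
    def __init__(self):
        super().__init__("429 Too Many Requests")
        self.retry_after = 0.01

calls = {"n": 0}
def flaky_read():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ThrottleError()  # fail the first two attempts
    return "document-body"

result = azure_retry(flaky_read)  # succeeds on the third call
```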
Azure CLI Quick Diagnostic Commands
```bash
# Check App Service health
az webapp show --name my-app --resource-group my-rg --query "state"

# Check AKS cluster state
az aks show --name my-cluster --resource-group my-rg --query "powerState"

# List recent AKS node pool events
az aks get-credentials --name my-cluster --resource-group my-rg
kubectl get events --all-namespaces --sort-by='.lastTimestamp' | tail -20

# Check Azure SQL database status
az sql db show --name my-db --server my-server --resource-group my-rg \
  --query "[status, currentServiceObjectiveName]"

# List functions in a Function App (invocation details live in Functions → Monitor)
az functionapp list-functions --name my-func-app --resource-group my-rg

# View activity log errors from the last 2 hours
# (BSD/macOS date syntax; on GNU/Linux use: date -u -d '2 hours ago' +%Y-%m-%dT%H:%M:%SZ)
az monitor activity-log list --resource-group my-rg \
  --start-time $(date -u -v-2H +%Y-%m-%dT%H:%M:%SZ) \
  --query "[?level=='Error'] | [].{time:eventTimestamp, op:operationName.value, status:status.value}"
```
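The activity-log filter at the end can also be done in Python after `az monitor activity-log list -o json`, which is handy when a script needs the errors as data. A sketch over a sample payload (the records below are invented, shaped like the CLI's JSON output):

```python
import json

# Invented sample of `az monitor activity-log list -o json` output.
SAMPLE_LOG = json.loads("""[
  {"level": "Error", "eventTimestamp": "2026-03-09T10:00:00Z",
   "operationName": {"value": "Microsoft.Web/sites/write"},
   "status": {"value": "Failed"}},
  {"level": "Informational", "eventTimestamp": "2026-03-09T10:05:00Z",
   "operationName": {"value": "Microsoft.Web/sites/read"},
   "status": {"value": "Succeeded"}}
]""")

def error_events(entries):
    """Mirror the JMESPath query: keep Error-level entries, project 3 fields."""
    return [
        {"time": e["eventTimestamp"],
         "op": e["operationName"]["value"],
         "status": e["status"]["value"]}
        for e in entries
        if e.get("level") == "Error"
    ]

errors = error_events(SAMPLE_LOG)
```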
External Monitoring for Azure
Azure Monitor's built-in availability tests run from within Azure infrastructure — they can miss failures that only appear when traffic routes through the global load balancer or comes from outside Azure's network. Use an external monitoring service like ezmon.com to check your Azure-hosted services from multiple independent geographic locations.
External monitoring catches:
- Azure Front Door / Traffic Manager routing failures
- DNS propagation issues from Azure DNS
- SSL certificate problems (expired or mis-issued certs)
- Geographic routing errors (Azure Traffic Manager weighted/priority policies)
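One of the checks above, certificate expiry, is easy to script with the standard library. A sketch that parses the `notAfter` field in the format Python's `ssl` module returns from `getpeercert()`; the expiry date and the fixed "now" below are made up so the example is reproducible (for a live check, open a TLS connection with `ssl.create_default_context().wrap_socket(...)` and read `getpeercert()["notAfter"]`):

```python
import ssl
import time

def days_until_expiry(not_after, now=None):
    """Days until a cert's notAfter timestamp, as returned by ssl.getpeercert()."""
    expiry = ssl.cert_time_to_seconds(not_after)  # parses e.g. 'Jun  1 12:00:00 2026 GMT'
    now = time.time() if now is None else now
    return (expiry - now) / 86400

# Made-up expiry date, evaluated against a fixed 'now' for reproducibility.
remaining = days_until_expiry("Jun  1 12:00:00 2026 GMT",
                              now=ssl.cert_time_to_seconds("May  2 12:00:00 2026 GMT"))
print(round(remaining))  # 30
```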
Azure Status Quick Reference
- Status page: azure.status.microsoft
- Incident history + PIRs: azure.status.microsoft/en-us/status/history/
- Personal Service Health: Azure Portal → Service Health
- RSS: azure.status.microsoft/en-us/status/feed/
- @AzureSupport on X: updates during active incidents
- Azure documentation: Availability monitoring guide
Related Guides
- Azure Q1 2026 Reliability Analysis: Key Incidents and Lessons
- Is AWS Down? Amazon Web Services Status Guide
- Is GCP Down? Google Cloud Platform Status Guide
- Kubernetes Cluster Issues: Full Diagnostic Guide
- AWS Lambda / RDS / EC2 Service Diagnostics
Bottom Line
When Azure shows symptoms:
- Check azure.status.microsoft AND Azure Portal → Service Health
- Service Health is more accurate — it filters to your specific subscriptions and regions
- Use `az monitor activity-log list` to see recent deployment/operation errors
- Check resource-specific metrics (DTU%, RU/s, throughput units) before assuming it's Azure
- If confirmed Azure incident: implement regional failover or wait for remediation
Monitor your Azure-hosted services from outside Azure at ezmon.com — multi-location uptime monitoring that catches what Azure Monitor misses.