Azure Kubernetes Service workloads are frequently overprovisioned. Resource requests and limits set conservatively during initial deployment often remain unchanged as workloads stabilise, resulting in nodes running at a fraction of their available capacity while incurring full compute costs.
Azure Monitor Container Insights provides the visibility needed to identify overprovisioned workloads and make data-driven right-sizing decisions. This post covers enabling Container Insights and using its reports to reduce AKS cost.
1. Enabling Container Insights
Container Insights is not enabled by default on AKS clusters. To enable it:
- Navigate to your AKS cluster > Monitoring > Insights
- If not yet enabled, select Configure monitoring
- Select the target Log Analytics workspace. Use an existing workspace to centralise container logs with other workload logs
- Select Configure
Enabling Container Insights installs the Azure Monitor Agent on the cluster's node pools and begins collecting CPU, memory, and network metrics at the container, pod, node, and cluster level.
Note that Container Insights ingestion adds to Log Analytics workspace costs. For large clusters, review the expected ingestion volume before enabling. The Cost optimisation settings option (available during configuration) allows ingestion to be limited to essential metrics only.
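To estimate that volume on a workspace that is already collecting container data, the billable ingestion of the Container Insights tables can be queried from the Usage table. The following is a sketch; the table names assume the default Container Insights schema, and Quantity is reported in MB:

```kusto
Usage
| where TimeGenerated > ago(30d)
| where IsBillable == true
| where DataType in ("Perf", "ContainerLog", "ContainerLogV2",
    "KubePodInventory", "KubeNodeInventory", "KubeEvents", "InsightsMetrics")
// Quantity is in MB; convert to GB for readability
| summarize IngestedGB = round(sum(Quantity) / 1024, 2) by DataType
| order by IngestedGB desc
```

ContainerLog (or ContainerLogV2) is usually the dominant table; if it accounts for most of the ingestion, the cost optimisation settings mentioned above are the first lever to pull.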
2. Analysing Node Utilisation
The most immediate cost signal in Container Insights is node CPU and memory utilisation. Nodes consistently running below 30% utilisation are candidates for consolidation.
Navigate to AKS cluster > Monitoring > Insights > Nodes. The Nodes view shows:
- CPU usage % per node over the selected time range
- Memory working set % per node
- A summary of the node pool VM size and count
Set the time range to Last 30 days to get a representative view of utilisation patterns rather than a point-in-time snapshot. Nodes showing average CPU utilisation below 20% across a 30-day window warrant investigation.
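The portal charts can be reproduced, and filtered to just the problem nodes, with a direct query against the node-level counters. A sketch, assuming the standard Container Insights counters cpuUsageNanoCores and cpuCapacityNanoCores for the K8SNode object:

```kusto
Perf
| where TimeGenerated > ago(30d)
| where ObjectName == "K8SNode" and CounterName == "cpuUsageNanoCores"
| summarize AvgUsed = avg(CounterValue) by Computer
| join kind=inner (
    Perf
    | where TimeGenerated > ago(30d)
    | where ObjectName == "K8SNode" and CounterName == "cpuCapacityNanoCores"
    | summarize Capacity = max(CounterValue) by Computer
) on Computer
// Average utilisation as a percentage of node CPU capacity
| extend AvgCPUPercent = round(100.0 * AvgUsed / Capacity, 1)
| where AvgCPUPercent < 20
| project Computer, AvgCPUPercent
| order by AvgCPUPercent asc
```

An empty result set here is a good sign; any rows returned are the nodes to investigate for consolidation.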
3. Identifying Overprovisioned Workloads at the Container Level
Navigate to AKS cluster > Monitoring > Insights > Containers. This view shows per-container CPU and memory usage relative to the configured resource requests and limits.
Look for containers where:
- CPU usage sits consistently below 20% of the configured request — the request can likely be reduced
- Memory working set stays well below the limit — the limit is a candidate for reduction, though lower it cautiously: a container that exceeds its memory limit is OOM-killed
The following KQL query surfaces containers ranked by average CPU consumption, joining the Perf counter samples back to pod inventory for namespace context:
KubePodInventory
| where TimeGenerated > ago(7d)
| distinct ContainerName, Namespace
| join kind=inner (
    Perf
    | where TimeGenerated > ago(7d)
    | where ObjectName == "K8SContainer"
    | where CounterName == "cpuUsageNanoCores"
    // Perf InstanceName is <clusterId>/<podUid>/<containerName>;
    // KubePodInventory ContainerName is <podUid>/<containerName>
    | extend Parts = split(InstanceName, "/")
    | extend ContainerName = strcat(tostring(Parts[-2]), "/", tostring(Parts[-1]))
    | summarize AvgCPUMillicores = avg(CounterValue) / 1000000 by ContainerName
) on ContainerName
| project ContainerName, Namespace, AvgCPUMillicores
| order by AvgCPUMillicores asc
Run this query in Log Analytics workspace > Logs to get a ranked list of containers by CPU consumption.
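The same approach works for memory. The sketch below assumes the memoryWorkingSetBytes and memoryLimitBytes counters that Container Insights writes for the K8SContainer object, and flags containers whose average working set stays under 40% of the configured limit:

```kusto
Perf
| where TimeGenerated > ago(7d)
| where ObjectName == "K8SContainer" and CounterName == "memoryWorkingSetBytes"
| summarize AvgWorkingSet = avg(CounterValue) by InstanceName
| join kind=inner (
    Perf
    | where TimeGenerated > ago(7d)
    | where ObjectName == "K8SContainer" and CounterName == "memoryLimitBytes"
    | summarize MemLimit = max(CounterValue) by InstanceName
) on InstanceName
| where MemLimit > 0
// Average working set as a percentage of the configured memory limit
| extend PctOfLimit = round(100.0 * AvgWorkingSet / MemLimit, 1)
| where PctOfLimit < 40
| order by PctOfLimit asc
```

The 40% threshold is an illustrative starting point, not a rule; leave more headroom for workloads with spiky memory profiles.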
4. Right-Sizing Node Pools
Once individual container overprovisioning is addressed, review whether the node pool VM size remains appropriate for the adjusted workload profile.
Navigate to AKS cluster > Node pools, select a node pool, and review the VM size. Azure Advisor may also surface AKS right-sizing recommendations if the cluster has been running with consistent utilisation data for 30+ days.
For development and staging clusters, consider:
- Switching to spot node pools for non-production workloads; spot pricing on AKS node pools can reduce VM costs by 60–80%
- Enabling the cluster autoscaler with a minimum node count of 1, so the pool scales in during off-hours instead of paying for idle nodes overnight
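Once the autoscaler is enabled, it is worth confirming that node count actually varies with demand. A quick check is to chart the distinct node count over time from the KubeNodeInventory table:

```kusto
KubeNodeInventory
| where TimeGenerated > ago(30d)
| summarize NodeCount = dcount(Computer) by bin(TimeGenerated, 1h)
| render timechart
```

A line that stays flat across nights and weekends suggests the cluster never scales in, either because the minimum count is set too high or because pods without resource requests are pinning nodes.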
Summary
Container Insights provides the data needed to move AKS cost conversations from estimates to evidence. Combining node-level utilisation analysis with per-container resource request reviews typically surfaces meaningful right-sizing opportunities in any cluster that has been running for more than 60 days without a deliberate review.