publish date
Apr 11, 2023
duration
16 min
Difficulty
Case details
Many teams get their fingers burned before they realise how crucial proper monitoring is. We analysed incident data from over 150 organisations that use Prometheus Alertmanager to monitor their Kubernetes infrastructure. We found some surprisingly common yet fatal mistakes in the choice of metrics, along with some clever configurations that drastically reduce noise. This talk gives a run-through of best practices and 'what not to do' when choosing monitoring metrics for clean, noiseless alerting.