Publish date: Mar 1, 2023
Duration: 28 min
Case details
As technology evolves ever faster, software systems have become more distributed and more complex to coordinate and orchestrate. Applications depend on several other applications, and maintaining system performance is crucial to a good user experience. Large-scale applications, which are mostly built on a microservice architecture, require real-time monitoring and debugging. Application error logs alone either cannot provide this or make it too time-consuming. The talk presents application performance monitoring (APM) tools and distributed tracing solutions that capture metrics for tracking system performance and come in handy when troubleshooting applications. These metrics include CPU utilization, memory usage, heap and thread usage, I/O speeds, application latencies, error rates, and efficiency. We will also discuss how developers can configure custom application metrics, thresholds, and distributed tracing using tools such as New Relic and Lightstep. New Relic also has a Java agent that can monitor app servers, databases, and message queuing systems, covering the key components of an application. For real-time log traces with alerts, Sentry can be configured to give full visibility into the code, so issues can be caught before they become downtime. Suppose an application becomes inaccessible because of a particular request failure identified via a log trace. In such circumstances it becomes crucial to diagnose individual web requests or transactions. Furthermore, resilience can be driven and reliability measured with real-time data by recording metrics as time series data. The talk showcases how tools like Prometheus or Splunk can help generate a dashboard for a real-time, event-based system.
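To make the idea of custom metrics, thresholds, and time series data concrete, here is a minimal sketch using only the Python standard library. It is not the API of New Relic, Lightstep, Sentry, Prometheus, or Splunk; the names `MetricStore`, `handle_request`, `breached`, the metric names, and the threshold values are all hypothetical and chosen for illustration. Each sample is stored with a timestamp, and a simple rule check flags a time window whose error rate or mean latency exceeds a threshold.

```python
import time
from collections import defaultdict


class MetricStore:
    """Hypothetical in-memory store: metric name -> list of (timestamp, value)."""

    def __init__(self):
        self.series = defaultdict(list)

    def record(self, name, value, timestamp=None):
        # Storing the timestamp with each sample is what makes this time series data.
        ts = time.time() if timestamp is None else timestamp
        self.series[name].append((ts, value))

    def window(self, name, start, end):
        # Values of one metric whose timestamp satisfies start <= ts < end.
        return [v for ts, v in self.series[name] if start <= ts < end]


def handle_request(store, handler, timestamp=None):
    """Run handler and record its latency and whether it raised an exception."""
    started = time.perf_counter()
    failed = 0
    try:
        return handler()
    except Exception:
        failed = 1
        raise
    finally:
        elapsed_ms = (time.perf_counter() - started) * 1000.0
        store.record("request_latency_ms", elapsed_ms, timestamp)
        store.record("request_error", failed, timestamp)


def error_rate(store, start, end):
    # Fraction of requests in the window that failed (0.0 when there were none).
    errors = store.window("request_error", start, end)
    return sum(errors) / len(errors) if errors else 0.0


def mean_latency_ms(store, start, end):
    latencies = store.window("request_latency_ms", start, end)
    return sum(latencies) / len(latencies) if latencies else 0.0


def breached(store, start, end, max_error_rate=0.25, max_latency_ms=500.0):
    # Illustrative thresholds (assumptions); returns the names of exceeded rules.
    alerts = []
    if error_rate(store, start, end) > max_error_rate:
        alerts.append("error_rate")
    if mean_latency_ms(store, start, end) > max_latency_ms:
        alerts.append("latency")
    return alerts


if __name__ == "__main__":
    store = MetricStore()
    handle_request(store, lambda: "ok")
    now = time.time()
    print(breached(store, now - 60, now + 1))
```

In a real deployment the store, the aggregation, and the alert rules would live in the monitoring tool rather than in the application; the sketch only shows the shape of the data such tools work with.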