Capturing metrics for a server is essential for the following reasons:
- Capacity planning
- Traffic patterns
- Server performance
A valve can be used to intercept all incoming requests and gather the necessary metrics for the server.
The following metrics should be gathered for a server:
- Total number of calls
- Mean response time
- Total error count
- Number of calls to important APIs
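As a sketch of how these counters might be tracked (the class and method names here are illustrative, not taken from the sample implementation), a thread-safe metrics holder could look like:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative metrics holder (names are hypothetical): tracks total
// calls, total errors, and a running sum of response times so the mean
// response time can be derived on demand.
class ServerMetrics {
    private final AtomicLong totalCalls = new AtomicLong();
    private final AtomicLong errorCount = new AtomicLong();
    private final AtomicLong totalResponseMillis = new AtomicLong();

    // Record one completed request.
    void recordCall(long responseMillis, boolean isError) {
        totalCalls.incrementAndGet();
        totalResponseMillis.addAndGet(responseMillis);
        if (isError) {
            errorCount.incrementAndGet();
        }
    }

    long totalCalls() { return totalCalls.get(); }
    long errorCount() { return errorCount.get(); }

    // Mean response time = total response time / total calls.
    long meanResponseTimeMillis() {
        long calls = totalCalls.get();
        return calls == 0 ? 0 : totalResponseMillis.get() / calls;
    }
}
```

Per-API call counts could be kept the same way, with one counter keyed by API name.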
The flow chart below describes a typical server metrics setup:
- All incoming requests are intercepted by the metrics valve.
- The metrics valve computes the response time along with other metric parameters.
- The metrics valve forwards the request to the server.
- The metrics valve intercepts the response to check the response status and increments the error count if an error occurred.
- At regular intervals, the metrics valve POSTs the data to the telemetry server.
- The telemetry server in turn POSTs the data to a time-series monitoring server such as Graphite.
- At any time, view the monitoring dashboard to check server status.
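The interception steps above can be sketched without any Tomcat dependencies as a wrapper around a request handler (here the handler is modeled as a supplier of an HTTP status code; all names are illustrative):

```java
import java.util.function.IntSupplier;

// Illustrative sketch of the valve's interception flow (not the actual
// Tomcat valve): time the downstream handler, count the call, and bump
// the error count when the response status indicates a failure.
class MetricsValveSketch {
    long totalCalls;
    long errorCount;
    long totalTimeNanos;

    int invoke(IntSupplier handler) {
        long start = System.nanoTime();
        int status = handler.getAsInt();             // forward the request to the "server"
        totalTimeNanos += System.nanoTime() - start; // compute response time
        totalCalls++;
        if (status >= 400) {                         // inspect the response status
            errorCount++;
        }
        return status;
    }
}
```

A real Tomcat valve would instead extend `org.apache.catalina.valves.ValveBase` and call `getNext().invoke(request, response)` for the forwarding step.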
A typical Graphite dashboard:
Link to a sample Tomcat metrics valve implementation: Tomcat metrics valve
The valve can be configured using Tomcat's context.xml:

<Context>
  ...
  <Valve className="org.nmn.notes.valve.TomcatMetricsValve"/>
  ...
</Context>
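Downstream, Graphite's Carbon listener accepts a simple plaintext protocol: one `metric.path value timestamp` line per data point, written over TCP (port 2003 by default). A minimal sketch of formatting such a line (the metric name below is made up):

```java
// Build one line of Graphite's plaintext protocol:
// "<metric.path> <value> <unix-timestamp>\n"
class GraphiteLine {
    static String format(String path, double value, long epochSeconds) {
        return path + " " + value + " " + epochSeconds + "\n";
    }
}
```

The telemetry server would write such lines to a socket connected to Carbon; what the valve POSTs to the telemetry server is whatever format that server expects.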
Capturing server metrics is essential in a production environment. Without proper metrics we would never know about sudden spikes in the system or how our service responds to them. For proper capacity planning we should know the total number of requests each compute node of the service is currently serving and the mean response time of each request.
There are many other parameters that can be captured, such as the total number of threads, the heap size, and the size of any cache used by the service, all of which give very good insight into the health of the system.
The answer to "how is your service doing?" should be a metrics dashboard that shows all critical parameters of that service.