Metrics

Metrics is a library for collecting and exposing metrics of Tarantool-based applications.

The library includes:

  • four base metric collectors: Counter, Gauge, Histogram, and Summary
  • ready-to-use Tarantool stats collectors built on top of the base collectors (see the example below this list)
  • exporters that expose the collected metrics in Prometheus, Graphite, and generic JSON formats
  • a module for integration into Tarantool Cartridge-based applications
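
As a quick taste of the ready-to-use collectors, the built-in Tarantool stats can be enabled with a single call. This is a minimal sketch; the 'my-instance' alias value is just an example:

local metrics = require('metrics')

-- turn on the built-in Tarantool collectors (memory, network, operations, etc.)
metrics.enable_default_metrics()

-- optionally attach a global label to every exposed metric
metrics.set_global_labels({alias = 'my-instance'})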

Installation

cd ${PROJECT_ROOT}
tt rocks install metrics

Plugins export

In order to export metrics to a TSDB of your choice, use one of the supported export plugins:

  • Prometheus
  • Graphite
  • JSON

or write a custom plugin of your own. Hopefully, plugins for other TSDBs will be supported soon.
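
For example, the Prometheus plugin can be attached to Tarantool's http.server roughly as follows; the address, port, and route path below are arbitrary choices for this sketch:

local metrics = require('metrics')
local prometheus = require('metrics.plugins.prometheus')
local http_server = require('http.server')

metrics.enable_default_metrics()

-- serve everything collected so far in the Prometheus text format
local httpd = http_server.new('0.0.0.0', 8081)
httpd:route({method = 'GET', path = '/metrics'}, prometheus.collect_http)
httpd:start()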

Metric types

There are four basic metric collectors available: Counter, Gauge, Summary, and Histogram. The exact semantics of each metric follow the Prometheus metric types.

Counter

Counter is a cumulative metric whose value can only be incremented or reset to zero on restart. Counters are useful for accumulating the number of events, e.g. requests processed or orders in an e-shop. A counter is exposed as a single numerical value.

local metrics = require('metrics')

-- create a counter
local http_requests_total_counter = metrics.counter('http_requests_total')

-- somewhere in HTTP requests middleware:
http_requests_total_counter:inc(1, {method = 'GET'})

Gauge

Gauge is a metric that represents a single numerical value that can be changed arbitrarily. Gauges are useful for capturing a snapshot of the current state, e.g. CPU utilization or the number of open connections. A gauge is exposed as a single numerical value.

local metrics = require('metrics')

-- create a gauge
local cpu_usage_gauge = metrics.gauge('cpu_usage', 'CPU usage')

-- register a lazy gauge value update
-- this will be called whenever the export is invoked in any plugins
metrics.register_callback(function()
    local current_cpu_usage = math.random()
    cpu_usage_gauge:set(current_cpu_usage, {app = 'tarantool'})
end)

Histogram

Histogram counts observed values into configurable buckets. Histograms are useful for tracking request latencies or processing times. A histogram is exposed as multiple numerical values:

  • the total count of observed events
  • the total sum of observed values
  • counters of observed events per bucket

local metrics = require('metrics')

-- create a histogram
local http_requests_latency_hist = metrics.histogram(
    'http_requests_latency', 'HTTP request latency', {2, 4, 6})

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency_hist:observe(latency)

Summary

Summary aggregates observed values into configurable quantiles. Summaries are useful for service level indicators (e.g. tracking SLAs and SLOs). A summary is exposed as multiple numerical values:

  • the total count of observed events
  • the total sum of observed values
  • the number of observed events per quantile

local metrics = require('metrics')

-- create a summary with a sliding window of 5 age buckets and 60s bucket lifetime
local http_requests_latency = metrics.summary(
    'http_requests_latency', 'HTTP request latency',
    {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01},
    {max_age_time = 60, age_buckets_count = 5}
)

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency:observe(latency)

Instance health check

In production environments, a Tarantool cluster usually runs a large number of so-called "routers": Tarantool instances that handle incoming load. That load has to be distributed evenly, and various load balancers are used for this, but any load balancer needs to know which routers are ready to accept load at that very moment. The metrics library ships a special plugin that creates an HTTP handler that a load balancer can use to check the current state of any Tarantool instance. If the instance is ready to accept load, the handler responds with a 200 status code; if not, with a 500 status code.
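
As an illustration of the idea only (not the plugin's actual API, which is described in the plugin documentation), a hand-rolled handler with the same contract could look like this:

local http_server = require('http.server')

-- hypothetical health endpoint: 200 when the instance can take load, 500 otherwise
local httpd = http_server.new('0.0.0.0', 8080)
httpd:route({method = 'GET', path = '/health'}, function()
    local ok = box.info.status == 'running' and not box.info.ro
    return {status = ok and 200 or 500, body = ok and 'OK' or 'NOT READY'}
end)
httpd:start()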

Next steps

See the project documentation for the full API reference and monitoring guides.

Contribution

Feel free to send Pull Requests. To increase the chance of having your pull request accepted, make sure it follows these guidelines:

  • The title and description match the implementation.
  • The code follows the style guide.
  • The pull request closes one or more related issues. If there is no related issue, please add one first.
  • The pull request contains the necessary tests that verify the intended behavior.
  • The pull request contains a CHANGELOG note and a documentation update, if needed.

Your pull request will be reviewed within 3-5 days.

Contacts

If you have questions, please ask them on StackOverflow or contact us on Telegram:

Credits

We would like to thank Prometheus for the great API that we shamelessly borrowed.