Metrics

Metrics is a library for collecting and exposing metrics of Tarantool-based applications.

The library includes:

  • four base metric collectors: Counter, Gauge, Histogram, and Summary
  • ready-to-use Tarantool stats collectors built on top of the base collectors (see the example below this list)
  • exporters that expose the collected metrics in Prometheus, Graphite, and generic JSON formats
  • a module for integration into Tarantool Cartridge-based applications
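
As a quick taste of the ready-to-use collectors, the built-in Tarantool stats can be enabled with a single call. This is a minimal sketch; the 'my-instance' alias value is just an example:

local metrics = require('metrics')

-- turn on the built-in Tarantool collectors (memory, network, operations, etc.)
metrics.enable_default_metrics()

-- optionally attach a global label to every exposed metric
metrics.set_global_labels({alias = 'my-instance'})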

Installation

cd ${PROJECT_ROOT}
tt rocks install metrics

Plugins export

In order to export metrics to a TSDB of your choice, use one of the supported export plugins:

  • Prometheus
  • Graphite
  • JSON

or write a custom plugin of your own. Hopefully, plugins for other TSDBs will be supported soon.
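
For example, the Prometheus plugin can be attached to Tarantool's http.server roughly as follows; the address, port, and route path below are arbitrary choices for this sketch:

local metrics = require('metrics')
local prometheus = require('metrics.plugins.prometheus')
local http_server = require('http.server')

metrics.enable_default_metrics()

-- serve everything collected so far in the Prometheus text format
local httpd = http_server.new('0.0.0.0', 8081)
httpd:route({method = 'GET', path = '/metrics'}, prometheus.collect_http)
httpd:start()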

Metric types

There are four basic metric collectors available: Counter, Gauge, Summary, and Histogram. The exact semantics of each metric follow the Prometheus metric types.

Counter

Counter is a cumulative metric whose value can only be incremented or reset to zero on restart. Counters are useful for accumulating the number of events, e.g. requests processed or orders in an e-shop. A counter is exposed as a single numerical value.

local metrics = require('metrics')

-- create a counter
local http_requests_total_counter = metrics.counter('http_requests_total')

-- somewhere in HTTP requests middleware:
http_requests_total_counter:inc(1, {method = 'GET'})

Gauge

Gauge is a metric that represents a single numerical value that can be changed arbitrarily. Gauges are useful for capturing a snapshot of the current state, e.g. CPU utilization or the number of open connections. A gauge is exposed as a single numerical value.

local metrics = require('metrics')

-- create a gauge
local cpu_usage_gauge = metrics.gauge('cpu_usage', 'CPU usage')

-- register a lazy gauge value update
-- this will be called whenever the export is invoked in any plugins
metrics.register_callback(function()
    local current_cpu_usage = math.random()
    cpu_usage_gauge:set(current_cpu_usage, {app = 'tarantool'})
end)

Histogram

Histogram counts observed values into configurable buckets. Histograms are useful for tracking request latencies or processing times. A histogram is exposed as multiple numerical values:

  • the total count of observed events
  • the total sum of observed values
  • counters of observed events per bucket

local metrics = require('metrics')

-- create a histogram
local http_requests_latency_hist = metrics.histogram(
    'http_requests_latency', 'HTTP request latency', {2, 4, 6})

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency_hist:observe(latency)

Summary

Summary aggregates observed values into configurable quantiles. Summaries are useful for service level indicators (e.g. tracking SLAs and SLOs). A summary is exposed as multiple numerical values:

  • the total count of observed events
  • the total sum of observed values
  • the number of observed events per quantile

local metrics = require('metrics')

-- create a summary with a sliding window of 5 age buckets and 60s bucket lifetime
local http_requests_latency = metrics.summary(
    'http_requests_latency', 'HTTP request latency',
    {[0.5]=0.01, [0.9]=0.01, [0.99]=0.01},
    {max_age_time = 60, age_buckets_count = 5}
)

-- somewhere in the HTTP requests middleware:
local latency = math.random(1, 10)
http_requests_latency:observe(latency)

Instance health check

In production environments, a Tarantool cluster usually runs a large number of so-called "routers": Tarantool instances that handle incoming load. That load has to be distributed evenly, and various load balancers are used for this, but any load balancer needs to know which routers are ready to accept load at that very moment. The metrics library ships a special plugin that creates an HTTP handler that a load balancer can use to check the current state of any Tarantool instance. If the instance is ready to accept load, the handler responds with a 200 status code; if not, with a 500 status code.
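
As an illustration of the idea only (not the plugin's actual API, which is described in the plugin documentation), a hand-rolled handler with the same contract could look like this:

local http_server = require('http.server')

-- hypothetical health endpoint: 200 when the instance can take load, 500 otherwise
local httpd = http_server.new('0.0.0.0', 8080)
httpd:route({method = 'GET', path = '/health'}, function()
    local ok = box.info.status == 'running' and not box.info.ro
    return {status = ok and 200 or 500, body = ok and 'OK' or 'NOT READY'}
end)
httpd:start()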

Next steps

See the project documentation for the full API reference and monitoring guides.

Contribution

Feel free to send Pull Requests. To increase the chance of having your pull request accepted, make sure it follows these guidelines:

  • The title and description match the implementation.
  • The code follows the style guide.
  • The pull request closes one or more related issues. If there is no related issue, please add one first.
  • The pull request contains the necessary tests that verify the intended behavior.
  • The pull request contains a CHANGELOG note and a documentation update, if needed.

Your pull request will be reviewed within 3-5 days.

Contacts

If you have questions, please ask them on StackOverflow or contact us on Telegram:

Credits

We would like to thank Prometheus for the great API that we shamelessly borrowed.