As I continued to work on and test different data loading techniques using TimescaleDB and Bureau of Labor Statistics data, it dawned on me that I was not collecting any good metrics on all of this work I was doing. Sure I could program metrics into the data loaders and such, but I am talking about system metrics from the VMs doing the work. Things like CPU usage, memory usage, network usage, and all of those other metrics that we all love so much.
Since I am currently focused on PostgreSQL, I do not have a running Elasticsearch cluster, but I have a lot of hosts that still have Metricbeat installed. For the past five years, I have used Elasticsearch and Metricbeat to collect metrics on my projects and live servers. Without a running Elasticsearch cluster, I paused the first project and pondered how I should go about collecting system metrics for my current and future projects.
There are many options available for collecting system metrics. There are also many different ways to store these metrics, from flat text to rrdtool and its whole ecosystem of configuration options. There is Prometheus and its tools, InfluxDB, Cassandra, and many more. So, where I could have spent days looking for something I wanted to work with, I decided to bring this full circle back to what I was working on anyway: PostgreSQL and TimescaleDB. But how to collect the metrics? I did not want to spend days learning something new. So, after some soul-searching, I decided to stick to what I already knew and work out a way to convert Metricbeat JSON data into a format compatible with my current PostgreSQL work. Those thoughts spawned this project.
When I started this side project I had great ambitions that I was going to reinvent the wheel. I wrote the first version of this with JavaScript and Node.js. The idea was to use JavaScript to emulate the communication between a Redis server and a client. I worked on that version for three days before figuring out that TCP buffers are not easy to manage, and I was losing 10% of my records because of truncated documents left in the buffer, which of course became a much worse problem as I started pushing more metrics from more hosts. There are ways to buffer the buffers, but then it dawned on me that I would lose all metrics currently in that buffer if the script crashed. Losing a buffer full of metrics is never a good thing, so I moved on to version 2 using Perl.
Version two was a little less ambitious but still buggy. The initial idea was to hold all values I wanted to INSERT in a Perl hash until all values for a specific timestamp were filled and then insert them into a PostgreSQL wide table. I loved the idea of being able to store everything I wanted in a wide table, but there was a problem here also. As I turned up more hosts, the hash quickly grew, and I realized it was not much different than before. If the script crashed, it would lose all metrics currently in memory. Also, I would have to write hash cleanup routines; otherwise, the script would have a memory leak that could grow quickly if many hosts were being monitored.
Now we have landed at this final version. This version could use a lot of tweaks and additions, but for the time being, it works for my needs and shows the basics of converting the deeply nested Metricbeat data into SQL inserts. This version uses Redis as a metric buffer but is a far cry from what the first version tried to do. In this version, the idea of using the wide table format was modified into what I will call a medium table format. We will get into the details on that when we get to that section of this project document.
Before starting this project I knew bits and pieces of how to do parts of these things. This project document wraps years of knowledge, lots of research, many nights of testing code, testing queries, and the comprehension to put it all together into one document. As problem solvers, this is what we do, with the hope that our hard work and research all combined into one well documented project will help others further their knowledge.
Follow along as we build this project and learn a little bit about converting deeply nested JSON documents into SQL style inserts.
Let's Get Going!
I chose these components because I am very familiar with them. I have used each of these for at least a handful of years, except for TimescaleDB. TimescaleDB is a set of extensions that you install on top of PostgreSQL. These extensions do some crafty work in the background to take advantage of many PostgreSQL concepts that can be intimidating to even a seasoned database engineer. Things like table partitioning, automatic data aggregation, and complex queries, to name a few of the things I have figured out so far.
At one time, most of the databases running Internet applications and collecting metrics were SQL based. Many SQL-based options are available, and now with the fast rise of NoSQL databases, there are many other options to consider for processing metrics. I have used both SQL and NoSQL for many different tasks. They both have their upsides and downsides, but everything is use-case dependent in the end. This project is designed to take the SQL route, not define which route is better.
The PostgreSQL database that we will build has two tables. One is a relational table that holds metric descriptions and names. The other will be a TimescaleDB hypertable that holds the metrics. We will work through building all of these from a VM with a fresh install of Ubuntu Linux 20.04 all the way to a working metrics converter and display platform.
This project makes a few assumptions about skill level. This project assumes that you have a freshly installed Ubuntu 20.04 server, VM or otherwise, and that you know how to navigate a Linux command line. Other than that, this project document covers everything else that you will need to know to complete the project.
Since this project will only be collecting metrics from one host, we can use a fairly low-resource platform for this project. My platform, as shown below, has four processor cores, 4 gigabytes of RAM, and 50 gigabytes of virtual storage running on a 1TB NVMe. Whether running as a local VM, a cloud-based VM, or a bare metal server, these resources should work fine. Below is a screenshot of the Proxmox VM creation GUI showing what I chose for the resources.
This pretty much covers the hardware needed to run this project. The 50 gigabytes of storage is a little more than what will be required, but it is a good buffer if someone decides to push data from other hosts into this system. We will discuss how to do that when we get to the Redis configuration section.
There are many cloud services available where you can lease a VM with these resources for a nominal fee. If you are only going to use a leased VM long enough to run through this project once, then you only need to lease it for a few hours. In most cases, this costs less than one US dollar.
We need to install some core packages that will be needed as we work through this project. I pondered a few different ways to handle installing packages, from very detailed explanations of every command to offering just a blob of commands and moving on from there. The first way added a dozen pages to the document. The second way offered no detail on what was being installed. Neither of them is a good answer. With this in mind, I came up with this hybrid format. The commands are in a numbered list with a summary of what each command does, followed by the command itself.
This first set of installs takes care of updating the operating system to the latest packages and installing the PostgreSQL database, the client, and the TimescaleDB extensions.
As of October 2022, the install instructions for the PostgreSQL and TimescaleDB packages presented here are slightly different than what is offered on the Timescale website. The official instructions state that you must be logged in as root to follow these procedures. The list below has been updated and tested by me to allow this installation without having to be logged in as root.
1. Update the list of available packages.
sudo apt update
2. Upgrade the operating system to the latest packages.
sudo apt upgrade
3. Install the supporting packages needed to add the repositories.
sudo apt install gnupg postgresql-common apt-transport-https lsb-release wget software-properties-common
4. Run the helper script that adds the official PostgreSQL apt repository.
sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
5. Add the TimescaleDB package repository.
echo "deb https://packagecloud.io/timescale/timescaledb/ubuntu/ $(lsb_release -c -s) main" | sudo tee /etc/apt/sources.list.d/timescaledb.list
6. Add the GPG signing key for the TimescaleDB repository.
wget --quiet -O - https://packagecloud.io/timescale/timescaledb/gpgkey | sudo apt-key add -
7. Update the package lists again so the new repositories are picked up.
sudo apt update
8. Install TimescaleDB along with PostgreSQL 14.
sudo apt install timescaledb-2-postgresql-14
9. Tune the PostgreSQL configuration for TimescaleDB.
sudo timescaledb-tune
10. Restart PostgreSQL so the new settings take effect.
sudo systemctl restart postgresql
11. Connect to the database as the postgres user with the psql client.
sudo su postgres -c psql
After completing these 11 steps, we should have a fully installed PostgreSQL database with the TimescaleDB extensions and the PostgreSQL database client. If all is well, we should now see an output that looks like the one below.
psql (14.5 (Ubuntu 14.5-1.pgdg20.04+1))
Type "help" for help.
postgres=#
This shows us that the PostgreSQL client can connect to the database. These steps that we just completed are the most complex part of the core component installations. In the end, these 11 steps install two repositories, three core packages, and all their dependencies.
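While we are still connected, an optional quick check against the pg_available_extensions catalog confirms that the TimescaleDB extension is visible to PostgreSQL. The installed_version column will stay empty until we run CREATE EXTENSION on a database later in this document.

SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'timescaledb';

Now we will move on to our metric buffer, Redis.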
For this project, Redis is a very simple install. We do not need to add any keys or repositories, as the version supplied by the default Ubuntu 20.04 apt repositories works very well for this project. Run the command below to install the Redis server.
sudo apt-get install redis
This package also installs redis-cli, the default Redis database client, and it starts the Redis server after installation. We can verify that the server is running with the command below. redis-cli does not need sudo in its default setup, so just run redis-cli.
redis-cli
If there are no errors, you should see the output below after running redis-cli.
127.0.0.1:6379>
This is the Redis database prompt. We will use this a few times while working through this project, mainly to check the status of the data we are collecting. The next installation is Metricbeat. Metricbeat requires repository additions just as PostgreSQL and TimescaleDB did.
The documentation offered by Elastic, "Repositories for APT and YUM," explains these steps very well. I have included a slightly modified version of the instructions here so that this project can be completed with this one document. The modification is that we will not be reinstalling packages we already installed earlier, such as apt-transport-https. This was designed around Metricbeat version 8.4.3.
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
echo "deb https://artifacts.elastic.co/packages/oss-8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
sudo apt-get update
sudo apt-get install metricbeat
Unlike PostgreSQL and Redis, Metricbeat will not start automatically after the installation. Metricbeat needs to be configured before we can start it. We will start on configurations after installing these last few core packages. Let's now move on to installing Grafana.
We will install the latest version of Grafana over these next few steps. Grafana also requires adding a repository and a GPG key, so we will go through steps very similar to what we did with Metricbeat. We will use a modified version of the steps listed in the Grafana documentation, "Install on Debian or Ubuntu." The guide here is a few steps simpler because we have already installed some of the components mentioned in the official document. The steps are broken down below.
sudo wget -q -O /usr/share/keyrings/grafana.key https://packages.grafana.com/gpg.key
echo "deb [signed-by=/usr/share/keyrings/grafana.key] https://packages.grafana.com/oss/deb stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
sudo apt-get update
sudo apt-get install grafana
sudo service grafana-server start
Since Grafana uses a web interface, the easiest way to see if it is running is to use a web browser and navigate to the web interface. As we used default settings, the URL will be http://server-ip:3000/. If you do not already know it, you can find the IP of the project server by running the ip route command. This command and its output are shown below.
bvest@project-beat-interceptor:~$ ip route
default via 192.168.2.1 dev ens18 proto dhcp src 192.168.2.196 metric 100
192.168.2.0/24 dev ens18 proto kernel scope link src 192.168.2.196
192.168.2.1 dev ens18 proto dhcp scope link src 192.168.2.196 metric 100
In the case of my project server, the IP address is 192.168.2.196, which is shown as the second IP address in the first entry, "default". So we would navigate to http://192.168.2.196:3000. If everything is working, your browser should show the window below.
We will log into this later in this document, so you can close this browser window now. Now we move on to the last part of the core parts of this project, the Perl modules.
I chose Perl for this project for a few reasons. It does this kind of work well, I have been writing Perl scripts for around 15 years, and I have used it as the glue code in many systems similar to what we are building with this project. It is very solid and is pre-installed on most Linux distributions. I may rewrite this in Go or another compiled language someday, but for now, this Perl version does what I need it to do. Though Perl is installed by default with Ubuntu Server 20.04, we need to install a few modules that make this kind of processing much easier.
All of the modules that we need can be installed using apt-get, though the CPAN system is quite useful if you write a lot of Perl code. Let's get to it, as these are the last core components that need to be installed.
sudo apt-get install libredis-perl
sudo apt-get install libdbi-perl
sudo apt install libdbd-pg-perl
These are the last core components that can be installed with the apt-get system. The next steps take us through downloading the repository containing the custom files and script for this project and the configuration of all the pieces we just installed.
Though we will step through the configurations and the script that runs this project, the whole project is in a repo on GitHub. We will use the files in this repo to copy the new configurations into place and finally run the script. Since git is installed by default, we will use that route to download this project.
Navigate to your home folder on this project machine and download the repo using the commands below.
cd ~
git clone https://github.com/b-vest/project-metricbeat-to-postgresql-with-perl.git
The git clone will have created a new folder named project-metricbeat-to-postgresql-with-perl. Let's navigate to that folder and look at the files that we will be working with.
cd project-metricbeat-to-postgresql-with-perl
ls -al
Out of the listing of files that is printed, the ones we need for this project are listed and explained below.

project-file-metricbeat.yml: the core Metricbeat configuration, set up to send its output to Redis.
project-file-redis.yml: the Metricbeat module configuration that collects metrics from the Redis server.
project-file-system.yml: the Metricbeat module configuration that collects system metrics such as CPU, memory, network, and disk I/O.
project-file-fields.asciidoc: the Metricbeat field reference that the poller script uses to build the field_names table.
hyprid-poller.pl: the Perl script that reads metrics from the Redis buffer, converts them to SQL inserts, and writes them into PostgreSQL.
This brings us to the end of what we must install and download. Now we move on to building some database tables, configuring Metricbeat, and loading the field_names table. We will first configure Metricbeat and get it running so our Redis buffer can start collecting some metrics on the system. Having the metrics already in the Redis buffer will give us something to work with by the time we get back to Grafana and start working with the database queries.
Including the configurations in the repo makes this project much easier to set up. For each of these configurations, we will first see the command to copy the configuration into the correct place and then a summary of what was changed in it. Let's ensure we are in the project folder and run the commands listed below.
sudo cp ./project-file-metricbeat.yml /etc/metricbeat/metricbeat.yml
This step copies the metricbeat.yml from the project folder to /etc/metricbeat, where Metricbeat stores its configuration. What was copied into that folder is shown below.
metricbeat.config.modules:
  # Glob pattern for configuration loading
  path: ${path.config}/modules.d/*.yml
  # Set to true to enable config reloading
  reload.enabled: false

output.redis:
  hosts: ["localhost:6379"]
  bulk_max_size: 6
  backoff_max: 1s
  #datatype: "channel"
  max_retries: 10
  #password: "thisisnotasecurekey"
  key: "metricbeat"
  # db: 0
  # timeout: 15

# ================================= Processors =================================
# Configure processors to enhance or manipulate events generated by the beat.
processors:
  - drop_fields:
      fields: ['agent','ecs','service']
  #- add_host_metadata: ~
  #- add_cloud_metadata: ~
  #- add_docker_metadata: ~
  #- add_kubernetes_metadata: ~
This custom configuration is very stripped down compared to the Metricbeat default configuration. It only shows what we are actually using, plus a few commented-out items for future use. Our only output is Redis at localhost:6379, which is where Redis listens by default. The single drop_fields processor removes fields that we do not need for this project.
Copy this metrics module into the metricbeat modules.d folder.
sudo cp ./project-file-redis.yml /etc/metricbeat/modules.d/redis.yml
This file is copied to nearly the same location, but into the modules.d folder. This is the Metricbeat module that collects metrics from the Redis server. All modules go into the /etc/metricbeat/modules.d/ folder by default. Though this can be changed in the core metricbeat.yml configuration, for this project it is fine where it is.
# Module: redis
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/8.4/metricbeat-module-redis.html

- module: redis
  metricsets:
    - info
    - keyspace
  period: 10s
  # Redis hosts
  hosts: ["127.0.0.1:6379"]
  # Network type to be used for redis connection. Default: tcp
  #network: tcp
  # Max number of concurrent connections. Default: 10
  #maxconn: 10
  # Redis AUTH password. Empty by default.
  #password: foobared
The only change between this module and the stock redis.yml module is enabling the metricsets. Since we did not set up a redis password, nothing else needs to be changed.
Note: Redis is only secure without a password because for this project it only responds to localhost. If Redis is bound to anything other than localhost it should always have a password.
Copy this metrics module into the metricbeat modules.d folder.
sudo cp ./project-file-system.yml /etc/metricbeat/modules.d/system.yml
Similar to the redis.yml above, this file is copied to /etc/metricbeat/modules.d/system.yml. What was copied is shown below.
# Module: system
# Docs: https://www.elastic.co/guide/en/beats/metricbeat/8.4/metricbeat-module-system.html

- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
    #- process
    #- process_summary
    #- socket_summary
    #- entropy
    #- core
    - diskio
    #- socket
    #- service
    #- users
  process.include_top_n:
    by_cpu: 5      # include top 5 processes by CPU
    by_memory: 5   # include top 5 processes by memory
  # Configure the mount point of the host’s filesystem for use in monitoring a host from within a container
  # hostfs: "/hostfs"

- module: system
  period: 1m
  metricsets:
    - filesystem
    - fsstat
  processors:
  - drop_event.when.regexp:
      system.filesystem.mount_point: '^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)'

- module: system
  period: 15m
  metricsets:
    - uptime

#- module: system
#  period: 5m
#  metricsets:
#    - raid
#  raid.mount_point: '/'
This module configuration is a bit larger than the one for Redis, as it can collect information about many parts of the system. For this project, I have disabled all process and socket monitoring and enabled diskio monitoring. The poller script should work with any of these options enabled, but these are the only ones we will use in this project.
With all of these things complete, we can now start Metricbeat and begin collecting data about our project system.
sudo service metricbeat start
The easiest way to see if we are collecting metrics from Metricbeat is to check if our Redis metrics buffer is collecting any documents. Let's connect to Redis with the redis-cli utility and see if we have any documents.
redis-cli
Once connected, let's run the Redis command LLEN metricbeat. LLEN stands for List Length, and the metricbeat part tells Redis which key to count. By default, the Metricbeat Redis output writes into a key named metricbeat. The output should be an integer similar to what is shown below.
127.0.0.1:6379> llen metricbeat
(integer) 18
This output shows that at this time, Redis is buffering 18 metric records that will be processed when we first start the polling script. This value will grow as we work through the rest of this document, which is great because we will have a few metrics to build visualizations around by the time we get everything in place. Since we are using the default 10-second poll cycle, this record count will grow quickly.
Now we have data in one place, but it is still not usable by Grafana. We need to convert this data from JSON to SQL Inserts and send it to PostgreSQL. Before we can do that, we need to create the database, table, and users that we will use for the PostgreSQL side of this work. We also need to install the TimescaleDB extensions on our database and turn one of our tables into a hypertable. We move on to those steps in the next section.
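To make the end goal concrete: a single Metricbeat reading, such as the normalized 1-minute load, will eventually be flattened by the poller script into an insert along the lines of the example below. The metric_values table and the field_id relation are created in the coming sections, and the values shown here are purely illustrative.

-- Illustrative only; the real inserts are generated by the poller script using a prepared statement.
INSERT INTO metric_values (timestamp, field_id, hostname, device, value)
VALUES ('2022-10-08 12:17:36.221', 42, 'beat-interceptor', 'system', 0.0875);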
Now that we have data in Redis, we need to build the PostgreSQL and TimescaleDB parts. These tables will define and store the metrics currently collected by Metricbeat and stored in our Redis buffer. First, we need to reconnect to the database as we did earlier when we first set up PostgreSQL. Let's do that with the command below. You can either do this in a new terminal to the project server or exit from the redis-cli utility and run these commands in that same terminal.
sudo su postgres -c psql
Now that we are connected to the PostgreSQL database we need to create our project database. Run the commands below to create the project database and then connect to it.
CREATE DATABASE metrics_ht;
\c metrics_ht
If everything has worked well to this point, we should see the message below after connecting to the new database.
postgres=# \c metrics_ht
You are now connected to database "metrics_ht" as user "postgres".
metrics_ht=#
The first thing that we need to do after connecting to this new database is install the TimescaleDB extensions. All TimescaleDB functions run as parts of the query, as everything is tightly bound with PostgreSQL. Below is the query to install the extensions on our newly created metrics_ht database.
CREATE EXTENSION IF NOT EXISTS timescaledb;
If everything goes well you will be presented with a screen that looks like the one below.
WARNING:
WELCOME TO
_____ _ _ ____________
|_ _(_) | | | _ \ ___ \
| | _ _ __ ___ ___ ___ ___ __ _| | ___| | | | |_/ /
| | | | _ ` _ \ / _ \/ __|/ __/ _` | |/ _ \ | | | ___ \
| | | | | | | | | __/\__ \ (_| (_| | | __/ |/ /| |_/ /
|_| |_|_| |_| |_|\___||___/\___\__,_|_|\___|___/ \____/
Running version 2.8.1
For more information on TimescaleDB, please visit the following links:
1. Getting started: https://docs.timescale.com/timescaledb/latest/getting-started
2. API reference documentation: https://docs.timescale.com/api/latest
3. How TimescaleDB is designed: https://docs.timescale.com/timescaledb/latest/overview/core-concepts
Note: TimescaleDB collects anonymous reports to better understand and assist our users.
For more information and how to disable, please see our docs https://docs.timescale.com/timescaledb/latest/how-to-guides/configuration/telemetry.
CREATE EXTENSION
If you see this then everything is good and the extensions are installed. Now we can move on to creating the tables.
Now we are ready to create the tables. We will create two tables named field_names and metric_values. field_names will hold metric field names and the definitions of those names. metric_values will hold the real-time metrics that we pull from Redis and parse into SQL inserts. Let's create the field_names table first. Run the complete command listed below in the terminal where we just created the metrics_ht database.
CREATE TABLE "public".field_names (
field_id serial NOT NULL ,
field_name varchar(100) ,
description varchar ,
CONSTRAINT pk_field_names PRIMARY KEY ( field_id ),
CONSTRAINT unq_field_names UNIQUE ( field_name )
);
This table is our static relational table. It does not update automatically the way the metric_values table will. We load this data set using the poller script and manually refresh it if any metric fields are added or changed when Metricbeat is updated. This project is designed around version 8.4.3, but these fields rarely change.
Below I will break down each variable and constraint in the order that they are shown in the query.

field_id: an auto-incrementing serial integer that uniquely identifies each field name. This is the id that the metric_values table will reference.
field_name: the dotted Metricbeat field name, such as system.load.norm.1, limited to 100 characters.
description: the free-form description of the field, loaded from the Metricbeat field reference.
pk_field_names: the primary key constraint on field_id.
unq_field_names: a unique constraint on field_name so the same field name can never be inserted twice.
Now we move on to metric_values, the table that will hold all of our metrics and become a hypertable.
CREATE TABLE "public".metric_values (
"timestamp" timestamp NOT NULL ,
field_id integer ,
hostname varchar(100) ,
device varchar NOT NULL ,
"value" numeric(40,6)
);
This table is our real time metrics collector. This is where all of the data that we process out of our Redis buffer ends up. Below I describe all of the fields as I did with field_names above.

timestamp: the time the metric was recorded, taken from the @timestamp field of the Metricbeat document.
field_id: the integer id of the metric name, relating back to the field_names table.
hostname: the name of the host that sent the metric.
value: the numeric value of the metric, stored as numeric(40,6).
device: described in detail below, since it deserves its own explanation.
The device column: As we collect metrics from many different sources in this project, the device column is deliberately ambiguous. In most circles, devices are things such as disk drives, network controllers, and memory controllers, which are denoted by their device name in the metrics: things such as sda for disk storage and ens18 or eth0 for network controllers. This device column goes beyond that and is used to identify where any metric came from. If you look at things like memory, CPU, or system load, the device will be system, as the system is the device. In the case of monitoring Redis, the device will be redis since that is the only process sending metrics. When gathering metrics for databases such as PostgreSQL and MySQL, the device column can potentially be the name of the database if you are collecting per-database metrics.
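Once the poller script is running later in this document, a quick way to see which device values are actually being recorded is to ask the table directly:

SELECT DISTINCT hostname, device FROM metric_values ORDER BY hostname, device;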
These are the only tables we need for this project. Now we need to add a couple of things to the metric_values table. First, we will add an index on the timestamp and field_id columns, as these will be the most heavily searched.
CREATE INDEX idx_metric_values ON "public".metric_values ( "timestamp", field_id );
The response from this command will be a simple CREATE INDEX if there were no errors. Now on to the TimescaleDB hypertable.
Hypertables are one of the core features of TimescaleDB. This chunked table structure allows data in one table to be sliced into many different chunks. A hypertable can outperform a standard table by many factors when designed correctly. Many variables come into play when designing the hypertable layout. These range from ingest rate and query rate to query span, document size, and the number of simultaneous users, among other factors. There are many ways to do this and many different use cases for how the hypertable should be created. The options for creating hypertables are documented very well at the Timescale website in this document, create_hypertable.
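For comparison, a more conventional starting point (TimescaleDB's default chunk_time_interval is seven days) would look something like the statement below. We will not use this for the project; it is only here to show what a production-leaning choice might look like.

-- A more production-typical chunk interval; not what we will use for this project.
SELECT create_hypertable('metric_values', 'timestamp', chunk_time_interval => INTERVAL '7 days');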
Though I can not think of a production environment where a ten-minute hypertable chunk is a good idea, it is a good idea for this project so we can take a peek at the show_chunks command later. Heed the warning about not doing this in production and run the command below to turn metric_values into a hypertable with 10-minute chunks.
SELECT create_hypertable('metric_values', 'timestamp', chunk_time_interval => INTERVAL '10 minutes');
In short, this says to create a hypertable on metric_values using the timestamp column to split the chunks into ten minute blocks. We can verify that the hypertable was created by running the show_chunks command below.
SELECT show_chunks('metric_values');
Since we are not ingesting data into the hypertable yet, the chunk count will be zero. Now we have to create the user that will be allowed to read and write from these tables, and then, we can start working with the script and Grafana.
We will create the metrics_user using the PostgreSQL client where we just created the tables. Run each of the commands below one at a time. These are fairly self-explanatory based on the queries.
CREATE USER metrics_user WITH PASSWORD 'notasecurepassword';
GRANT ALL PRIVILEGES ON DATABASE "metrics_ht" to metrics_user;
GRANT ALL ON field_names TO metrics_user;
GRANT ALL ON metric_values TO metrics_user;
GRANT USAGE, SELECT ON SEQUENCE field_names_field_id_seq TO metrics_user;
That last GRANT command, GRANT USAGE, SELECT ON SEQUENCE field_names_field_id_seq TO metrics_user;, took me a minute to figure out, as I was used to older versions of MySQL, which allowed a grant-all-to-everything option. This is much more secure, so I am glad I had to take the time to slow down and figure it out. Something to keep in mind.
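If you ever need to confirm what the sequence behind a serial column is called before writing a GRANT like that one, PostgreSQL can report it for you:

SELECT pg_get_serial_sequence('public.field_names', 'field_id');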
And this ends all the work we have to do on the databases. The next steps are the fun part, the script, Grafana, and QUERIES. First, we will use the script to load the field_names data and start converting some of the data for us.
Now we are getting to the good stuff that all of this setup has led us to, the script and data conversion. I chose Perl 5 for this project for a few reasons. I have been scripting with Perl 5 for over 15 years. It has all the modules we need, with one noticeable limitation I have worked around, and I can prototype fairly fast with Perl 5. With the proper configuration and optimizations, I believe this script could handle processing metrics for thousands of devices on top of the single host we will be processing for this project. There was a time I wrote a Perl server using only SNMP querying to collect metrics that could handle nearly 10,000 cable modems every 15 minutes. That system did not have Redis as a buffer, and the script had to do all the work. Most of this comes down to ingest and querying optimizations in the pipeline, not so much what the script alone is doing.
With my rambling out of the way, let's take a look at the script and break down what the major parts are doing. We will not define every single line of code, only the parts that do something meaningful. The line numbers do match up to what you will find in the script.
17: use Redis;
18: use DBI;
19: use Data::Dumper;
20: use JSON::XS;
21: use Getopt::Long;
22: use Scalar::Util qw(looks_like_number);
These lines load all of the Perl modules that are needed to read the data from Redis, convert it from JSON to SQL inserts, and insert it into PostgreSQL.
28: my @isRunning = qx(ps auxwwww | grep $0 | grep -v grep | grep -v $$);
29: if ( scalar(@isRunning) > 0 ) {
30: print "Script is running. Exiting\n";
31: exit;
32: }
33:
These lines are a piece of code that I add to all scripts that run as a daemon. This prevents multiple copies of the script from running, which allows us to set up a cron job to restart the script if it fails.
36: GetOptions(
37: 'help' => \my $help,
38: 'dbname=s' => \my $dbname,
39: 'dbhost=s' => \my $dbhost,
40: 'dbport=i' => \my $dbport,
41: 'dbuser=s' => \my $dbuser,
42: 'dbpass=s' => \my $dbpass,
43: 'batchsize=i' => \my $batchsize,
44: 'rkey=s' => \my $rediskey,
45: 'debug=i' => \my $debug,
46: 'fields_file=s' => \my $fields_file,
47: 'build_fields_table' => \my $build_fields_table,
48: 'live' => \my $live
49: ) or die "Invalid options passed to $0\n";
50:
51: $dbport = 5432 if !$dbport;
52: $rediskey = 'metricbeat' if !$rediskey;
53: $debug = 0 if !$debug;
54: $dbhost = 'localhost' if !$dbhost;
55:
56: if ( !$dbname || !$dbhost || !$dbuser || !$dbpass || $help ) {
57: print "Missing command line settings.\n";
58: printHelp();
59: exit;
60: }
61:
62: if ($debug) {
63: print "Using Values\n";
64: print "--dbname $dbname --dbhost $dbhost --dbport $dbport --dbuser $dbuser --dbpass $dbpass --rkey $rediskey\n";
65: }
These lines load our command line options and set some defaults for the values known for this project. Then the script checks for missing values and runs the printHelp() subroutine if any required values are missing. At line 62, we print the given values if the --debug 1 option was given.
68: my $redisClient = Redis->new;
71: my $dbh = DBI->connect( "dbi:Pg:dbname=$dbname;host=$dbhost;port=$dbport", $dbuser, $dbpass, { AutoCommit => 0, RaiseError => 1 } ) or die $DBI::errstr;
These lines create the Redis and PostgreSQL client connections that this script will use to move the data from Redis to PostgreSQL after it has been converted from JSON to SQL.
73: if ($build_fields_table) {
74: buildFieldsTable();
75: exit;
76: }
If the --build_fields_table option was given, run the buildFieldsTable() subroutine. The code for that subroutine is below.
245: sub buildFieldsTable{
246: print "build_fields_table\n";
247: if ( !$fields_file ) {
248: print "Missing --fields_file\n";
249: printHelp();
250: exit;
251: }
252: my @fieldsLines = qx(cat $fields_file);
253: my $fieldsLines;
254: my $field_name;
255: my $insertQuery;
256: my $lineCounter = 0;
257: my $insertedCount = 0;
258: my $updatedCount = 0;
259: my @insertArray;
260:
261: foreach my $line (@fieldsLines) {
262: chomp($line);
263: if ( $line =~ /::/ ) {
264: chomp( $fieldsLines[ $lineCounter + 3 ] );
265: $fieldsLines[ $lineCounter + 3 ] =~ s/'//g;
266:
267: #Check and ingest field_name if it does not exist
268: if ( $insertQuery =~ "INSERT" ) {
269: print "Do The insert\n";
270: $insertQuery = "";
271: }
272:
273: #print "Split Field Name\n";
274: ( undef, $field_name, undef ) = split( /\`/, $line );
275:
276: #print $field_name."\n";
277: #check if this row already exists in the database
278: my $checkSql = "SELECT * from field_names where field_name = '" . $field_name . "'";
279: my $fieldNameCheck = $dbh->selectall_hashref( $checkSql, "field_name" );
280: $dbh->commit();
281:
282: if ( $fieldNameCheck->{$field_name}->{field_id} ) {
283: my $updateSql = "UPDATE field_names SET description = '" . $fieldsLines[ $lineCounter + 3 ] . "' WHERE field_name = '" . $field_name . "'";
284: $dbh->do($updateSql);
285: ++$updatedCount;
286: }
287: else {
288: my $insertQuery = "INSERT INTO field_names (field_name, description) VALUES('" . $field_name . "','" . $fieldsLines[ $lineCounter + 3 ] . "')";
289: $dbh->do($insertQuery);
290:
291: ++$insertedCount;
292: }
293: }
294:
295: #print "Field Name: ".$field_name." ".$line ."\n";
296: ++$lineCounter;
297: }
298:
299: $dbh->commit();
300: print "Finished processing fields_talbe. Inserted: " . $insertedCount . " Updated: " . $updatedCount . " \n";
301:
302: exit;
303: }
This subroutine parses the project file project-file-fields.asciidoc to find metric value names and their definitions. This file has the same content as the file /etc/metricbeat/fields.yml, which is installed with Metricbeat. I find the asciidoc version easier to process; it comes from the same package and can be found in the Metricbeat GitHub repo as fields.asciidoc. Next, we will take a look at the do{foreach{foreach{foreach{}}}}while loop that keeps this running. This large chunk of code will be summarized after the code.
091: do {
092: START:
093: my $redisCounter = 0;
094: my $metric = $redisClient->lpop( $rediskey);
095:
096: if ( !$metric ) {
097:
098: #if no metrics sleep for 2 seconds and try again.
099: print "No metrics. Sleeping\n" if $debug;
100: sleep(1);
101: goto START;
102: }
103:
104: #Loop through the metric array. This is an array of JSON text
105: #remove new line characters from end of string
106: chomp($metric);
107:
108: #print $metric."\n";
109: #Convert JSON string into perl Hash object
110: my $metricHash = decode_json($metric);
111:
112: #Assign a few variables for needed infomration at the root of the Hash
113: my $timestamp = $metricHash->{'@timestamp'};
114: my $hostname = $metricHash->{host}->{name};
115: my $eventModule = $metricHash->{event}->{module};
116: my $metricSet = $metricHash->{metricset}->{name};
117: my $deviceName = $eventModule;
118:
119: #This creates the hash that contains the values that we want to ingest
120: #we create the $theseMetrics variable to hold the metrics hash so the
121: #hash keychains to not get long and confusing
122: my $theseMetrics = $metricHash->{$eventModule}->{$metricSet};
123: #If $theseMetrics is empty skip it. Maybe add logging in the future for things that are skipped
124: #for this project dropping it is fine.
125: print $timestamp. " ->" . $hostname . " -> " . $eventModule . " -> " . $metricSet . "\n" if $debug >= 2;
126: print $theseMetrics. "\n" if $debug >= 2;
127:
128: #At this point we have to start looping through the metrics to get
129: #the names and values of the metrics.
130: #These can be up to four layers deep depending on the type of
131: #metric
132: #first level loop
133: if ( $theseMetrics->{name} ) {
134: print "This is a Device Name: " . $theseMetrics->{name} . "\n" if $debug >= 2;
135: $deviceName = $theseMetrics->{name};
136: }
137: foreach my $level_1 ( keys %{$theseMetrics} ) {
138: #Create New Variable for this level
139: my $level_1Hash = $theseMetrics->{$level_1};
140: #Check to see if this is a key with a value or another hash to loop
141: if ( ref($level_1Hash) eq "HASH" ) {
142: #Here we will loop the variable that we created above which is actually $theseMetrics->{$level_1}
143: foreach my $level_2 ( keys %{$level_1Hash} ) {
144: #second Level Loop
145: my $level_2Hash = $level_1Hash->{$level_2};
146: if ( ref($level_2Hash) eq "HASH" ) {
147: #third level loop
148: foreach my $level_3 ( keys %{$level_2Hash} ) {
149: my $level_3Hash = $level_2Hash->{$level_3};
150: if ( ref($level_3Hash) eq "HASH" ) {
151: #fourth level loop
152: foreach my $level_4 ( keys %{$level_3Hash} ) {
153: my $level_4Hash = $level_3Hash->{$level_4};
154: if ( ref($level_4Hash) eq "HASH" ) {
155: #No hashes are above 4 deep in this data set
156: #This is a place holder for logic flow
157: }
158: else {
159: my $field_name = $eventModule . "." . $metricSet . "." . $level_1 . "." . $level_2 . "." . $level_3 . "." . $level_4;
160: my $fieldCheckReturn = checkFieldNameExsts($field_name);
161: my $field_id = $fieldNameRef->{$field_name}->{field_id};
162: push(@insertArray,[$timestamp, $field_id, $hostname, $deviceName, $level_4Hash, $field_name]);
163: ++$insertCounter;
164: checkBatchSize(@insertArray);
165: }
166: }
167: }
168: else {
169: my $field_name = $eventModule . "." . $metricSet . "." . $level_1 . "." . $level_2 . "." . $level_3;
170: my $fieldCheckReturn = checkFieldNameExsts($field_name);
171: my $field_id = $fieldNameRef->{$field_name}->{field_id};
172: push(@insertArray, [$timestamp, $field_id, $hostname, $deviceName, $level_3Hash, $field_name]);
173: ++$insertCounter;
174: checkBatchSize(@insertArray);
175: }
176: }
177: }
178: else {
179: my $field_name = $eventModule . "." . $metricSet . "." . $level_1 . "." . $level_2;
180: my $fieldCheckReturn = checkFieldNameExsts($field_name);
181: my $field_id = $fieldNameRef->{$field_name}->{field_id};
182: push(@insertArray,[$timestamp, $field_id, $hostname, $deviceName, $level_2Hash, $field_name]);
183: ++$insertCounter;
184: checkBatchSize(@insertArray);
185: }
186: } #end $level_1Hash Loop
187: }
188: else {
189: #This is jsut a place holder for logic flow
190: #We do no inserts here because this can only contain the
191: #name string which we assigned at the top of
192: #this loop
193: }
194: }# end $theseMetricsHash Loop
195:
196: } while 1;
197:
The do{}while part of this loop is what keeps it running over and over looking for new metrics. Since this all works together, we will use a list to step through the code. The list entries are marked with the identifying line numbers.

Lines 91-102: pop one document off the front of the Redis list with lpop. If nothing is returned, sleep for a second and jump back to the START label to try again.
Lines 106-110: strip the trailing newline from the JSON string and decode it into a Perl hash.
Lines 112-117: pull the timestamp, hostname, event module, and metricset name from the root of the hash, and default the device name to the event module.
Lines 119-126: point $theseMetrics at the branch of the hash that actually holds the values for this module and metricset.
Lines 133-136: if the metricset carries a name key (a disk or a network interface, for example), use it as the device name.
Lines 137-194: walk the hash up to four levels deep. Whenever a value is reached, build the dotted field name, make sure it exists in field_names with checkFieldNameExsts(), look up its field_id, push a row onto @insertArray, and call checkBatchSize().
Line 196: the while 1 at the end of the do block starts the whole cycle over again.

The two helper subroutines below do the actual inserting. checkBatchSize() (lines 199-205) checks whether the batch has reached the --batchsize limit and, if so, hands it to processInsertArray() (lines 207-223), which prepares the INSERT statement, executes it for every value in the batch that looks like a number, commits, and clears @insertArray.
199: sub checkBatchSize{
200: my @thisArray = @_;
201: if(scalar(@thisArray >= $batchsize)){
202: #print scalar(@thisArray)."\n";
203: my $insertReturn = processInsertArray(@thisArray);
204: }
205: }
207: sub processInsertArray{
208: my @thisArray = @_;
209: my $insertMetric = "INSERT INTO metric_values (timestamp,field_id,hostname,device,value) VALUES(?,?,?,?,?)";
210: my $sth = $dbh->prepare($insertMetric);
211: foreach my $metric(@thisArray){
212: #In this version we are going to skip text and empty values. Maybe in the future log these and or add support
213: #print Dumper($metric);
214:
215: if(looks_like_number($$metric[4])){
216: $sth->execute($$metric[0],$$metric[1],$$metric[2],$$metric[3],$$metric[4]);
217: }
218: }
219: $dbh->commit();
220:
221: @insertArray = ();
222:
223: }
224:
With that out of the way, there is one last processing subroutine that we will take a quick peek at, and then we can get to actually running the script. This routine only does something occasionally: it takes care of field names that were not imported from the file project-file-fields.asciidoc. Some field names are set up as wildcards in the asciidoc file, and the field_names table builder does not know how to deal with these, so the subroutine below detects that condition, creates a new field_name and field_id entry in the field_names table, and then reloads the field_name hash into the global variable.
228: sub checkFieldNameExsts{
229: my $field_name = shift;
230: #Some field_names are wildcarded in the fields.asciidoc. We will of course have no match for these.
231: #Here we will verify that the field name exists, create it if it does not and return the created value
232: if(!$fieldNameRef->{$field_name}){
233: print "Field Name Does Not Exist. Creating: ".$field_name."\n";
234: my $insertQuery = "INSERT INTO field_names (field_name) VALUES('" . $field_name . "')";
235: my $newFieldReturn = $dbh->do($insertQuery);
236: my $fieldNamesSql = "SELECT * from field_names";
237: print "Reload the Hash: ".$fieldNamesSql. "\n" if $debug;
238: $fieldNameRef = $dbh->selectall_hashref( $fieldNamesSql, "field_name" );
239:
240: $dbh->commit();
241:
242: }
243: }
And that is it. This is how the script works to convert the JSON formatted metrics provided by Metricbeat into SQL inserts. Lots of reading and information ingesting, but it is all worth it now as we run the script for the first time.
Now we are all finished with installs, explanations, configurations, and explaining how all of these backend parts work together. Now we can run the script for the first time to do its first job of building the field_names table. If you run the script without any options, it will print the detailed help file to the screen. It is good to know these options, but with this project, many items are set to defaults that will work with this system. The only thing that we have to provide to run the script is the database name, in this case, metrics_ht, and the database username and password that we created earlier. If you followed along with all of the setup to this point, you should be able to run the command below with no problems.
perl hyprid-poller.pl --dbname metrics_ht --dbuser metrics_user --dbpass notasecurepassword --build_fields_table --fields_file project-file-fields.asciidoc
This first run of the script builds the field_names table that is required to match the field_id from the metric_values table to the field_id in this table, allowing us to use metric names instead of the numeric id. If there are no errors, this script will only print build_fields_table and a count of how many fields were inserted or updated in the field_names table. And that is the final step of the setup. Now we start collecting metrics. All we have to do to change the script to that mode is remove the --build_fields_table and --fields_file options, as shown below.
perl hyprid-poller.pl --dbname metrics_ht --dbuser metrics_user --dbpass notasecurepassword
If there are no errors, then we are collecting data. We can easily verify this by checking our record count in the PostgreSQL client. If you do not have a terminal connected to the database, open a new terminal and run the command below. We will leave the current window alone because that is where the script is running. We could push the script into the background by appending & to the same command, which would free up the console, but for this project it is good to leave it running in the terminal for now. This time we will connect with the user we created earlier, using the same username and password we pass to the script. You will be prompted for the password.
psql -h localhost -U metrics_user -d metrics_ht
This will take us directly to our metrics_ht database. Now we can count the rows in the values table to see how many records we have collected. Run the SQL command below to count the records.
SELECT COUNT(*) FROM metric_values;
This command will return an integer named count that shows how many metrics have been inserted into the metric_values hypertable. Now that we have some data in the tables, we can look at how many chunks have been created by our hypertable with a 10-minute interval. Run the command below to see the chunk information.
SELECT show_chunks('metric_values');
This query will display an output similar to the one below. This output shows how many different files this hypertable has been chunked into. Since we set these chunks to a 10-minute interval, the count of chunks should match the amount of time since we started Metricbeat. If you have been working through this for an hour, then you should see six chunks; for 30 minutes, you would see three chunks, and so on. This count may not be exact as it depends on where our script is in processing the metrics from the Redis buffer. Remember, 10-minute intervals on the chunks is probably never a good idea, but in the case of this project, it helps explain how these chunks work. Also, if, for some reason, you have to backfill data, it goes to the chunk where that timeframe exists.
metrics_ht=> SELECT show_chunks('metric_values');
show_chunks
-----------------------------------------
_timescaledb_internal._hyper_1_1_chunk
_timescaledb_internal._hyper_1_2_chunk
(2 rows)
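If you want more detail than show_chunks offers, TimescaleDB 2.x also ships an informational view that reports each chunk's time range. A query along these lines shows it:

SELECT chunk_name, range_start, range_end
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metric_values'
ORDER BY range_start;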
Here we come to the end of all the steps needed to prepare this data for use by Grafana. From here on out, we will be working with Grafana. We will set up the data source and step through a few custom queries needed to use this data set with Grafana, even the coveted, on-the-fly events per second calculations. Now the fun begins, visuals and dashboards.
By default, Grafana does not understand the layout of the relational database that we have created. This database is not a standard key/value setup; I call this a key/key/key/value setup. For this to work, we have to know the hostname, device, and field_name (which is field_id before the relation), so we can build visuals based on field names instead of having to search for the field_id every time we want to create a new visual. But that is just for queries against gauge-type data. If we are working with counter-type data, data that is always increasing or decreasing in the same direction, then we have to use a LAG query, which gets a little more complex. We will cover both ways as we move forward.
As we have not logged into Grafana yet, we will have to go through the steps of logging in as admin/admin and creating a new password. Let's navigate back to the web interface that we opened many lines ago, or open a new one at http://project-server-ip:3000 and log in with the username admin with the password admin. When prompted, choose a new password and continue.
The first thing that we need to do is add a new data source. Once you are logged in, navigate to the Data sources option, as shown in the screenshot below.
Once you are at the new window, click the Add data source button. A new window will load with a screen full of options for data sources. Either type Postgres into the filter box at the top of this page or scroll through the list until you find PostgreSQL and select that option.
After clicking on PostgreSQL, you will see a screen like the one below. The values in the image below are the values for this project. SSL is disabled in this project as this is all self-contained on one VM. If this idea were to move into production somewhere, all of the SSL and other security precautions should be taken. Everything on this project is contained in this single localhost, so it is fairly secure.
The Host setting is the localhost connection to the database. The User and Password are the same ones we created above with the CREATE USER command. If you changed either the username or the password, that should be reflected here. And finally, remember that we are connecting to the metrics_ht database. Once all these settings are filled in and verified, click the Save and Test button at the bottom of this screen. If everything is good, you will see a green popup above the button that says "Database Connection OK". Now we can build a visual.
From the page you are at now click on Dashboards icon as shown in the image below and select Manage.
Once the new page has loaded select the top left option "Add a new panel". You should see the screen below.
As shown in the image above, click the pencil edit icon, which is circled in red, and paste the query below into the query box. Once you have done all of this, click on the visualization window, and you should see a graph similar to the one in the image.
SELECT
"timestamp" AS "time",
field_names.field_name AS metric,
hostname AS hostname,
avg(value) AS "Load 1m"
FROM metric_values LEFT JOIN field_names on field_names.field_id = metric_values.field_id
WHERE field_names.field_name = 'system.load.norm.1' AND $__timeFilter(timestamp)
GROUP BY "timestamp",2,3
ORDER BY 1,2
The graph and query above show the 1-minute processor load of the project server. We chose that metric to graph with this line of the query WHERE field_names.field_name = 'system.load.norm.1'. system.load.norm.1 is the metric we are querying from the database.
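If you are not sure which field names are available to graph, the field_names table we loaded earlier can be searched directly from psql. For example, this lists the load-related fields:

SELECT field_id, field_name
FROM field_names
WHERE field_name LIKE 'system.load%'
ORDER BY field_name;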
For an overall view of the system load, we can add more queries to see the 5-minute and 15-minute cpu load on this server. We do this by clicking the + Query button under our current query editor. When the new query editor box loads, click on the edit pencil as we did before and paste in the same query, though we will change a few things. Below is the query for the 5-minute load average.
SELECT
"timestamp" AS "time",
field_names.field_name AS metric,
hostname AS hostname,
avg(value) AS "Load 5m"
FROM metric_values LEFT JOIN field_names on field_names.field_id = metric_values.field_id
WHERE field_names.field_name = 'system.load.norm.5' AND $__timeFilter(timestamp)
GROUP BY "timestamp",2,3
ORDER BY 1,2
If you compare the two queries, you will see that lines 5 and 7 differ. Line 5 changed Load 1m to Load 5m, and line 7 changed system.load.norm.1 to system.load.norm.5. The change on line 5 updates the graph legend to Load 5m, and the change on line 7 tells the query to look for system.load.norm.5 instead of system.load.norm.1. Before we look at another screenshot, let's go ahead and add a third query to get the 15-minute CPU load reported by this project server.
SELECT
"timestamp" AS "time",
field_names.field_name AS metric,
hostname AS hostname,
avg(value) AS "Load 15m"
FROM metric_values LEFT JOIN field_names on field_names.field_id = metric_values.field_id
WHERE field_names.field_name = 'system.load.norm.15' AND $__timeFilter(timestamp)
GROUP BY "timestamp",2,3
ORDER BY 1,2
Click on the graph window once more and we should now have a third metric named Load 15m as shown in the image below.
This updated visual shows this project server's 1-minute, 5-minute, and 15-minute load. These are always good values to know on any system because they quickly tell us if our server has a problem or is lacking CPU power. We will keep this visual around for a while, so give it a title using the Panel title box on the top right of the screen; I named this one System CPU Load. Then click Apply. After clicking Apply, the browser returns to the dashboard with our system load graph as the only thing on it. In almost all cases, there will be a bit of a spike in this graph that marks when we first started reading data out of Redis and ingesting it into PostgreSQL. This spike happens because Redis had been buffering our metrics since we started Metricbeat, so there were many records to process at first. After that initial processing, the ingest rate falls to a few records every 10 to 20 seconds. This data can be better visualized if we use the timestamp dropdown to show the last 15 minutes, as we are currently looking at the last 6 hours.
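If you would rather check that ingest pattern directly from the database than eyeball it on the graph, a quick aggregate in psql (assuming the metrics_ht session from earlier is still open) shows how many rows landed in each minute:

SELECT date_trunc('minute', "timestamp") AS minute, count(*) AS rows_inserted
FROM metric_values
GROUP BY 1
ORDER BY 1 DESC
LIMIT 15;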
We have accomplished a lot in these few pages. We have installed two types of databases, the TimescaleDB extensions, Perl modules, Metricbeat, and Grafana. Not to mention that we have created databases, database tables, database users, and a hypertable. Now we are running a script that converts nested JSON data into SQL inserts to store metrics. And now, we are visualizing that data using Grafana.
We are taking a breather here because next, we get into a query that will let us build visuals around data based on counters. Unlike gauge data, counter data is, in most cases, always increasing. Bandwidth counters, disk usage counters, and such generally go up or down in the same direction for long periods. If you try to visualize these kinds of values without LAG queries, you will see an ever-increasing line or sawtooth dips when counters reset or, in the case of disk usage, when data is removed. To visualize the data correctly, we have to perform math between the current database row and the row before the current row. This type of query gives us the difference in the value over that time range which we then render on the visualization.
At this point, we have built a fully functioning visualization system out of readily available parts and tied them together with a bit of Perl. This is a grand accomplishment in an environment where many systems like this are single-tasked and split across many people on a team. I think it is good for everyone to know how all of the parts work, which is what spawned this document. Be proud of what we have accomplished to this point, and we now take on the PostgreSQL LAG query.
Let's head back to the dashboard where our System CPU Load graph is displayed. Click the add panel icon at the top of this screen, and choose Add New Panel. This is the same screen we used for the last visualization; we will just be using a different query.
Click on the edit button as before and paste in the query below.
WITH var1 AS(SELECT timestamp, hostname,metric_values.field_id, field_name, device, value
FROM metric_values
LEFT JOIN field_names on field_names.field_id = metric_values.field_id
WHERE field_name = 'system.network.in.bytes'
AND device != 'lo'
AND $__timeFilter(timestamp)
),
var2 AS(SELECT timestamp, hostname, device, value,
LAG(value,1) OVER (ORDER BY hostname,device,timestamp) previous_value,
LAG(timestamp,1) OVER (ORDER BY hostname,device,timestamp) previous_timestamp,
LAG(device,1) OVER (ORDER BY hostname,device,timestamp) previous_device
FROM var1
),
var3 AS(SELECT timestamp, hostname, device, value, previous_value,
CASE
WHEN device = previous_device AND EXTRACT (EPOCH FROM(timestamp - previous_timestamp)) > 0
THEN (value - previous_value) / EXTRACT (EPOCH FROM(timestamp - previous_timestamp))
ELSE NULL
END AS bytes_sec
FROM var2
)
SELECT timestamp AS "time", hostname, device, bytes_sec AS "Inbound" from var3 ORDER BY time
If everything is still working smoothly, this should generate a new metric visualization showing the inbound network traffic on all interfaces. This visualization is generated with the slightly more complex LAG query that we ran for this visual. The whole query is broken down in detail in the next section.
The query above does everything needed to calculate the bytes per second value for every host that has metrics for the field_name system.network.in.bytes. Like many other metrics, this one is a counter. The value is ever-increasing. If you were to visualize this metric's time series raw values, you would see an ever-growing angle as the counter increases. This type of visual is not very useful information, maybe in a table or a single metric readout, but not on a time series line graph, which is what most of us want.
Since we store all of the metricbeat metric values in a table that is keyed by hostname, device, and field_id, it takes a custom query like this one for Grafana to render the data correctly. There may be ways to do this in Grafana, but I found it easier to work with the queries, especially since the PostgreSQL data source is not like most key/value sources.
Without including the field_names table, we can see that we are working with timestamp, hostname, device, field_id, and value. The three keys that identify a metric are hostname, device, and field_id. The other two columns, timestamp and value, are the actual measurements, as they are the two ever-changing values. In the case of counters, value is always incrementing, and timestamp increments with every new metric entry. To determine the number of events per second, we find the difference between the current and previous values, find the difference between the current and previous timestamps, and then divide the value difference by the time difference. For example, if a counter reads 1,000,000 bytes at 12:17:35 and 1,280,000 bytes at 12:17:45, that is 280,000 bytes over 10 seconds, or 28,000 bytes per second. That is exactly what this query does. Below we break the query down and explain what each part is doing.
1 WITH var1 AS(SELECT timestamp, hostname,metric_values.field_id, field_name, device, value
2 FROM metric_values
3 LEFT JOIN field_names on field_names.field_id = metric_values.field_id
4 WHERE field_name = 'system.network.in.bytes'
5 AND device != 'lo'
6 AND $__timeFilter(timestamp)
7 ),
Lines 1-7 Explained
This first section selects the raw rows we need from metric_values and joins the field_names table so we can filter on the human-readable field name. In this case we are looking only for the field_name system.network.in.bytes, excluding the loopback interface with device != 'lo', and only within the time range selected on the Grafana dashboard. $__timeFilter(timestamp) can be changed to a normal predicate such as AND timestamp > now() - INTERVAL '15 minutes' to work as a standard query. The results of this section are passed on to the next section as var1.
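If you want to eyeball the raw counter values outside of Grafana, this first section runs fine on its own in psql once the macro is swapped for a plain time predicate. A quick sketch using the same tables from this project:
SELECT timestamp, hostname, device, value
FROM metric_values
LEFT JOIN field_names ON field_names.field_id = metric_values.field_id
WHERE field_name = 'system.network.in.bytes'
AND device != 'lo'
AND timestamp > now() - INTERVAL '15 minutes'
ORDER BY hostname, device, timestamp;
Running this makes the problem obvious: the value column only climbs, which is exactly why the LAG math in the next two sections is needed.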
8 var2 AS(SELECT timestamp, hostname, device, value,
9 LAG(value,1) OVER (ORDER BY hostname,device,timestamp) previous_value,
10 LAG(timestamp,1) OVER (ORDER BY hostname,device,timestamp) previous_timestamp,
11 LAG(device,1) OVER (ORDER BY hostname,device,timestamp) previous_device
12 FROM var1
13 ),
Lines 8-13 Explained
This section takes the rows from var1 and uses the LAG window function to pull in data from the row before the current one. The OVER (ORDER BY hostname,device,timestamp) clause sorts the rows so each device's samples sit next to each other in time order, and LAG(...,1) then grabs the value, timestamp, and device from the previous row. Carrying previous_device along lets the next section detect when the window rolled over from one device to the next so those rows can be discarded. The results are passed on as var2.
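As a side note, the same previous-row lookup is often written with PARTITION BY instead of a single global ORDER BY. Partitioning means the window never crosses from one host or device into the next, so previous_device no longer needs to be carried along, and it also sidesteps the edge case where two different hosts report the same device name back to back. This is not the query used in this guide, just a sketch of the alternative:
var2 AS(SELECT timestamp, hostname, device, value,
LAG(value,1) OVER (PARTITION BY hostname, device ORDER BY timestamp) previous_value,
LAG(timestamp,1) OVER (PARTITION BY hostname, device ORDER BY timestamp) previous_timestamp
FROM var1
),
With this form the device = previous_device check in the next section becomes unnecessary, since the first row of each partition simply produces NULLs and falls out of the CASE on its own.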
14 var3 AS(SELECT timestamp, hostname, device, value, previous_value,
15 CASE
16 WHEN device = previous_device AND EXTRACT (EPOCH FROM(timestamp - previous_timestamp)) > 0
17 THEN (value - previous_value) / EXTRACT (EPOCH FROM(timestamp - previous_timestamp))
18 ELSE NULL
19 END AS bytes_sec
20 FROM var2
21 )
Lines 14-21 Explained
This is where the rate math happens. For each row, if the device matches the previous row's device and more than zero seconds have passed between the two samples, the increase in the counter (value - previous_value) is divided by the elapsed seconds (EXTRACT (EPOCH FROM ...)) to produce bytes_sec. Any row where the window crossed a device boundary, or where the elapsed time is zero, gets NULL instead, so it cannot produce a bogus spike on the graph. The results are passed on as var3.
22 SELECT timestamp AS "time", hostname, device, bytes_sec AS "Inbound" from var3 ORDER BY time
Line 22 Explained
The final SELECT returns the columns Grafana needs to draw the series: the timestamp aliased to time, hostname and device so the individual lines can be told apart, and the calculated bytes_sec aliased to "Inbound", all ordered by time.
Below is an example of the output sent to Grafana with this query.
time | hostname | device | Inbound
-------------------------+----------------------+-----------------+------------------------
2022-10-08 12:17:33.628 | haproxy-dev | ens18 | 1508.4491550844915508
2022-10-08 12:17:35.999 | dev-timescale | ens18 | 28008.100000000000
2022-10-08 12:17:36.221 | beat-interceptor | ens18 | 18365.536553655366
2022-10-08 12:17:38.132 | dangerousmetrics.org | eth0 | 50.2000000000000000
2022-10-08 12:17:38.573 | dev-dangrousmetrics | ens18 | 772.6000000000000000
2022-10-08 12:17:43.629 | haproxy-dev | ens18 | 1833.7166283371662834
2022-10-08 12:17:45.999 | dev-timescale | ens18 | 28669.500000000000
2022-10-08 12:17:46.222 | beat-interceptor | ens18 | 18400.759924007599
2022-10-08 12:17:48.132 | dangerousmetrics.org | eth0 | 62.0000000000000000
2022-10-08 12:17:48.573 | dev-dangrousmetrics | ens18 | 521.9000000000000000
2022-10-08 12:17:53.628 | haproxy-dev | ens18 | 1528.2528252825282528
2022-10-08 12:17:55.998 | dev-timescale | ens18 | 74631.463146314631
2022-10-08 12:17:56.222 | beat-interceptor | ens18 | 57966.900000000000
2022-10-08 12:17:58.133 | dangerousmetrics.org | eth0 | 60.9939006099390061
2022-10-08 12:17:58.574 | dev-dangrousmetrics | ens18 | 778.9221077892210779
Remember that we touched on the device column earlier. It can identify many different things, but that is taken into account when we select which metric to look at, so as long as we are familiar with the data, the device column is the shortest way to track what sent the metric. In a future version, the hostname and device columns will be integers linking to metric_hostnames and metric_devices tables. That design will further reduce the size of the metric_values table, just as using the field_names table did. Joining two tables on an integer is far more efficient than joining on text, especially potentially long text like the FQDN of a host that is sending metrics.
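For the curious, those lookup tables would probably look something like the sketch below. Only the names metric_hostnames and metric_devices come from the plan above; the column names and layout here are hypothetical and may change by the time that version is written.
CREATE TABLE metric_hostnames (
    hostname_id SERIAL PRIMARY KEY,   -- small integer key that metric_values would store
    hostname TEXT UNIQUE NOT NULL     -- the FQDN lives here once instead of on every row
);
CREATE TABLE metric_devices (
    device_id SERIAL PRIMARY KEY,
    device TEXT UNIQUE NOT NULL
);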
With all of the explanations about the query out of the way, let's go ahead and create the Outbound time series for this visual. We will add a second query and change two things, just like when we were working on the load graph. We will try this one without reprinting the whole query. First, click the + Query button, then click the edit pencil on the new query and paste in our current query, as we have done a few times already. Once the query is in the box, make the changes below.
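The two edits mirror what we did for the load graph. Assuming the standard Metricbeat field name for outbound traffic, change the filter on line 4 of the query to read:
WHERE field_name = 'system.network.out.bytes'
and change the alias on line 22 so the final SELECT reads:
SELECT timestamp AS "time", hostname, device, bytes_sec AS "Outbound" from var3 ORDER BY time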
Once these changes are made, a new set of lines and new legend items should appear on the visual, showing Outbound traffic along with the Inbound traffic. The y-axis still shows raw values. If you would like to see a converted value, such as bytes per second, scroll through the Grafana menu on the right until you find Standard options, open the Unit selector, choose Data rate, and then bytes/s. Whether you choose IEC units or SI units is up to you. This update changes the y-axis to a format that is easier to read when working with bandwidth. Give this panel a name like Network Traffic and click Apply at the top right. Now we have Network Traffic and Processor Load visualizations.
And there is our dashboard with two running graphs. But you may ask, how do I know what data I have to work with here? Let's build a query and find out.
It is pointless to have a ton of metrics to work with but no idea what they are. We can build a fairly simple query that samples the data and connects it to the field_names table and its definitions. With a bit of WHERE filtering, we can zoom in on the metrics we want to work with. We will build this as a Grafana table and display it on the same dashboard as our other graphs.
Let's head back to the dashboard that shows the two visuals we created and click Add panel. Choose Add a new panel, and we are back at our favorite part of Grafana, designing a new metric visual. This visual will be different. In the upper right-hand dropdown menu that currently says Time series, choose Table. Then, at the bottom of the query designer, change the format to Table. Once these changes are made, click our favorite old editor pencil and paste in the query below.
SELECT field_name,description, field_names.field_id, value, device, hostname FROM field_names JOIN (
SELECT DISTINCT ON (field_id) *
FROM metric_values
WHERE timestamp > now() - INTERVAL '1 minute'
ORDER BY field_id
) as singleID
ON field_names.field_id = singleID.field_id
When finished you should see what is in the screenshot below.
This query provides a list of all of the metric definitions that we actually have metrics for. It selects rows from the field_names table and JOINs them to a sub-query that pulls one recent sample for every field_id seen in metric_values during the last minute. If no sample is found for a field_id, that row is not returned. Let's give this a name, something like Available Metrics, and then click Apply.
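If the table gets long, the same query can be narrowed with a WHERE clause on the outer SELECT, which is the bit of WHERE filtering mentioned earlier. A sketch that keeps only the network-related fields (adjust the LIKE pattern to whatever you are hunting for):
SELECT field_name, description, field_names.field_id, value, device, hostname FROM field_names JOIN (
SELECT DISTINCT ON (field_id) *
FROM metric_values
WHERE timestamp > now() - INTERVAL '1 minute'
ORDER BY field_id
) as singleID
ON field_names.field_id = singleID.field_id
WHERE field_name LIKE 'system.network.%'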
When selecting the correct query for the visualization you want to build, it takes a bit of data intuition to tell a gauge data type from a counter data type. If you cannot tell from the definition, you can write a query to watch that data point and see how it behaves. A value that only climbs in one direction is generally a counter. A value that goes up and down over time is generally a gauge. You can browse through all these metrics and make many different visualizations. Play around with the data and see what you can come up with.
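One quick way to run that eyeball test is a query like the one below, which pulls the last few minutes of a single field on a single host so you can watch how the value moves. The field and hostname here are only placeholders; substitute whatever you found in the Available Metrics table.
SELECT timestamp, hostname, device, value
FROM metric_values
LEFT JOIN field_names ON field_names.field_id = metric_values.field_id
WHERE field_name = 'system.memory.used.pct'   -- placeholder field
AND hostname = 'dev-timescale'                -- placeholder host
AND timestamp > now() - INTERVAL '10 minutes'
ORDER BY timestamp;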
Remember what we have built when you find metrics you want to work with. Gauge data uses the first query style from the processor load visual, an ever-changing value graphed as-is. Counter data uses the slightly longer PostgreSQL LAG query to calculate events per second. It does not matter what you are visualizing; from bushels of corn to quantum particles, we are always looking for a value that is changing, be it up, down, left, right, or random. In the end, use queries instead of middleware wherever possible and make the database do the work.
We are now at the end of this guide. If you followed along the whole way, you should have a self-contained system that converts Metricbeat JSON data sets into SQL inserts, using Redis as a JSON buffer and Grafana as the display engine. We threw in a few multi-level PostgreSQL queries and some deep explanations along the way. I hope anyone following this guide learned a little about time series metrics and how to process them with SQL.
This project is what I like to call a side quest. This project spawned because I needed a way to collect a nice set of metrics for a larger project, "Visualizing Bureau of Labor Statistics Employment Data with PostgreSQL and TimescaleDB". Stay tuned for updates on that project and a document similar in detail to this one that explains the parsing and storing of that data. That project will also come with a bit of JavaScript and NodeJS to handle rendering that data.
The TimescaleDB table chunking we used here is only a small part of its functionality. I will dig more deeply into the available hyperfunctions as I process the BLS and CDC data in future projects. Stay tuned for those.