MySQL Can Do it Too


So Can Any Other Datastore.


Many posts around this site that have to do with manipulating and analyzing data use Elasticsearch. Personally, I love many things about Elasticsearch and the data research that I can do with it. But is it always the best answer? Do you always need the search engine type querying that Elasticsearch offers? Granted, Elasticsearch was ahead of its time when it came out. On top of the search engine type querying and high-speed results, it offered solid clustering and horizontal expansion where many other database systems at the time were struggling with clustering reliability. But times change, things improve, and the world moves on.

I started working with the Elastic system around version 2.4 and never looked back. Before finding Elasticsearch, I did almost all of my data wrangling using MySQL. But my knowledge is not limited by just those two. For other projects, I have used Redis, Mongo, Oracle, and whatever else that looked like it may fit whatever project I am working on.

The tricks that your database can perform are completely based on database design and how you write the software that uses that database. With properly designed database schemas, data linkages, and an optimized query client you are only limited by your imagination. Now that MySQL can be reliably clustered and database servers spanned with federated tables, it looks more appealing. Remember, I started to shift away from MySQL in 2016. A lot has changed since then. But again, any datastore system, even flat text files, can perform almost any task with the proper schema and optimized data access methods.

For this experiment, we will only collect and store system data. Similar to the data that comes from the metricbeat system.yml module. We will work with two primary scripts, the data poller, and the REST interface, for querying the data. To keep this as simple as possible, it will not be modular. It will only do one thing, pull the same system data that metricbeat would with the system.yml module. That data will then be stored in a MySQL database.table to be queried by the REST interface when needed. First, we must create a database schema.

There are two ways to go about creating the table schema for this project. The first option is to create a table that knows of a key for every single piece of data that we will send from the poller. With this option, you have to install a new database schema every time you want to add a new data type to the set. The second option is to build a more simple schema containing only the fields "hostname, timestamp, title, value". This creates a simple table that can accept new data without rewriting the schema because this is handled by the title and value fields. The downside to this route is that you have to be careful of what data type you set for the value field, as performance is better if the field type matches. You could set it to varchar 255 and store anything you want in that field, having the client handle calculations. But, if we set it properly, and knowing that we will be working with all integers and floats; why not set the field properly and let the database take care of calculations when possible?

So, we will take the second option and set the field type for decimal. This should cover any value that we need to store. We will store the timestamp in epoch format. Javascript uses a millisecond epoch by default, so we will store it in a bigint since it is larger than standard int allows. Below is the simple schema for this project.

CREATE TABLE sqlbeat.systemdata ( 
   epochts              bigint UNSIGNED NOT NULL    ,
   hostname             varchar(100)  NOT NULL    ,
   title                varchar(25)  NOT NULL    ,
   value                decimal(19,6)  NOT NULL    ,
   iface                varchar(20)      ,
   fspath               varchar(40)      ,
   operstate            varchar(14)      
 ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci COMMENT='This table holds all of the data polled from the system module.';

ALTER TABLE sqlbeat.systemdata COMMENT 'This table holds all of the data polled from the system module.';

ALTER TABLE sqlbeat.systemdata MODIFY epochts bigint UNSIGNED NOT NULL   COMMENT 'Timestamp of this poll data in epoch milliseconds.';

ALTER TABLE sqlbeat.systemdata MODIFY hostname varchar(100)  NOT NULL   COMMENT 'Hostname of server that data was polled form.';

ALTER TABLE sqlbeat.systemdata MODIFY title varchar(25)  NOT NULL   COMMENT 'Name of data that was collected such as Load1, Load5, DiskIO and such.';

ALTER TABLE sqlbeat.systemdata MODIFY iface varchar(20)     COMMENT 'Name field for network Interface data';

ALTER TABLE sqlbeat.systemdata MODIFY fspath varchar(40)     COMMENT 'Only used for data file system metrics so as to identify the drive mount.';

ALTER TABLE sqlbeat.systemdata MODIFY operstate varchar(14)     COMMENT 'Operation state of network interface.';

               

That was the easy part. Now we get to move into how to collect and store the data. Open your Javascript mind and lets get this ball rolling.


The Code

As far as this project, Javascript does us a massive favor with the systeminformation package available from npm. There will be no external calls to gather information, just scheduled polls to the systeminformation module and then prepare that data or ingestion into MySQL. Once the data is in MySQL, the only limit to what we can do with it is pretty much imagination.

As with other projects on this site this code will be heavily documented and written in a top down fashion so it is easier to follow the flow. A bunch of async, try, catch, then, chained together can get confusing, this is just to explain the logical flow.

                  
var mysql = require('mysql2');
const si = require('systeminformation');


//Connect to the database
var connection = mysql.createConnection({
  host     : 'localhost',
  user     : 'sqluser',
  password : 'userpass',
  database : 'sqlbeat'
});

connection.connect();


var pollTimer = setInterval(pollTheData,2000);

async function pollTheData(){
   try{

      //Stop the Poll timer so we do not have overlapping polls running
      clearInterval(pollTimer);
      //first we will get the hostname
      var osinfo = await si.osInfo();
      var hostname = osinfo.hostname;
      //Then lets query systeminformation for the load information.
      var loadInformation = await si.currentLoad();
      //This object is a set of keys and one array that holds information per cpu core.
      //We are only going to store the average, not the per cpu core.
      //Also, we could loop through they keys and create the database inserts dynamically
      //but, for this project we are going to so this part manually so you can see
      //each value and title.

      //get Timestamp
      var timeNow = Date.now();
      

      var avgLoad = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','avgLoad','"+loadInformation.avgLoad+"')";
      console.log(avgLoad);
      connection.query(avgLoad, function (err, result) {
         if (err) throw err;
      });


      var currentLoad = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','currentLoad','"+loadInformation.currentLoad+"')";
      console.log(currentLoad);
      connection.query(currentLoad, function (err, result) {
         if (err) throw err;
      });


      var currentLoadUser = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','currentLoadUser','"+loadInformation.currentLoadUser+"')";
      console.log(currentLoadUser);
      connection.query(currentLoadUser, function (err, result) {
         if (err) throw err;
      });


      var currentLoadSystem = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','currentLoadSystem','"+loadInformation.currentLoadSystem+"')";
      console.log(currentLoadSystem);
      connection.query(currentLoadSystem, function (err, result) {
         if (err) throw err;
      });

      var currentLoadNice = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','currentLoadNice','"+loadInformation.currentLoadNice+"')";
      console.log(currentLoadNice);
      connection.query(currentLoadNice, function (err, result) {
         if (err) throw err;
      });

       var currentLoadIdle = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','currentLoadIdle','"+loadInformation.currentLoadIdle+"')";
      console.log(currentLoadIdle);
      connection.query(currentLoadIdle, function (err, result) {
         if (err) throw err;
      });


      //Then lets get the disk io information
      var diskIO = await si.disksIO();
      var rIO = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','rIO','"+diskIO.rIO+"')";
      console.log(rIO);
      connection.query(rIO, function (err, result) {
         if (err) throw err;
      });

      var wIO = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','wIO','"+diskIO.wIO+"')";
      console.log(wIO);
      connection.query(wIO, function (err, result) {
         if (err) throw err;
      });
      var rWaitTime = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','rWaitTime','"+diskIO.rWaitTime+"')";
      console.log(rWaitTime);
      connection.query(rWaitTime, function (err, result) {
         if (err) throw err;
      });
      var wWaitTime = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','wWaitTime','"+diskIO.wWaitTime+"')";
      console.log(wWaitTime);
      connection.query(wWaitTime, function (err, result) {
         if (err) throw err;
      });
      var tWaitTime = "INSERT INTO systemdata (epochts, hostname, title,value) VALUES ('"+timeNow+"','"+hostname+"','tWaitTime','"+diskIO.tWaitTime+"')";
      console.log(tWaitTime);
      connection.query(tWaitTime, function (err, result) {
         if (err) throw err;
      });

      //Then lets get the file system sizes
      var fsSize = await si.fsSize();
      //This one is a little different. The only way to access this is through an array of mounts.
      //So we have to loop through the array to get the stats for every mount path.
      
      //Create array object for batch insert
      var insertArray = [];
      for (let i = 0; i < fsSize.length; i++) {
         console.log(fsSize[i]);
         //for ease of following we will assign the array element to its own variable on every loop.
         const thisMount = fsSize[i];

         //For eample purposes, these we will insert in one batch.
         //Everything above and below could be done this way, and it is more efficient.
         //Since this is a "show me" script we will do it both ways.
         //to batch insert with nodeJS the best way to do it is with an array of arrays.
         //we created the insertArray above, now we create an array of data to push into the insert array
         //To do this the right way, we now need to loop through the objects from the array entry.
         //There is no other way to do this as we ned access to the key name to populate the title field.
         for (const [key, value] of Object.entries(thisMount)) {
            console.log(`${key}: ${value}`);
            //we do not care about the fs, type or mount keys in this scope.
            //thisMount.fs is used as the fspath, not a tracked value
            if(key !== "fs" && key !== "type" && key !== "mount"){
               var thisPathData = [timeNow, hostname,key,value,thisMount.fs];
               insertArray.push(thisPathData);
            }
         }
         console.log(insertArray);
         //And here, we push this whole array to Mysql.
         var fsInfoQuery = `INSERT INTO systemdata (epochts,hostname,title,value,fspath) VALUES ?`;
         connection.query(fsInfoQuery, [insertArray]);
      }
      //We are done here, Clear the insetArry;
      insertArray = [];



      //Finally lets check some network stats
      var networkStats = await si.networkStats();
      console.log(networkStats);
      //network stats are similar to the mount paths we worked with above
      //networkStats is an array of objects for each network interface.
      //so we must loop through it like we did with the mount paths above.
      //Create the Interface Array
      var interfaceInsertArray = [];
      for (let i = 0; i < networkStats.length; i++) {
         var thisInterface = networkStats[i];
         for (const [key, value] of Object.entries(thisInterface)) {
            //we have to check the value type since, depending ont he system, some values can be null
            if(key !== 'iface' && key !== 'operstate' && typeof value === "number"){
               console.log(`${key}: ${value}`);
               var thisInterfaceQuery = [timeNow, hostname, key, value, thisInterface.iface, thisInterface.operstate];
               console.log(thisInterfaceQuery);
               interfaceInsertArray.push(thisInterfaceQuery);
         }

      }
   }
   //And finally we do the network data insert
    var networkInfoQuery = `INSERT INTO systemdata (epochts,hostname,title,value,iface, operstate) VALUES ?`;
   connection.query(networkInfoQuery, [interfaceInsertArray]);
   //We are done with the array, clear it
   interfaceInsertArray = [];
   //Polling is done. Restart the timer
   pollTimer = setInterval(pollTheData,2000);
      return;
   }catch(error){
      console.log(error);
   }
}
                  
               

There are many ways to query this data. To simulate what an Elasticsearch aggregate query would return check out the query and its associated output below.

mysql> SELECT hostname,iface,title, MAX(value) From systemdata GROUP BY hostname,iface,title;
+--------------+--------+-------------------+---------------------+
| hostname     | iface  | title             | MAX(value)          |
+--------------+--------+-------------------+---------------------+
| twitterfeed  | NULL   | avgLoad           |            0.120000 |
| twitterfeed  | NULL   | currentLoad       |            5.128205 |
| twitterfeed  | NULL   | currentLoadUser   |            4.395604 |
| twitterfeed  | NULL   | currentLoadSystem |            2.853598 |
| twitterfeed  | NULL   | currentLoadNice   |            0.001021 |
| twitterfeed  | NULL   | currentLoadIdle   |           99.864713 |
| twitterfeed  | NULL   | rIO               |       111838.000000 |
| twitterfeed  | NULL   | wIO               |     11078648.000000 |
| twitterfeed  | NULL   | rWaitTime         |       274434.000000 |
| twitterfeed  | NULL   | wWaitTime         |     16372525.000000 |
| twitterfeed  | NULL   | tWaitTime         |      2052024.000000 |
| twitterfeed  | NULL   | size              |  52572946432.000000 |
| twitterfeed  | NULL   | used              |  19127160832.000000 |
| twitterfeed  | NULL   | available         |  30773379072.000000 |
| twitterfeed  | NULL   | use               |           38.350000 |
| twitterfeed  | ens160 | rx_bytes          |  31384849968.000000 |
| twitterfeed  | ens160 | rx_dropped        |      1934591.000000 |
| twitterfeed  | ens160 | rx_errors         |            0.000000 |
| twitterfeed  | ens160 | tx_bytes          |  24353820333.000000 |
| twitterfeed  | ens160 | tx_dropped        |            0.000000 |
| twitterfeed  | ens160 | tx_errors         |            0.000000 |
| twitterfeed  | ens160 | ms                |         2071.000000 |
| twitterfeed  | ens160 | rx_sec            |        74332.198444 |
| twitterfeed  | ens160 | tx_sec            |      1921781.265326 |
| new-es-lab-1 | NULL   | avgLoad           |            0.200000 |
| new-es-lab-1 | NULL   | currentLoad       |           10.882709 |
| new-es-lab-1 | NULL   | currentLoadUser   |           10.036276 |
| new-es-lab-1 | NULL   | currentLoadSystem |            1.219512 |
| new-es-lab-1 | NULL   | currentLoadNice   |            0.001310 |
| new-es-lab-1 | NULL   | currentLoadIdle   |           99.384994 |
| new-es-lab-1 | NULL   | rIO               |       335696.000000 |
| new-es-lab-1 | NULL   | wIO               |     29278686.000000 |
| new-es-lab-1 | NULL   | rWaitTime         |       898894.000000 |
| new-es-lab-1 | NULL   | wWaitTime         |     35658493.000000 |
| new-es-lab-1 | NULL   | tWaitTime         |      8368028.000000 |
| new-es-lab-1 | NULL   | size              | 527368978432.000000 |
| new-es-lab-1 | NULL   | used              |  50996518912.000000 |
| new-es-lab-1 | NULL   | available         | 450281652224.000000 |
| new-es-lab-1 | NULL   | use               |           15.160000 |
| new-es-lab-1 | ens160 | rx_bytes          | 743836026093.000000 |
| new-es-lab-1 | ens160 | rx_dropped        |      2034313.000000 |
| new-es-lab-1 | ens160 | rx_errors         |            0.000000 |
| new-es-lab-1 | ens160 | tx_bytes          | 798173487161.000000 |
| new-es-lab-1 | ens160 | tx_dropped        |            0.000000 |
| new-es-lab-1 | ens160 | tx_errors         |            0.000000 |
| new-es-lab-1 | ens160 | ms                |         2076.000000 |
| new-es-lab-1 | ens160 | rx_sec            |       977224.828935 |
| new-es-lab-1 | ens160 | tx_sec            |      1162162.267840 |
+--------------+--------+-------------------+---------------------+
48 rows in set (0.11 sec)


               

This is a quick and dirty script that I wrote in about an hour. This is only one of the many ways that you can store metric data from your devices. I prefer Elasticsearch but in some cases SQL is the answer for your use case. This document is just to show that any data store can be used, it all depends on how you format your data before inserting it into the datastore. You could move away from the title/value setup used here and spend the time mapping individual fields for every piece of data you will store, like the example below, or any other number of ways that you can think up. This all depends on your use case and how you plan on querying the data.

CREATE TABLE sqlbeat.systemdatav2 ( 
   epochts              bigint  NOT NULL    ,
   hostname             varchar(100)  NOT NULL    ,
   iface                varchar(20)      ,
   fspath               varchar(40)      ,
   operstate            varchar(10)      ,
   avgload              double      ,
   currentload          double      ,
   currentloaduser      double      ,
   currentloadsystem    double      ,
   currentloadnice      double      ,
   currentloadidle      double      ,
   rio                  bigint      ,
   wio                  bigint      ,
   rwaittime            bigint      ,
   wwaittime            bigint      ,
   twaittime            bigint      ,
   size                 bigint      ,
   used                 bigint      ,
   available            bigint      ,
   `use`                double      ,
   rx_bytes             bigint      ,
   rx_dropped           bigint      ,
   rx_errors            bigint      ,
   tx_bytes             bigint      ,
   tx_dropped           bigint      ,
   tx_errors            bigint      ,
   ms                   int      ,
   rx_sec               double      ,
   tx_sec               double      
 ) engine=InnoDB;

ALTER TABLE sqlbeat.systemdatav2 COMMENT 'The full schema for all values instead of the previous title/value setup.';

ALTER TABLE sqlbeat.systemdatav2 MODIFY epochts bigint  NOT NULL   COMMENT 'Epoch timestamp in milliseconds';

ALTER TABLE sqlbeat.systemdatav2 MODIFY operstate varchar(10)     COMMENT 'operational state of network interface';
               

I hope this post has been insightful and through provoking for those that are interested in data and analytics. There are many different ways to do everything, do not let the latest buzz words drive all of your decisions. Research and then research some more before deciding to switch out a working platform for something new and shiny.

-Parting Wisdom
--All data has a usecase. Whats yours?--