Tuesday, February 24, 2015

How to purge old data in MapR Metrics Database

Goal:

How to purge old data if MapR metrics database is running out of disk space.

Solution:

1. Identify nodes which are running hoststats service.

maprcli node list -filter '[csvc==hoststats]' -columns service

2. Identify retention policy of MapR metrics.

Log on one of above nodes, check the content of /opt/mapr/conf/db.conf.
Normally the largest table in MapR metrics database is "METRIC_TRANSACTION" where the finest data granularity is kept. It is partitioned by day and has 3 partitions by default.
Parameter "db.partition.finest.count.days" controls how many days' data it will store;

Similarly, table "METRIC_TRANSACTION_SUMMARY_DAILY" by default has 15 partitions which is controlled by parameter "db.partition.fine.count.days";

Table "METRIC_TRANSACTION_SUMMARY_YEARLY" by default has 100 partitions which is controlled by parameter "db.partition.coarse.count.years".

3. Change retention policy.

If metrics database still has enough disk space currently and it is not an urgent requirement to purge old data immediately, we can change the retention policy on the nodes for 2 different services:
  a. Change the retention policy in /opt/mapr/conf/db.conf on all nodes which are running "hoststats" service, and then restart "hoststats" services using below command:
maprcli node services -name hoststats -action restart -filter '[csvc==hoststats]'
This is to make sure all "hoststats" services insert data into correct partitions of MySQL tables.
   b. Change the retention policy in /opt/mapr/conf/db.conf on all nodes which are running webserver service, and then restart "webserver" services using below command:
maprcli node services -name webserver -action restart -filter '[csvc==webserver]'
Warden where the first webserver service lives will periodically purge the metrics data according to retention policy.
Below command is used to identify the node of the first webserver:
maprcli node list -columns "configuredservice,hostname" | grep "webserver" | head -n 1
Reducing the retention policy may not purge the data immediately.
If things are urgent, please follow below steps to purge the old data immediately. 

4.  Identify the largest table.

Normally table "METRIC_TRANSACTION" is the largest table,  however it is best to double check.
Go to MySQL node, and identify the data directory by checking the "--datadir" option of mysqld process.
For example:
# ps -ef|grep -i mysqld|grep -v grep
root      1270     1  0 15:45 ?        00:00:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --pid-file=/var/run/mysqld/mysqld.pid --basedir=/usr --user=mysql
mysql     1378  1270  0 15:45 ?        00:00:13 /usr/libexec/mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock
List files in the metrics database directory under MySQL data directory.
For example, the metrics database name is "metrics":
ls -altr /var/lib/mysql/metrics

The file name includes the table name.
Assume table "METRIC_TRANSACTION" is the largest table in below steps.

5. Identify which partition(s) need to be purged.

Firstly using "show create table <tablename>" to find out how many partitions belong to this table.
For example, table "METRIC_TRANSACTION" has 3 partitions -- p0, p1 and p2.
mysql> show create table METRIC_TRANSACTION ;
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table              | Create Table                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| METRIC_TRANSACTION | CREATE TABLE `METRIC_TRANSACTION` (
  `PARTITION_ID` tinyint(4) NOT NULL,
  `EVENT_TIME` bigint(20) NOT NULL,
  `M_NAME` varchar(64) NOT NULL,
  `M_VALUE` bigint(20) NOT NULL,
  `NODE_ID` varchar(64) NOT NULL,
  `CLUSTER_ID` varchar(32) NOT NULL,
  `USER_ID` varchar(64) DEFAULT NULL,
  `PROCESS_ID` varchar(64) DEFAULT NULL,
  `CREATED` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`PARTITION_ID`,`CLUSTER_ID`,`NODE_ID`,`M_NAME`,`EVENT_TIME`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
/*!50100 PARTITION BY LIST (PARTITION_ID)
(PARTITION p2 VALUES IN (2) ENGINE = InnoDB,
 PARTITION p1 VALUES IN (1) ENGINE = InnoDB,
 PARTITION p0 VALUES IN (0) ENGINE = InnoDB) */ |
+--------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
Since each partition of "METRIC_TRANSACTION" contains one day's data, use below SQL to find out which partitions have the oldest data.
For example:
mysql> select PARTITION_ID,max(EVENT_TIME) from METRIC_TRANSACTION group by PARTITION_ID order by 2 desc;
+--------------+-----------------+
| PARTITION_ID | max(EVENT_TIME) |
+--------------+-----------------+
|            0 |   1424817768000 |
|            2 |   1424814429000 |
+--------------+-----------------+
2 rows in set (0.10 sec)
Above result shows partition 0 has the most current data, partition 2 has older data, other partitions have no data.
Assume the decision is to purge partition 1 and 2 of "METRIC_TRANSACTION" in below steps.

6. Drop and recreate partitions.

If partition 1 and 2 need to be dropped and recreated:
ALTER TABLE METRIC_TRANSACTION DROP PARTITION p1;
ALTER TABLE METRIC_TRANSACTION ADD PARTITION (PARTITION p1 VALUES IN (1));

ALTER TABLE METRIC_TRANSACTION DROP PARTITION p2;
ALTER TABLE METRIC_TRANSACTION ADD PARTITION (PARTITION p2 VALUES IN (2));
After that, verify current partition number using below SQL:
show create table METRIC_TRANSACTION ;

7. Verify disk space.

Make sure the disk space is back after dropping old partitions of largest tables.

No comments:

Post a Comment

Popular Posts