Friday, January 11, 2019

How to configure Drill to use cgroups to hard-limit CPU resources in RedHat/CentOS 7

Goal:

This article explains how to configure Drill to use cgroups to hard-limit CPU resources in RedHat/CentOS 7.

Env:

Drill 1.14
MapR 6.1
CentOS 7.4 with linux kernel 3.10.0

Solution:

The current Drill documentation configures cgroups through libcgroup.
However, libcgroup is deprecated in RedHat/CentOS 7.
This article shares the key steps to configure Drill to use cgroups with systemd in RedHat/CentOS 7 instead.
One major difference is that you do NOT need to install libcgroup.

1. Confirm that cgroups are mounted (systemd mounts them by default in RedHat/CentOS 7).

# mount -v |grep -i cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
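Besides reading the mount list, you can check that the "cpu" controller exposes the files the remaining steps depend on. The sketch below is a hypothetical helper (cgroup_cpu_ready is not part of Drill or systemd); it assumes the cgroup v1 layout shown above, where /sys/fs/cgroup/cpu points at the cpu,cpuacct hierarchy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the cgroup v1 "cpu" controller is mounted
# and exposes the CFS bandwidth files used later in this article.
# The root defaults to $SYS_CGROUP_DIR (as in drill-env.sh), else /sys/fs/cgroup.

cgroup_cpu_ready() {
  local root="${1:-${SYS_CGROUP_DIR:-/sys/fs/cgroup}}"
  local f
  for f in cpu.cfs_period_us cpu.cfs_quota_us cgroup.procs; do
    # -e follows symlinks, so /sys/fs/cgroup/cpu -> cpu,cpuacct is fine.
    if [ ! -e "$root/cpu/$f" ]; then
      echo "missing: $root/cpu/$f" >&2
      return 1
    fi
  done
  echo "cpu controller ready under $root/cpu"
}
```

Run it as "cgroup_cpu_ready" to use the default root, or pass a different root directory as the first argument.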

2. Understand how Drill configures cgroups when the drillbit starts.

Starting from Drill 1.14, this logic lives in bin/drillbit.sh (excerpt):
    # $dbitPid holds the PID of the drillbit that was just started
    SYS_CGROUP_DIR=${SYS_CGROUP_DIR:-"/sys/fs/cgroup"}
    if [ -f $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs ]; then
      echo $dbitPid > $SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs
    fi
Inside drill-env.sh, there are 2 environment variables:
  • SYS_CGROUP_DIR -- by default "/sys/fs/cgroup"
  • DRILLBIT_CGROUP -- by default "drillcpu"
When the drillbit starts, it writes its PID into the file "$SYS_CGROUP_DIR/cpu/$DRILLBIT_CGROUP/cgroup.procs" (by default "/sys/fs/cgroup/cpu/drillcpu/cgroup.procs"), provided that file already exists.
That is all it does: it does not create the cgroup or set any limits.
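To make that behavior concrete, here is the same step from the bin/drillbit.sh excerpt above rewritten as a standalone function. This is an illustrative sketch only: the function name add_pid_to_drill_cgroup is hypothetical, and the defaults simply mirror the drill-env.sh values listed above.

```shell
#!/usr/bin/env bash
# Hypothetical re-implementation of the drillbit.sh startup step, for illustration.
# Defaults mirror drill-env.sh: SYS_CGROUP_DIR=/sys/fs/cgroup, DRILLBIT_CGROUP=drillcpu.

add_pid_to_drill_cgroup() {
  local pid="$1"
  local root="${SYS_CGROUP_DIR:-/sys/fs/cgroup}"
  local group="${DRILLBIT_CGROUP:-drillcpu}"
  local procs="$root/cpu/$group/cgroup.procs"

  # Like drillbit.sh, only act if the cgroup already exists;
  # nothing here creates the cgroup or sets a limit.
  if [ -f "$procs" ]; then
    echo "$pid" > "$procs"
  else
    echo "cgroup file not found: $procs" >&2
    return 1
  fi
}
```

This is why steps 4 and 5 below matter: if the directory is missing or not writable by the drillbit user, the PID never lands in the cgroup.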

Once you understand what the drillbit does at startup, the remaining steps follow naturally.

3. Uncomment the cgroups-related environment variables in drill-env.sh

export DRILL_PID_DIR=${DRILL_PID_DIR:-$DRILL_HOME}
export SYS_CGROUP_DIR=${SYS_CGROUP_DIR:-"/sys/fs/cgroup"}
export DRILLBIT_CGROUP=${DRILLBIT_CGROUP:-"drillcpu"}
Of course, you can change the cgroup root directory or the cgroup name as you like.

4. Create the cgroup directory based on $DRILLBIT_CGROUP and set its CPU limit

mkdir -p /sys/fs/cgroup/cpu/drillcpu
echo 100000 > /sys/fs/cgroup/cpu/drillcpu/cpu.cfs_quota_us
echo 100000 > /sys/fs/cgroup/cpu/drillcpu/cpu.cfs_period_us
Here I hard-limit the drillbit to 1 CPU core (quota / period = 100000 / 100000 = 1) using two parameters:
  • cpu.cfs_period_us
    The cpu.cfs_period_us parameter specifies a period of time in microseconds ("us" stands for µs) that determines how often the cgroup's access to CPU resources is reallocated.
  • cpu.cfs_quota_us
    The cpu.cfs_quota_us parameter specifies the total amount of runtime in microseconds ("us" stands for µs) for which all tasks in the Drill cgroup can run during one period (as defined by cpu.cfs_period_us). As soon as the tasks in the Drill cgroup use up the quota, they are throttled for the remainder of the period and not allowed to run until the next period.
If you want to soft-limit rather than hard-limit CPU resources, use the "cpu.shares" parameter instead; it sets only a relative weight that takes effect when CPUs are contended. You can of course use other cgroups parameters as well.
For the definitions of these parameters, please check the Linux kernel documentation.
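Because the allowed CPU is quota / period cores, the two echo commands in step 4 generalize to any core count. The sketch below is a hypothetical helper (set_drill_cpu_limit is not part of Drill); it takes the limit in percent of one core so the shell arithmetic stays in integers, and keeps the 100000 us period used above unless told otherwise.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a CFS bandwidth hard limit to a cgroup directory.
# Allowed CPU (cores) = cpu.cfs_quota_us / cpu.cfs_period_us, so with a
# 100000 us period: 100000 -> 1 core, 50000 -> 0.5 core, 250000 -> 2.5 cores.

set_drill_cpu_limit() {
  local cg_dir="$1"             # e.g. /sys/fs/cgroup/cpu/drillcpu
  local percent="$2"            # percent of ONE core: 100 = 1 core, 50 = 0.5 core
  local period="${3:-100000}"   # period in microseconds
  local quota=$(( period * percent / 100 ))

  mkdir -p "$cg_dir" || return 1
  echo "$period" > "$cg_dir/cpu.cfs_period_us" || return 1
  echo "$quota"  > "$cg_dir/cpu.cfs_quota_us"  || return 1
  echo "$quota"   # report the quota that was written
}
```

For example, running "set_drill_cpu_limit /sys/fs/cgroup/cpu/drillcpu 100" as root reproduces step 4. Writing -1 to cpu.cfs_quota_us removes the hard limit again.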

5. Change the ownership/permissions of this cgroup directory

As we saw in step 2, the drillbit writes its PID into the file "/sys/fs/cgroup/cpu/drillcpu/cgroup.procs". That is why the user who starts the drillbit ("mapr" by default) must have permission to write to that file.
chown -R mapr:mapr /sys/fs/cgroup/cpu/drillcpu
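One quick way to confirm that the chown took effect is to list anything under the cgroup directory that is not owned by the drillbit user. The helper name cgroup_owned_by is hypothetical; it relies only on the standard find "-user" test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every file under the cgroup directory
# is owned by the given user (e.g. the "mapr" user that starts the drillbit).

cgroup_owned_by() {
  local cg_dir="$1" user="$2"
  local others
  # find fails (non-zero) if the directory or the user name does not exist.
  others=$(find "$cg_dir" ! -user "$user" -print) || return 1
  if [ -n "$others" ]; then
    echo "not owned by $user:" >&2
    echo "$others" >&2
    return 1
  fi
}
```

For example, "cgroup_owned_by /sys/fs/cgroup/cpu/drillcpu mapr". As an extra check, "sudo -u mapr test -w /sys/fs/cgroup/cpu/drillcpu/cgroup.procs && echo writable" tests write access as that user directly.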

6. Restart drillbit

After restarting the drillbits, please double-check that the PID of each drillbit has been written to "/sys/fs/cgroup/cpu/drillcpu/cgroup.procs".
# cat /sys/fs/cgroup/cpu/drillcpu/cgroup.procs
29589
# jps -m|grep -i drillbit
29589 Drillbit
This means the drillbit process will be controlled by this cgroup.
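The manual comparison above can be scripted. This is a sketch with a hypothetical helper name (check_drill_cgroup); the path is built from SYS_CGROUP_DIR and DRILLBIT_CGROUP with the drill-env.sh defaults, and grep -x matches whole lines so that, for example, PID 2958 does not falsely match 29589.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether each given PID is listed in the
# drillbit cgroup's cgroup.procs file. Returns non-zero if any PID is missing.

check_drill_cgroup() {
  local procs="${SYS_CGROUP_DIR:-/sys/fs/cgroup}/cpu/${DRILLBIT_CGROUP:-drillcpu}/cgroup.procs"
  local pid rc=0
  if [ "$#" -eq 0 ]; then
    echo "usage: check_drill_cgroup PID..." >&2
    return 2
  fi
  for pid in "$@"; do
    if grep -qx "$pid" "$procs"; then
      echo "$pid: in cgroup"
    else
      echo "$pid: NOT in cgroup"
      rc=1
    fi
  done
  return "$rc"
}
```

Combined with jps, as in the check above: check_drill_cgroup $(jps | awk '$2 == "Drillbit" {print $1}')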

7. Test

Run a complex query and check "top -p <pid of drillbit>" to confirm that only 1 CPU core is used.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29589 mapr      20   0 4215740 1.139g  53400 S  99.0  7.3   1:29.58
Then reduce the parameter "cpu.cfs_quota_us" from the current 100000 to 50000:
echo 50000 > /sys/fs/cgroup/cpu/drillcpu/cpu.cfs_quota_us
Run the complex query again, and you will see that the drillbit can now use only 0.5 CPU core.
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
29589 mapr      20   0 4167360 1.111g  53512 S  49.8  7.2
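Besides watching top, you can compare the observed %CPU with the ceiling implied by quota / period, and read the cgroup's cpu.stat file to see whether throttling is actually happening. In cgroup v1, cpu.stat reports nr_periods, nr_throttled and throttled_time (in nanoseconds). Both helper names below are hypothetical; this is a sketch, not a Drill tool.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the test step.

# Expected ceiling for top's %CPU (100 = one full core): quota * 100 / period.
expected_cpu_ceiling() {
  local cg_dir="$1" quota period
  quota=$(cat "$cg_dir/cpu.cfs_quota_us") || return 1
  period=$(cat "$cg_dir/cpu.cfs_period_us") || return 1
  if [ "$quota" -lt 0 ]; then
    echo "unlimited"          # a quota of -1 means no hard limit
  else
    echo $(( quota * 100 / period ))
  fi
}

# Summarize throttling from cpu.stat (throttled_time is in nanoseconds).
throttle_report() {
  local cg_dir="$1"
  awk '
    $1 == "nr_periods"     { p = $2 }
    $1 == "nr_throttled"   { t = $2 }
    $1 == "throttled_time" { ns = $2 }
    END {
      pct = (p > 0) ? 100 * t / p : 0
      printf "throttled in %d of %d periods (%.1f%%), %.2f s total\n", t, p, pct, ns / 1e9
    }' "$cg_dir/cpu.stat"
}
```

With the 50000/100000 setting above, "expected_cpu_ceiling /sys/fs/cgroup/cpu/drillcpu" prints 50, matching the 49.8 %CPU shown by top; a growing nr_throttled count during the query confirms the quota is being enforced.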

