Wednesday, April 29, 2015

Resource Manager must be restarted when fair-scheduler.xml is first created.

Env:

Hadoop 2.5.1

Symptom:

1. When fair-scheduler.xml is created for the first time, the Resource Manager (RM) does not pick it up, even though the documentation states that the file is reloaded every 10 seconds:
https://hadoop.apache.org/docs/r2.5.1/hadoop-yarn/hadoop-yarn-site/FairScheduler.html
"The allocation file is reloaded every 10 seconds, allowing changes to be made on the fly."

2. The Resource Manager web UI (http://<RM host/IP>:8088/cluster/scheduler) does not show the changes made in fair-scheduler.xml.

Root Cause:

By default, yarn.scheduler.fair.allocation.file is set to fair-scheduler.xml.
When RM starts, it searches for the allocation file on the classpath (which typically includes the Hadoop conf directory).
If RM cannot find the file during startup, it prints the following warning in the RM log:
WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: fair-scheduler.xml not found on the classpath.
After that, RM does not start the reload thread in AllocationFileLoaderService, whose role is to monitor the allocation file for changes and reload it every 10 seconds.
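
The lookup described above can be reproduced outside of RM to check whether a given classpath actually contains the file. The following is a minimal standalone sketch, not the Hadoop source: the class name AllocFileLookup and the method resolve() are hypothetical, and it assumes the common pattern of using an absolute path as-is and otherwise asking the context class loader for the resource.

```java
import java.io.File;
import java.net.URL;

public class AllocFileLookup {

    /**
     * Hypothetical helper: returns the resolved file, or null when a
     * relative name cannot be found on the classpath.
     */
    public static File resolve(String allocFilePath) {
        File allocFile = new File(allocFilePath);
        if (allocFile.isAbsolute()) {
            // Assumption: an absolute path bypasses the classpath search.
            return allocFile;
        }
        URL url = Thread.currentThread().getContextClassLoader()
                .getResource(allocFilePath);
        if (url == null) {
            System.err.println(allocFilePath + " not found on the classpath.");
            return null;
        }
        // Only meaningful for file: URLs (for example a conf directory).
        return new File(url.getPath());
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "fair-scheduler.xml";
        File found = resolve(name);
        System.out.println(found == null ? "NOT FOUND" : "FOUND: " + found);
    }
}
```

Running this class with a classpath similar to the one RM uses (for example, the output of the hadoop classpath command, which may not match the RM classpath exactly) gives a quick indication of whether RM would find the file at startup.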

The code logic is in AllocationFileLoaderService.java:
The function getAllocationFile() searches for the allocation file. If it cannot find the file, it returns a null "allocFile":
      if (url == null) {
        LOG.warn(allocFilePath + " not found on the classpath.");
        allocFile = null;
The function serviceInit() calls getAllocationFile() and creates the reload thread only if the returned "allocFile" is not null:
public void serviceInit(Configuration conf) throws Exception {
    this.allocFile = getAllocationFile(conf);
    if (allocFile != null) {
      reloadThread = new Thread() {
      ...}

In short, if the allocation file is not on the classpath when RM starts, RM cannot reload it automatically. After the allocation file is created on the classpath for the first time, RM must be restarted once.
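
To make the consequence concrete, here is a small self-contained sketch of the same init-time decision. It is an illustration only, not the Hadoop implementation: the class ReloadDecisionDemo and its methods are hypothetical, and the reloader body is left empty.

```java
import java.io.File;

public class ReloadDecisionDemo {
    // Stays null when no allocation file was found at init time.
    private Thread reloadThread;

    /** Mirrors the init-time decision: a reloader exists only if a file was found. */
    public void init(File allocFile) {
        if (allocFile != null) {
            reloadThread = new Thread(() -> {
                // A real reloader would poll the file's modification time here.
            });
            reloadThread.setDaemon(true);
        }
    }

    public boolean hasReloader() {
        return reloadThread != null;
    }

    public static void main(String[] args) {
        ReloadDecisionDemo missingAtStart = new ReloadDecisionDemo();
        missingAtStart.init(null); // file absent at startup
        // Creating the file afterwards changes nothing: init() is not called again.
        System.out.println("missing at start -> reloader: " + missingAtStart.hasReloader());

        ReloadDecisionDemo presentAtStart = new ReloadDecisionDemo();
        presentAtStart.init(new File("fair-scheduler.xml"));
        System.out.println("present at start -> reloader: " + presentAtStart.hasReloader());
    }
}
```

Because the decision is made once during initialization, the only way to get a reloader for a file that appeared later is to run the initialization again, which for RM means a restart.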

Solution:

Make sure the allocation file (fair-scheduler.xml by default) exists on the classpath before RM is started.
If the allocation file is created after RM has started, restart RM so that it starts the thread that reloads the allocation file automatically.

To confirm that RM reloads the allocation file every 10 seconds, make a change to the file and watch the RM log. A message like the following should appear:
INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.AllocationFileLoaderService: Loading allocation file /opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/fair-scheduler.xml
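
The log check can also be scripted. The sketch below is a hypothetical helper (the class ReloadLogCheck and the method countReloads() are not part of Hadoop) that counts log lines containing both the loader class name and the "Loading allocation file" text quoted above. The RM log path is passed as an argument because its location depends on the installation.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

public class ReloadLogCheck {
    static final String LOADER = "AllocationFileLoaderService";
    static final String MARKER = "Loading allocation file";

    /** Counts lines that report the allocation file being loaded. */
    public static long countReloads(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(LOADER) && line.contains(MARKER))
                .count();
    }

    public static void main(String[] args) throws Exception {
        // With no argument, fall back to the two sample lines from this post.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList(
                    "WARN org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair."
                        + "AllocationFileLoaderService: fair-scheduler.xml not found on the classpath.",
                    "INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair."
                        + "AllocationFileLoaderService: Loading allocation file "
                        + "/opt/mapr/hadoop/hadoop-2.5.1/etc/hadoop/fair-scheduler.xml");
        System.out.println("reload messages: " + countReloads(lines));
    }
}
```

A count of zero after editing the file and waiting more than 10 seconds suggests that the reload thread is not running and that RM needs a restart.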

