Env:
Hadoop 2.5.1
Apache Hadoop ResourceManager HA enabled.
Symptom:
ResourceManager fails to transition to Active mode with "InvalidResourceRequestException".
The following stack trace appears first in the RM log:
Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested memory < 0, or requested memory > max configured, requestedMemory=9216, maxMemory=8192
    at org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerUtils.validateResourceRequest(SchedulerUtils.java:228)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateResourceRequest(RMAppManager.java:385)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:345)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:309)
    at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:425)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1104)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:508)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
    ... 13 more
The following stack trace then repeats in the RM log:
WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:122)
    at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
    at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
    at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:596)
    at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:301)
    at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:120)
    ... 4 more
Caused by: org.apache.hadoop.service.ServiceStateException: RMActiveServices cannot enter state STARTED from state STOPPED
    at org.apache.hadoop.service.ServiceStateModel.checkStateTransition(ServiceStateModel.java:129)
    at org.apache.hadoop.service.ServiceStateModel.enterState(ServiceStateModel.java:111)
    at org.apache.hadoop.service.AbstractService.start(AbstractService.java:190)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:911)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:951)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:948)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1566)
    at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:948)
    at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:292)
    ... 5 more
2015-09-03 13:59:23,581 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
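To confirm that an RM log shows this symptom, it can help to pull the requestedMemory and maxMemory values out of the exception message. The following is a minimal sketch, not part of the original procedure: the function name extract_mem_limits is hypothetical, and the log path in the usage comment is an assumption that depends on the installation.

```shell
#!/bin/sh
# Hypothetical helper: read RM log lines on standard input and print
# "requestedMemory maxMemory" for each InvalidResourceRequestException line.
extract_mem_limits() {
    sed -n 's/.*InvalidResourceRequestException.*requestedMemory=\([0-9][0-9]*\), maxMemory=\([0-9][0-9]*\).*/\1 \2/p'
}

# Usage (the log path is an assumption; adjust it to your installation):
#   extract_mem_limits < /path/to/yarn-resourcemanager.log | sort -u
```

Any pair in which the first number is larger than the second indicates a stored application that asked for more memory than the current limit allows.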
Root Cause:
This is caused by YARN-3493, which is fixed in Hadoop 2.6.1, 2.7.1 and 2.8.0.
The issue can occur when yarn.scheduler.maximum-allocation-mb is lowered and ResourceManager is then restarted.
During recovery, ResourceManager validates the applications left in the RMStateStore against the current limit. If any of them requested more memory than yarn.scheduler.maximum-allocation-mb, recovery fails, even if that application failed long ago. In the log above, a stored application requested 9216 MB while the limit is now 8192 MB.
Solution:
1. Identify the RMStateStore class.
MapR uses FileSystemRMStateStore by default, which means the RMStateStore is on MFS. Users may also choose ZKRMStateStore.
$ hadoop2 conf | grep yarn.resourcemanager.store.class
<property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore</value><source>yarn-default.xml</source></property>
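If only the value is needed, for example in a script, the property line can be reduced to its value. This is a rough sketch under the assumption that each property is printed on a single line in the XML form shown above; the function name conf_value is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> of property $1 from
# "hadoop2 conf"-style XML read on standard input.
conf_value() {
    # Escape dots in the property name so they match literally.
    name=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed -n "s|.*<name>${name}</name><value>\([^<]*\)</value>.*|\1|p"
}

# Usage:
#   hadoop2 conf | conf_value yarn.resourcemanager.store.class
```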
2. Find the location of RMStateStore.
If the RMStateStore is FileSystemRMStateStore, the parent location is defined by yarn.resourcemanager.fs.state-store.uri.
$ hadoop2 conf | grep yarn.resourcemanager.fs.state-store.uri
<property><name>yarn.resourcemanager.fs.state-store.uri</name><value>/var/mapr/cluster/yarn/rm/system</value><source>yarn-default.xml</source></property>
Then the location of all application directories is:
/var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot
If the RMStateStore is ZKRMStateStore, the parent znode is defined by yarn.resourcemanager.zk-state-store.parent-path.
$ hadoop2 conf | grep yarn.resourcemanager.zk-state-store.parent-path
<property><name>yarn.resourcemanager.zk-state-store.parent-path</name><value>/rmstore</value><source>yarn-default.xml</source></property>
Then the znode of all application directories is:
/rmstore/ZKRMStateRoot/RMAppRoot/
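The two cases follow the same pattern: the parent location, then FSRMStateRoot or ZKRMStateRoot, then RMAppRoot. A small sketch of that mapping is below. The function name rm_app_root is hypothetical, and the sketch only encodes the two layouts described in this step.

```shell
#!/bin/sh
# Hypothetical helper: print the application root for a state-store class ($1)
# and its parent location ($2), following the two layouts described above.
rm_app_root() {
    store_class=$1
    parent=${2%/}   # drop one trailing slash, if present
    case "$store_class" in
        *FileSystemRMStateStore) echo "$parent/FSRMStateRoot/RMAppRoot" ;;
        *ZKRMStateStore)         echo "$parent/ZKRMStateRoot/RMAppRoot" ;;
        *) echo "unsupported state store: $store_class" >&2; return 1 ;;
    esac
}

# Usage:
#   rm_app_root org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore /rmstore
```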
3. Move or remove all the application directories in RMStateStore.
The impact of this step is that the RM UI will be empty, but the application information can still be viewed from the HistoryServer UI. RM will also not recover any failed or running applications, so users need to resubmit them.
For example:
If FileSystemRMStateStore:
hadoop fs -mv /var/mapr/cluster/yarn/rm/system/FSRMStateRoot/RMAppRoot/* /backup_statestore/
If ZKRMStateStore, remove the application znodes one by one from the ZooKeeper CLI, as below:
rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_#############_####
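When there are many application znodes, typing each rmr command by hand is error-prone. As a rough sketch, the commands can be generated from a list of application IDs and reviewed before being pasted into the ZooKeeper CLI. The function name zk_rmr_commands and the file name app_ids.txt are hypothetical, and the default path assumes the /rmstore parent znode shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: read application IDs (one per line) on standard input
# and print one ZooKeeper CLI "rmr" command per ID. Lines that do not look
# like application IDs are skipped. Nothing is deleted by this function.
zk_rmr_commands() {
    app_root=${1:-/rmstore/ZKRMStateRoot/RMAppRoot}
    grep -E '^application_[0-9]+_[0-9]+$' | while IFS= read -r app; do
        printf 'rmr %s/%s\n' "$app_root" "$app"
    done
}

# Usage (app_ids.txt is a hypothetical file with one application ID per line):
#   zk_rmr_commands /rmstore/ZKRMStateRoot/RMAppRoot < app_ids.txt
```

Printing the commands instead of running them keeps the destructive step manual, so the list can be checked first.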