This article explains the action items to take when a data node fails or is marked dead because of a single disk failure.
Env:
Hadoop 2.0

Symptoms:
1. One data node is not shown in the output of "hdfs dfsadmin -report". For example, if you have 10 data nodes configured, the output shows only 9 data nodes available and 0 dead nodes (a small sketch for spotting which node is missing appears at the end of this section).
Eg:
Datanodes available: 9 (9 total, 0 dead)
2. However, the datanode service on the problematic node is still running.
[root@hdw1]# /etc/init.d/hadoop-hdfs-datanode status
datanode (pid 7938) is running...
3. In the datanode log, the errors below show up right after restarting the datanode service:
The following FATAL error shows:

FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-xxx-xxx.xxx.x.x-xxxxxxx (storage id DS-xxx-192.168.xxx.x-xxxxx-xxxxxxx) service to namenode.OPENKB.INFO/192.168.xxx.2:8020
org.apache.hadoop.util.DiskChecker$DiskErrorException: Too many failed volumes - current valid volumes: 4, volumes configured: 5, volumes failed: 1, volume failures tolerated: 0
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.<init>(FsDatasetImpl.java:186)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:34)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetFactory.newInstance(FsDatasetFactory.java:30)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:857)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:819)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
        at java.lang.Thread.run(Thread.java:744)
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in BPOfferService for Block pool BP-xxx-xxx.xxx.x.x-xxxxxxx (storage id DS-xxx-192.168.xxx.x-xxxxx-xxxxxxx) service to namenode.OPENKB.INFO/192.168.xxx.2:8020
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:439)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:525)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:676)
        at java.lang.Thread.run(Thread.java:744)
4. "hdfs dfsadmin -report" finally found the missing data node, but it will be marked as "dead".
Datanodes available: 9 (10 total, 1 dead)
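If it is not obvious which data node dropped out of the report, one quick way is to count and list the nodes the namenode is currently reporting and compare them against your expected node list. This is only a sketch; it assumes the default plain-text output format of "hdfs dfsadmin -report":

# Count the data nodes the namenode currently reports (expected: 10 in this example)
hdfs dfsadmin -report | grep -c "^Name:"

# List the reported nodes so the missing one can be spotted by eye
hdfs dfsadmin -report | grep "^Name:"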
Root Cause:
Each data node is configured with multiple data volumes, and each volume sits on its own physical disk. For example, the data node below has 3 data volumes configured in hdfs-site.xml.
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/dfs/data,/data2/dfs/data,/data3/dfs/data</value>
</property>
By default, the parameter "dfs.datanode.failed.volumes.tolerated" is set to 0. Its description reads: "The number of volumes that are allowed to fail before a datanode stops offering service. By default any volume failure will cause a datanode to shutdown."
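To confirm which value is actually in effect on the problematic node, you can query the configuration there. A minimal sketch, assuming the HDFS client on that node reads the same hdfs-site.xml the datanode uses:

hdfs getconf -confKey dfs.datanode.failed.volumes.tolerated
# prints 0 when the default is in effect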
Fix:
After replacing the disk behind the problematic data volume:
1. Have the system administrator create a filesystem on the new disk and mount it as the data volume (a sketch follows below).
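What exactly the system administrator runs depends on the hardware and filesystem; the commands below are only a sketch, assuming the replacement disk shows up as /dev/sdd, is formatted as ext4, and is mounted at /data3:

# Create a filesystem on the replacement disk (device name is an assumption)
mkfs -t ext4 /dev/sdd

# Mount it at the mount point used by the failed volume
mkdir -p /data3
mount /dev/sdd /data3

# Also add a matching entry to /etc/fstab so the mount survives reboots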
2. Create the data directory specified by dfs.datanode.data.dir.
Eg:
mkdir -p /data3/dfs/data
3. Change the owner and group of the new data directory:
chown hdfs:hadoop /data3/dfs/data
4. Start the datanode service.
/etc/init.d/hadoop-hdfs-datanode start
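Optionally, watch the datanode log while the service starts to confirm the block pool now initializes on all configured volumes. The log path below is an assumption and varies by distribution:

tail -f /var/log/hadoop-hdfs/hadoop-hdfs-datanode-$(hostname).log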
5. Confirm the data node has become a live node using the command below:
hdfs dfsadmin -report
By the way, if you just want to bring the data node up with the remaining valid data volumes and skip the broken one, change dfs.datanode.failed.volumes.tolerated in hdfs-site.xml to the number of failed volumes and then restart the datanode service (a restart sketch follows the example below).
Eg:
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
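To apply the change, restart the datanode service on the affected node. A sketch using the same init script shown above (the restart action is assumed to be supported; a stop followed by a start works as well):

/etc/init.d/hadoop-hdfs-datanode restart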
Comments:
Q: Do we have to shut down the machine to replace the bad disk, or can we add it while the node is on?
A: No need, but shutting down is the safer option.
Q: I am facing a similar kind of issue, but I need to know whether there is any way to reuse the same disk by deleting the block pool folder. I was able to traverse that disk folder using Unix commands.