Posted to hdfs-dev@hadoop.apache.org by "Manoj Govindassamy (JIRA)" <ji...@apache.org> on 2016/08/19 18:32:20 UTC
[jira] [Created] (HDFS-10780) Block replication not happening on removing a volume when data being written to a datanode -- TestDataNodeHotSwapVolumes fails
Manoj Govindassamy created HDFS-10780:
-----------------------------------------
Summary: Block replication not happening on removing a volume when data being written to a datanode -- TestDataNodeHotSwapVolumes fails
Key: HDFS-10780
URL: https://issues.apache.org/jira/browse/HDFS-10780
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.0.0-alpha1
Reporter: Manoj Govindassamy
Assignee: Manoj Govindassamy
TestDataNodeHotSwapVolumes occasionally fails in the unit test testRemoveVolumeBeingWrittenForDatanode. The data write pipeline can run into problems such as timeouts or an unreachable datanode; in this test the failure is an induced one, since one of the volumes in a datanode is removed while a block write is in progress. Digging further into the logs, when the problem occurs in the write pipeline, error recovery does not happen as expected, so block replication never catches up.
{noformat}
Running org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 44.495 sec <<< FAILURE! - in org.apache.hadoop.hdfs.serv
testRemoveVolumeBeingWritten(org.apache.hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes)  Time elapsed: 44.354 se
java.util.concurrent.TimeoutException: Timed out waiting for /test to reach 3 replicas

Results :

Tests in error:
  TestDataNodeHotSwapVolumes.testRemoveVolumeBeingWritten:637->testRemoveVolumeBeingWrittenForDatanode:714 » Timeout

Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{noformat}
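The TimeoutException above comes from the usual polling pattern in such tests: repeatedly check a condition (here, replica count reaching 3) until it holds or a deadline passes. A minimal sketch of that pattern, not the actual Hadoop test utility (class and method names here are illustrative):

```java
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class WaitFor {
    // Polls cond every intervalMs until it returns true, or throws
    // TimeoutException once timeoutMs has elapsed.
    static void waitFor(BooleanSupplier cond, long intervalMs, long timeoutMs)
            throws TimeoutException, InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!cond.getAsBoolean()) {
            if (System.currentTimeMillis() > deadline) {
                throw new TimeoutException("Timed out waiting for condition");
            }
            Thread.sleep(intervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        waitFor(() -> true, 10, 100); // condition already true: returns at once
        System.out.println("done");
    }
}
```

When the write pipeline's error recovery stalls, the replica-count condition never becomes true and the poll loop exhausts its timeout, producing exactly the failure seen in the test run.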
Following exceptions are not expected in this test run
{noformat}
2016-08-10 12:30:11,269 [DataXceiver for client DFSClient_NONMAPREDUCE_-640082112_10 at /127.0.0.1:58805 [Receiving block BP-1852988604-172.16.3.66-1470857409044:blk_1073741825_1001]] DEBUG datanode.DataNode (DataXceiver.java:run(320)) - 127.0.0.1:58789:Number of active connections is: 2
java.lang.IllegalMonitorStateException
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList.waitVolumeRemoved(FsVolumeList.java:280)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.removeVolumes(FsDatasetImpl.java:517)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:832)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.removeVolumes(DataNode.java:798)
{noformat}
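The IllegalMonitorStateException above indicates that Object.wait() was reached without the calling thread holding that object's monitor, which the JVM requires. A minimal sketch of the rule (not Hadoop code; class name is illustrative):

```java
public class WaitDemo {
    // Returns true if calling wait() without owning the monitor throws
    // IllegalMonitorStateException, as the JVM mandates.
    static boolean waitWithoutMonitorThrows(Object lock) {
        try {
            lock.wait(10); // no synchronized block: monitor not held
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Correct pattern: acquire the monitor via synchronized before waiting.
    static boolean waitWithMonitorSucceeds(Object lock) {
        synchronized (lock) {
            try {
                lock.wait(10); // returns normally after the 10 ms timeout
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

So the trace suggests FsVolumeList.waitVolumeRemoved reached its wait() call on some path where the corresponding lock was not held.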
{noformat}
2016-08-10 12:30:11,287 [DataNode: [[[DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data1/, [DISK]file:/Users/manoj/work/ups-hadoop/hadoop-hdfs-project/hadoop-hdfs/target/test/data/dfs/data/data2/]] heartbeating to localhost/127.0.0.1:58788] ERROR datanode.DataNode (BPServiceActor.java:run(768)) - Exception in BPOfferService for Block pool BP-1852988604-172.16.3.66-1470857409044 (Datanode Uuid 711d58ad-919d-4350-af1e-99fa0b061244) service to localhost/127.0.0.1:58788
java.lang.NullPointerException
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.getBlockReports(FsDatasetImpl.java:1841)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.blockReport(BPServiceActor.java:336)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:624)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:766)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
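The NullPointerException in getBlockReports is consistent with a check-then-act race: the volume-removal thread mutates shared state between another thread's null check and its use of that state. A minimal sketch of that failure mode and one defensive fix, snapshotting the reference into a local before use (purely illustrative; the field and class names here are hypothetical, not the actual FsDatasetImpl internals):

```java
import java.util.HashMap;
import java.util.Map;

class Reporter {
    // Shared state another thread may null out during volume removal.
    volatile Map<String, Object> volumeMap = new HashMap<>();

    // Racy: volumeMap can become null between the check and the use.
    int racySize() {
        if (volumeMap != null) {
            return volumeMap.size(); // may NPE if nulled concurrently
        }
        return 0;
    }

    // Safer: read the field once into a local, then act on the snapshot.
    int safeSize() {
        Map<String, Object> local = volumeMap;
        return (local != null) ? local.size() : 0;
    }
}
```

A local snapshot removes the window in which a concurrent writer can invalidate the reference mid-method, though full consistency between the map's contents and the removed volume still requires locking.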
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)