Posted to issues@hbase.apache.org by "Javier Akira Luca de Tena (Jira)" <ji...@apache.org> on 2020/05/29 08:36:00 UTC

[jira] [Created] (HBASE-24469) Hedged read might hang infinitely if read data from all DN failed

Javier Akira Luca de Tena created HBASE-24469:
-------------------------------------------------

             Summary: Hedged read might hang infinitely if read data from all DN failed
                 Key: HBASE-24469
                 URL: https://issues.apache.org/jira/browse/HBASE-24469
             Project: HBase
          Issue Type: Bug
            Reporter: Javier Akira Luca de Tena


I found that after an ungraceful DataNode shutdown, the number of active HBase handlers started to grow, leaving the RegionServer stuck and unusable.

A thread dump showed multiple read handlers in what looked like a deadlock, and write handlers were stuck as well.

This also prevented the memstore from being flushed, because the flush was waiting on this lock: [https://github.com/apache/hbase/blob/136414dd72a80f379b80cd6f74b5b6ebd78f33ec/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HStore.java#L1225]

Since the memstore could not be flushed, I could not gracefully stop the RegionServer: the region being flushed could not be moved off it.

The root cause turned out to be in Hadoop's DFSInputStream. When none of the hedged reads succeed, the internal hedgedService.take() call hangs forever, because it waits on a BlockingQueue that nothing will ever fill: [https://github.com/apache/hadoop/blob/rel/release-2.8.5/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1435]
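The sketch below is a minimal standalone illustration. It is not the DFSInputStream code. It uses only java.util.concurrent to reproduce the blocking behavior described above. The class and method names (HedgedTakeDemo, secondWaitTimesOut) are hypothetical, and the failing task only stands in for a read from a dead DataNode. Once the single completed (failed) read has been consumed from an ExecutorCompletionService, its internal queue is empty. From that point an unbounded take() would block forever, while poll() with a timeout returns null.

```java
import java.io.IOException;
import java.util.concurrent.*;

public class HedgedTakeDemo {
    // Hypothetical helper: returns true if a second, bounded wait on the
    // completion queue times out after the only task's result was consumed.
    static boolean secondWaitTimesOut() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletionService<String> service = new ExecutorCompletionService<>(pool);

            // Stand-in for a read from a DataNode that was shut down: it always fails.
            service.submit(() -> { throw new IOException("DN unreachable"); });

            // A failed task is still queued as "completed"; consume it here.
            Future<String> first = service.take();
            try {
                first.get();
            } catch (ExecutionException e) {
                // The read failed, and no other task is outstanding now.
            }

            // An unbounded service.take() at this point would block forever,
            // because the underlying BlockingQueue is empty and no task will
            // ever add to it. A bounded poll returns null instead of hanging.
            Future<String> next = service.poll(100, TimeUnit.MILLISECONDS);
            return next == null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("second wait timed out: " + secondWaitTimesOut());
    }
}
```

Running main should print "second wait timed out: true" after roughly 100 ms. A timeout only bounds the wait. The actual fix is the one made on the Hadoop side.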

The Hadoop-side issue is https://issues.apache.org/jira/browse/HDFS-11303, which is fixed in 2.9.0.

This is not directly related to HBase code, but I wanted the community to be aware that this issue can occur with the Hadoop version HBase currently uses (2.8.5).

I would like to suggest upgrading the Hadoop dependency to 2.9.0.
