Posted to dev@hbase.apache.org by "Haoze Wu (Jira)" <ji...@apache.org> on 2021/09/06 03:35:00 UTC

[jira] [Created] (HBASE-26256) The potential delay of HDFS RPC in HRegion may cause data inconsistency and some HBase shell commands hanging

Haoze Wu created HBASE-26256:
--------------------------------

             Summary: The potential delay of HDFS RPC in HRegion may cause data inconsistency and some HBase shell commands hanging
                 Key: HBASE-26256
                 URL: https://issues.apache.org/jira/browse/HBASE-26256
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 2.4.2
            Reporter: Haoze Wu


When a RegionServer is initializing a new region, it writes internal metadata (e.g., the WAL) to the HDFS cluster. We find that this write can block for a long time due to network issues or overload on the HDFS side; the delay exposes inconsistent state to HBase clients and causes multiple HBase APIs to hang.

*Reproduction*

Steps to reproduce the symptom from scratch:
 # Start an HDFS cluster (1 NameNode + 2 DataNodes) with the default configuration.
 # Start a ZooKeeper cluster (3 nodes) with the default configuration.
 # Start an HBase cluster (1 Master + 2 RegionServers) with the default configuration.
 # In one of the RegionServers, introduce a delay by invoking `Thread.sleep` when it is creating its third region (alternatively, use a network packet-loss injection tool such as `tc`); a sketch of this injection follows the list.
 # Right after the HBase cluster starts, the fault has not been triggered yet. Open the default HBase shell by running `bin/hbase shell` in a terminal, then repeatedly run the `create` command to create new tables until the fault is triggered.
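The injected delay is equivalent to the following sketch (ours, not HBase code); the `OPEN_COUNT` counter, the helper name, and the 30-second sleep are our instrumentation, and the delay fires only on the third region initialization:
{code:java}
// Instrumentation sketch (ours, not HBase code): delay the third region
// initialization to simulate a slow HDFS RPC inside initializeRegionInternals.
private static final java.util.concurrent.atomic.AtomicInteger OPEN_COUNT =
    new java.util.concurrent.atomic.AtomicInteger();

private static void maybeInjectDelay() {
  if (OPEN_COUNT.incrementAndGet() == 3) { // at and only at the third invocation
    try {
      Thread.sleep(30_000L); // simulated HDFS RPC delay
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}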

 

When the fault occurs, we observe the following symptoms:
 # The HBase shell running the `create` command hangs, without any log or warning.
 # If we start another HBase shell and run the `list` command, the new table appears in the result even though it has not actually been created yet. Ideally, the client should not see this pending table before `create` succeeds.
 # If we start another HBase shell and run the `disable` command on this table, that shell also hangs, without any log or warning. Ideally, we should see an error or a warning within a short time, because the table has not been created yet.

 

The stack trace of the blocked region-opening thread:
{code:java}
"RS_OPEN_REGION-regionserver/razor15:16022-0" #144 daemon prio=5 os_prio=0 tid=0x00007f4c34ed8000 nid=0x4463 waiting on condition [0x00007f4bfd496000]   java.lang.Thread.State: TIMED_WAITING (sleeping)    at java.lang.Thread.sleep(Native Method)    at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:1075)    at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:955)    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8081)    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegionFromTableDir(HRegion.java:8040)    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:8016)    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7974)    at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7925)    at org.apache.hadoop.hbase.regionserver.handler.AssignRegionHandler.process(AssignRegionHandler.java:145)    at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)    at java.lang.Thread.run(Thread.java:748)
{code}
 

Relevant code snippet:
{code:java}
// file path: hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
// class: org.apache.hadoop.hbase.regionserver.HRegion

public class HRegion implements HeapSize, PropagatingConfigurationObserver, Region {
  // ...
  private long initializeRegionInternals(final CancelableProgressable reporter,
      final MonitoredTask status) throws IOException {
    // ...
    if (!isRestoredRegion) {
      // ...
      if (RegionReplicaUtil.isDefaultReplica(getRegionInfo())) {
        // ...
        // The injected Thread.sleep fires here, at and only at the third
        // invocation, to simulate a delayed HDFS RPC.
        WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
          nextSeqId - 1);
        // ...
      }
    }
    // ...
  }
  // ...
}
{code}
*Fix*

We’re not quite sure about the root cause of the inconsistencies or of the blocking of the other APIs. One potentially simple fix is to protect the `WALSplitUtil.writeRegionSequenceIdFile` operation (or the HDFS RPCs inside it) with a timeout. We checked that throwing a timeout exception when the operation takes too long resolves the aforementioned symptoms. A minimal sketch follows.
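The sketch below is ours, not existing HBase code: it pushes the blocking write onto a worker thread and gives up after a fixed timeout. The class name `TimedHdfsCall`, the helper `callWithTimeout`, and the 30-second constant are all hypothetical:
{code:java}
// Hypothetical guard (not existing HBase code): run a blocking HDFS call on a
// worker thread and fail fast with an IOException if it exceeds the timeout.
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public final class TimedHdfsCall {
  // Hypothetical timeout; a real fix would make this configurable.
  private static final long TIMEOUT_MS = 30_000L;
  private static final ExecutorService POOL = Executors.newCachedThreadPool();

  private TimedHdfsCall() {
  }

  public static <V> V callWithTimeout(Callable<V> hdfsCall) throws IOException {
    Future<V> future = POOL.submit(hdfsCall);
    try {
      return future.get(TIMEOUT_MS, TimeUnit.MILLISECONDS);
    } catch (TimeoutException te) {
      future.cancel(true); // interrupt the stuck HDFS call
      throw new IOException("HDFS call timed out after " + TIMEOUT_MS + " ms", te);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      throw new InterruptedIOException("interrupted while waiting for HDFS call");
    } catch (ExecutionException ee) {
      throw new IOException("HDFS call failed", ee.getCause());
    }
  }
}
{code}
The call site in `initializeRegionInternals` would then look like this (again a sketch; the surrounding variables come from the snippet above):
{code:java}
// Sketch of the guarded call site inside initializeRegionInternals.
TimedHdfsCall.callWithTimeout(() -> {
  WALSplitUtil.writeRegionSequenceIdFile(getWalFileSystem(), getWALRegionDir(),
    nextSeqId - 1);
  return null;
});
{code}
With such a guard, a stalled HDFS RPC turns into an IOException that propagates out of region opening instead of a silent hang, which is presumably what resolves the symptoms above.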

 


