Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2020/09/02 15:45:47 UTC

[GitHub] [hbase] shahrs87 commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

shahrs87 commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r482175115



##########
File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +968,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool
     }
   }
 
+  private void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();
+      if (!userRegionLock.tryLock(waitTime, TimeUnit.MILLISECONDS)) {

Review comment:
   We have an internal customer who wants a strict 15-second SLA for every operation. The operation timeout is an end-to-end timeout that covers all retries and the sleeps between them, so we suggested they set the operation timeout to 15 seconds.
   We also recommended setting the scanner timeout period to 7 seconds, with the retries config (hbase.client.retries.number) set to 2. There is a sleep interval between attempts, and we expect the call to complete within 15 seconds; if it doesn't, the operation timeout kicks in and fails the call.
   However, we found that the getRegionLocations call is not bounded by the operation timeout.
   If we set the lock timeout equal to the operation timeout, then in the worst case the call fails after 15 (lock timeout, 1st try) + 15 (lock timeout, 2nd try) + 1 (assuming a 1-second sleep) = 31 seconds.
   If we set the lock timeout equal to the scanner timeout, then in the worst case the call fails after 7 (scanner timeout, 1st try) + 7 (scanner timeout, 2nd try) + 1 (sleep between tries) = 15 seconds, which is much closer to the SLA we promised.
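   The worst-case arithmetic in the two scenarios can be checked with a small model. This is a sketch under the comment's own assumptions: two attempts in total (reading hbase.client.retries.number = 2 the way the comment does), each attempt blocking for the full lock timeout, and a fixed 1-second sleep between attempts. It ignores the time spent on the meta lookup itself once the lock is acquired. The class and method names are hypothetical, not HBase code.

```java
// Hypothetical model, not HBase code: worst-case time before a call fails
// when every attempt blocks for the full lock timeout.
public class LockTimeoutBound {

    /**
     * Assumes `attempts` total tries, each waiting the full lock timeout,
     * with a fixed sleep between consecutive tries (none after the last).
     */
    public static long worstCaseMillis(long lockTimeoutMs, int attempts, long sleepMs) {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        return attempts * lockTimeoutMs + (attempts - 1) * sleepMs;
    }

    public static void main(String[] args) {
        long slaMs = 15_000;
        // Lock timeout = operation timeout (15 s), 2 tries, 1 s sleep.
        long withOperationTimeout = worstCaseMillis(15_000, 2, 1_000);
        // Lock timeout = scanner timeout (7 s), 2 tries, 1 s sleep.
        long withScannerTimeout = worstCaseMillis(7_000, 2, 1_000);
        System.out.println("lock timeout = operation timeout: " + withOperationTimeout + " ms");
        System.out.println("lock timeout = scanner timeout:   " + withScannerTimeout + " ms");
        System.out.println("scanner-timeout bound within SLA: " + (withScannerTimeout <= slaMs));
    }
}
```

   Under these assumptions the model gives 31000 ms and 15000 ms for the two choices; any real RPC time after the lock is taken would add to both.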
   Hope this makes sense. If the community still prefers the operation timeout, I will change it to use the operation timeout.
   @infraio  @bharathv  @virajjasani  @saintstack  @apurtell 
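   For context on the pattern in the diff: ReentrantLock.tryLock(long, TimeUnit) waits at most the given time and returns false on timeout, so the caller can fail fast instead of blocking indefinitely. Below is a minimal standalone sketch, assuming a plain ReentrantLock and a caller-supplied timeout in milliseconds. The class name BoundedRegionLock, the exception messages, and the choice of IOException are hypothetical and not taken from the HBase patch.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration of bounded lock acquisition; not the HBase implementation.
public class BoundedRegionLock {
    private final ReentrantLock userRegionLock = new ReentrantLock();
    // Assumption: supplied by the caller, e.g. the scanner timeout or operation timeout in ms.
    private final long lockTimeoutMs;

    public BoundedRegionLock(long lockTimeoutMs) {
        this.lockTimeoutMs = lockTimeoutMs;
    }

    /** Acquires the lock, or throws IOException once the timeout elapses instead of waiting forever. */
    public void takeUserRegionLock() throws IOException {
        try {
            if (!userRegionLock.tryLock(lockTimeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("Failed to get user region lock in " + lockTimeoutMs + " ms");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and surface the interruption as an IOException subtype.
            Thread.currentThread().interrupt();
            InterruptedIOException iioe =
                new InterruptedIOException("Interrupted while waiting for user region lock");
            iioe.initCause(e);
            throw iioe;
        }
    }

    /** Must be called by the thread that acquired the lock, typically in a finally block. */
    public void releaseUserRegionLock() {
        userRegionLock.unlock();
    }
}
```

   Whether lockTimeoutMs comes from the scanner timeout or the operation timeout is exactly the choice under discussion; the sketch only takes it as a constructor argument.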



