You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by GitBox <gi...@apache.org> on 2022/04/15 01:49:12 UTC

[GitHub] [hbase] functioner opened a new pull request, #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

functioner opened a new pull request, #4349:
URL: https://github.com/apache/hbase/pull/4349

   Propose a patch for [HBASE-26955](https://issues.apache.org/jira/browse/HBASE-26955)
   Probably this patch should be further improved, because it ignores the backoff factor when not encountering exceptions of dead server and always use the short delay for retries. I guess probably we need a separate retry loop to handle this issue, and enforce the retry backoff mechanism there. I am looking for suggestions and comments. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4349:
URL: https://github.com/apache/hbase/pull/4349#issuecomment-1099790622

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   3m 53s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  3s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 58s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 16s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   4m  6s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 18s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 39s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 59s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 14s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 19s |  hbase-client in the patch passed.  |
   |  |   |  20m 51s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/artifact/yetus-jdk11-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4349 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux 77378cdca4c4 5.4.0-1025-aws #25~18.04.1-Ubuntu SMP Fri Sep 11 12:03:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-11.0.10+9 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/testReport/ |
   | Max. process+thread count | 189 (vs. ulimit of 30000) |
   | modules | C: hbase-client U: hbase-client |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4349:
URL: https://github.com/apache/hbase/pull/4349#issuecomment-1099792290

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 40s |  Docker mode activated.  |
   ||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  No case conflicting files found.  |
   | +1 :green_heart: |  hbaseanti  |   0m  0s |  Patch does not have any anti-patterns.  |
   | +1 :green_heart: |  @author  |   0m  0s |  The patch does not contain any @author tags.  |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 26s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 41s |  master passed  |
   | +1 :green_heart: |  checkstyle  |   0m 16s |  master passed  |
   | +1 :green_heart: |  spotbugs  |   0m 39s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m  4s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 36s |  the patch passed  |
   | +1 :green_heart: |  checkstyle  |   0m 14s |  the patch passed  |
   | +1 :green_heart: |  whitespace  |   0m  0s |  The patch has no whitespace issues.  |
   | +1 :green_heart: |  hadoopcheck  |  12m  3s |  Patch does not cause any errors with Hadoop 3.1.2 3.2.2 3.3.1.  |
   | +1 :green_heart: |  spotbugs  |   0m 41s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   0m  9s |  The patch does not generate ASF License warnings.  |
   |  |   |  25m 25s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/artifact/yetus-general-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4349 |
   | Optional Tests | dupname asflicense javac spotbugs hadoopcheck hbaseanti checkstyle compile |
   | uname | Linux 0ed52d39af65 5.4.0-1071-aws #76~18.04.1-Ubuntu SMP Mon Mar 28 17:49:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   | Max. process+thread count | 60 (vs. ulimit of 30000) |
   | modules | C: hbase-client U: hbase-client |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/console |
   | versions | git=2.17.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache-HBase commented on pull request #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

Posted by GitBox <gi...@apache.org>.
Apache-HBase commented on PR #4349:
URL: https://github.com/apache/hbase/pull/4349#issuecomment-1099788419

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime | Comment |
   |:----:|----------:|--------:|:--------|
   | +0 :ok: |  reexec  |   0m 25s |  Docker mode activated.  |
   | -0 :warning: |  yetus  |   0m  2s |  Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --whitespace-eol-ignore-list --whitespace-tabs-ignore-list --quick-hadoopcheck  |
   ||| _ Prechecks _ |
   ||| _ master Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 25s |  master passed  |
   | +1 :green_heart: |  compile  |   0m 18s |  master passed  |
   | +1 :green_heart: |  shadedjars  |   3m 38s |  branch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  master passed  |
   ||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   2m 14s |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  javac  |   0m 17s |  the patch passed  |
   | +1 :green_heart: |  shadedjars  |   3m 40s |  patch has no errors when building our shaded downstream artifacts.  |
   | +1 :green_heart: |  javadoc  |   0m 16s |  the patch passed  |
   ||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m  9s |  hbase-client in the patch passed.  |
   |  |   |  15m 31s |   |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/artifact/yetus-jdk8-hadoop3-check/output/Dockerfile |
   | GITHUB PR | https://github.com/apache/hbase/pull/4349 |
   | Optional Tests | javac javadoc unit shadedjars compile |
   | uname | Linux f9795482b488 5.4.0-96-generic #109-Ubuntu SMP Wed Jan 12 16:49:16 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/hbase-personality.sh |
   | git revision | master / f990f56e8e |
   | Default Java | AdoptOpenJDK-1.8.0_282-b08 |
   |  Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/testReport/ |
   | Max. process+thread count | 161 (vs. ulimit of 30000) |
   | modules | C: hbase-client U: hbase-client |
   | Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-4349/1/console |
   | versions | git=2.17.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] functioner commented on pull request #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

Posted by GitBox <gi...@apache.org>.
functioner commented on PR #4349:
URL: https://github.com/apache/hbase/pull/4349#issuecomment-1100748255

   @Apache9 
   I think you have a fair point.
   I think the key question here is how quickly can we retry without taking high risk of greatly increasing the load of the server. To answer this question, we need to re-examine the goals of using exponential backoff and whether our code implementation complies with this design goal.
   So far I haven't found a very detailed discussion on the rationales for adding exponential backoff (maybe [HBASE-511](https://issues.apache.org/jira/browse/HBASE-511)?) to the current HBase client. But from the code comments and Jira issues and [wikipedia](https://en.wikipedia.org/wiki/Exponential_backoff), I guess the goals of HBase's exponential backoff may include:
   1. **When the server is not available**, the client should avoid to frequently resend the requests without getting substantial responses. The exponential backoff is a trade-off between waiting for the server startup and resending the request.
   2. **When the clients are sending too many requests to the server**, a failed request is signal to let the clients slow down, so that the clients' request rates reach an acceptable equilibrium.
   3. **When the server tries to handle the request and encounters some error**, the server may want the client to wait a little more time so that the server can have enough time to tolerate the error. The exponential backoff is a trade-off between waiting for the fault tolerance and resending the request.
   
   In general, I feel that if we can find the reason for the request error (e.g., server unavailable), we can **do the fine-grained exponential backoff separately, based on the error**. For example, I find that in the current master branch of Hadoop and Kafka, the client tries to connect to the server with exponential backoff pause time, and after building the connection, the client uses another exponential backoff retry loop to handle the potential exception. They do not share the same exponential backoff retry loop because they are handling different problems. **Blindly handling those errors in one single exponential backoff retry loop may result in unnecessary long pause time, without solving any problem.**
   
   Specifically, in our case, the HMaster checks the initialization before handling the clients' requests. My client gets these exceptions, waits for this initialization, using the exponential backoff.
   https://github.com/apache/hbase/blob/2622fa05a6aa9e0b57d9d15cf243af11927e9a2d/hbase-server/src/main/java/org/apache/hadoop/hbase/master/HMaster.java#L3091-L3100
   And after the successful initialization, the server can handle my clients' request but somehow gets some error in:
   https://github.com/apache/hbase/blob/2622fa05a6aa9e0b57d9d15cf243af11927e9a2d/hbase-server/src/main/java/org/apache/hadoop/hbase/procedure2/store/region/RegionProcedureStore.java#L360-L372
   The `UncheckedIOException` is passed to my client, and then unfortunately, the accumulated retry number (due to multiple previous `PleaseHoldException`) results in a very long pause time.
   
   As you suggested, I can "set retry number to 1". And I think it should be automatically done by the client. The `PleaseHoldException` and the `UncheckedIOException` from `update` operation represents two different reasons for exponential backoff. `PleaseHoldException` asks the client to wait for the server initialization in a exponential backoff retry loop. And after that, to handle the potential errors of processing client requests, we should start another exponential backoff retry loop, i.e., "set retry number to 1".
   
   WDYT?
   If you are convinced, do you have suggestions about enforcing this kind of logic in the code? I feel we may need to add some flags to indicate whether we are waiting for server initialization or handling errors in request processing. And then reset the retry number to 1 when the flag changes.
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hbase] Apache9 commented on pull request #4349: HBASE-26955 Improvement of the pause time between retries in Rpc caller

Posted by GitBox <gi...@apache.org>.
Apache9 commented on PR #4349:
URL: https://github.com/apache/hbase/pull/4349#issuecomment-1099894306

   In general I do not think we should use the same logic in RemoteProcedureResultReporter. At region server side we know the procedure result is very important so we sould try to send the result back to master soon.
   
   But at client side, changing the backoff logic has big side effect, it will greatly increase the load of the cluster, especially what you change here is for almost all rpc request, not only the admin rpc request.
   
   So in general I suggest we keep the code unchanged. For your scenario, if you really want to retry quickly, you can set retry number to 1, then the hbase client will fail soon and you can retry immediately by yourself.
   
   WDYT?
   
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@hbase.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org