Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/12/02 20:48:00 UTC

[jira] [Commented] (TINKERPOP-2813) Improve driver usability for cases where NoHostAvailableException is currently thrown

    [ https://issues.apache.org/jira/browse/TINKERPOP-2813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642685#comment-17642685 ] 

ASF GitHub Bot commented on TINKERPOP-2813:
-------------------------------------------

spmallette opened a new pull request, #1882:
URL: https://github.com/apache/tinkerpop/pull/1882

   https://issues.apache.org/jira/browse/TINKERPOP-2813
   
   Summarized the major points of the change in the Upgrade Docs, so it might be best to start there when reviewing this. Made some other adjustments to log better data from the `ConnectionPool` for debugging purposes. Ended up making a number of adjustments to the thread pools to which various jobs were being submitted. Untangling that led to the addition of two pools, which helped bring back parallel host initialization and got rid of random usage of the fork/join pool. I sense that some of these pools could be eliminated, but I believe that would come at the cost of a much larger refactoring effort to untangle how some of these jobs get scheduled. Not super pleased with it, but it has to be pencils down at some point.
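
To make the thread pool point concrete, here is a minimal sketch of initializing several hosts in parallel on a dedicated executor rather than on the shared fork/join pool. It is not taken from the driver: `ParallelHostInit`, `initAll`, `initOne`, and `hostInitPool` are hypothetical names, and hosts are modeled as plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from the TinkerPop driver.
class ParallelHostInit {

    // Runs initOne for every host on the supplied executor and waits for all of them.
    // Passing an explicit executor keeps the work off ForkJoinPool.commonPool(),
    // which CompletableFuture.supplyAsync uses by default when no executor is given.
    static List<String> initAll(List<String> hosts,
                                Function<String, String> initOne,
                                ExecutorService hostInitPool) {
        List<CompletableFuture<String>> futures = new ArrayList<>();
        for (String host : hosts) {
            futures.add(CompletableFuture.supplyAsync(() -> initOne.apply(host), hostInitPool));
        }
        List<String> results = new ArrayList<>();
        for (CompletableFuture<String> future : futures) {
            // join() rethrows a failed initialization wrapped in CompletionException.
            results.add(future.join());
        }
        return results;
    }
}
```

A caller would create the pool once (for example with `Executors.newFixedThreadPool`), pass it in, and shut it down when the client closes. Results come back in the same order as the input hosts because the futures are joined in submission order.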
   
   There have been many hours of testing on these changes at this point from various folks, and all results have been favorable. Calling out @kenhuuu specifically: he did some basic performance testing and profiling and didn't find any problems to note:
   
   ```text
   Test Case:
       2450 short queries (return in milliseconds)
       1280 mix of short and medium queries
   
   Setup:
       m5.2xlarge
       3.5.4 Gremlin Server with gremlin-server.yaml
       Defaults for connection pooling and in flight per connection settings for Java driver
   
   Results (three-run average):
       3.5.4:
           Short: 29912ms
           Mixed: 65689ms
   
       TINKERPOP-2813 branch:
           Short: 30161ms
           Mixed: 65422ms
   ```
   
   




> Improve driver usability for cases where NoHostAvailableException is currently thrown
> -------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2813
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2813
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: driver
>    Affects Versions: 3.5.4
>            Reporter: Stephen Mallette
>            Assignee: Stephen Mallette
>            Priority: Blocker
>
> A {{NoHostAvailableException}} occurs in two cases:
> 1. when the {{Client}} is initialized and a failure occurs on all {{Host}} instances configured
> 2. when the {{Client}} attempts to {{chooseConnection()}} to send a request and all {{Host}} instances configured are marked unavailable.
> In the first case, you can get a cause for the failure, which is helpful, but the inadequacy is that you only get the failure of the first {{Host}} to cause a problem. The second case is a bit worse because there you get no cause in the exception and it's a "fast fail": as soon as the request is sent it fails, with no pause to see if the {{Host}} comes back online. Moreover, a {{Host}} can be marked as failed for the infraction of just a single {{Connection}} that may have just encountered an intermittent network issue, thus quite quickly killing the entire {{ConnectionPool}} and turning 100s of requests per second into 100s of {{NoHostAvailableException}} per second. Note that you can also get an infraction for the pool just being overloaded with requests, which may signal that either the pool or the server is not sized right for the current workload. In either case, the {{NoHostAvailableException}} is a bit of a harsh way to deal with that and in any event doesn't quite give the user clues as to how to deal with it.
> All in all, this situation makes {{NoHostAvailableException}} hard to debug. This ticket is meant to help smooth some of these problems. Initial thoughts for improvements include better logging, ensuring that {{NoHostAvailableException}} is not thrown without a cause, preferring more specific exceptions in the first place to {{NoHostAvailableException}}, getting rid of "fast fails" in favor of longer pauses to see if a host can recover, and taking a softer stance on when a {{Host}} is actually considered "unavailable".
> Expecting to implement this without breaking API changes, though exceptions may shift around a bit, but will try to keep those to a minimum.
>  
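
Two of the ideas in the description above, pausing to see if a host recovers instead of failing fast and never throwing without a cause, can be sketched roughly as follows. This is an illustration under stated assumptions, not the driver's code: `HostSelectionSketch`, `NoHostException`, `chooseHost`, and all of its parameters are hypothetical, and hosts are modeled as plain strings.

```java
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical sketch: these names are illustrative and are not TinkerPop driver APIs.
class HostSelectionSketch {

    // Stand-in for a "no host available" error that always carries a cause.
    static class NoHostException extends RuntimeException {
        NoHostException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Polls for an available host until maxWaitMillis has elapsed rather than
    // failing on the first check. lastFailure is attached as the cause on timeout.
    static String chooseHost(List<String> hosts, Predicate<String> isAvailable,
                             long maxWaitMillis, long pollMillis, Throwable lastFailure) {
        final long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
        while (true) {
            for (String host : hosts) {
                if (isAvailable.test(host)) {
                    return host;
                }
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new NoHostException(
                        "No host became available within " + maxWaitMillis + "ms", lastFailure);
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and surface the interruption as the cause.
                Thread.currentThread().interrupt();
                throw new NoHostException("Interrupted while waiting for a host", e);
            }
        }
    }
}
```

The trade-off is latency: a caller now waits up to `maxWaitMillis` before seeing an error, so the bound should stay well below any request timeout the application already enforces.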



--
This message was sent by Atlassian Jira
(v8.20.10#820010)