Posted to dev@tinkerpop.apache.org by "Stephen Mallette (Jira)" <ji...@apache.org> on 2020/03/23 11:41:00 UTC

[jira] [Commented] (TINKERPOP-2352) Gremlin Python driver default pool size makes Gremlin keep-alive difficult

    [ https://issues.apache.org/jira/browse/TINKERPOP-2352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064737#comment-17064737 ] 

Stephen Mallette commented on TINKERPOP-2352:
---------------------------------------------

Thanks for your thoughts on this one. I don't think there's a problem with changing the default pool size to what you suggest, as long as we have tests that continue to validate the behavior of the larger pool size somehow. A pull request would be great, especially one that included some more documentation of the type you describe. That said, an even nicer pull request would solve the keep-alive problem more generally, as described on TINKERPOP-1886. Fixing that would be a much more robust solution, so if you have the opportunity to help there it would be appreciated. If you could solve that, then perhaps we should just close this ticket in favor of that one and continue our discussion there.
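As a rough illustration of what a more general, driver-level keep-alive could look like (this is a hypothetical sketch, not the gremlin-python driver's actual code or the TINKERPOP-1886 design): a background task pings every pooled connection on a timer, rather than relying on a single client-side test traversal that only touches one connection.

```python
import threading
import time

class Connection:
    """Stand-in for a pooled websocket connection (hypothetical)."""
    def __init__(self):
        self.pings = 0

    def ping(self):
        # In a real driver this would send a lightweight keep-alive
        # message (or a trivial traversal) over this specific connection.
        self.pings += 1

def keep_alive(connections, interval, stop):
    """Ping every connection in the pool on a timer, so no single
    connection can sit idle long enough for the server to close it."""
    while not stop.wait(interval):
        for conn in connections:
            conn.ping()

pool = [Connection() for _ in range(4)]
stop = threading.Event()
t = threading.Thread(target=keep_alive, args=(pool, 0.05, stop), daemon=True)
t.start()
time.sleep(0.2)
stop.set()
t.join()

print(all(c.pings > 0 for c in pool))  # True: every connection was refreshed
```

The point of doing this inside the driver is that only the driver knows how many connections exist and which ones are idle; application-level "test traversals" cannot target individual pool members.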

> Gremlin Python driver default pool size makes Gremlin keep-alive difficult
> --------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2352
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2352
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 3.3.5, 3.4.5
>         Environment: AWS Lambda, Python 3.7 runtime, AWS Neptune.
> (AWS Lambda functions can remain in memory and thus hold connections open for many minutes between invocations)
>            Reporter: Mark Br...e
>            Priority: Major
>
> I'm working with a Gremlin database that (like many) terminates connections if they don't execute any transactions within a timeout period.  When we want to run a traversal we first check our `GraphTraversalSource` by running `g.V().limit(1).count().next()`, and if that raises an exception we know we need to reconnect before running the actual traversal.
> We've been very confused that this hasn't worked as expected: we intermittently see traversals fail with `WebSocketClosed` or other connection-related errors immediately after the "connection test" passes. 
> I've (finally) found that the cause of this inconsistency is the default pool size in `gremlin_python.driver.client.Client` being 4.  This means there's no visibility outside the `Client` of which connection in the pool is tested and/or used, and in fact no way for the application (`GraphTraversalSource`) to run keep-alive type traversals reliably.  Any time an application passes in a pool size of `None` or a number > 1, there's no way to make sure that each and every connection in the pool actually sends keep-alive traversals to the remote, _except_ in the case of a single-threaded application, where a tight loop could issue `pool_size` of them.  In that latter case, as the application is single-threaded, a `pool_size` above 1 won't provide much benefit anyway.
> I've raised this as a bug because I think a default `pool_size` of 1 would give much more predictable behaviour, and in the specific case of the Python driver is probably more appropriate, because Python applications tend to run single-threaded by default, with multi-threading carefully added when performance requires it.  Perhaps this is really a wishlist item, but since the behaviour of the default option is quite confusing it feels more like a bug, at least.  If it would help, I'm happy to raise a PR with some updated function header comments or maybe updated documentation about multi-threaded / multi-async-loop usage of gremlin-python.
> (This is my first issue here, apologies if it has some fields wrong.)
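The failure mode the reporter describes can be sketched without a server. The following is a simplified simulation (not the gremlin-python driver's actual pooling code; the round-robin selection here is an assumption for illustration): with a pool of 4, the "connection test" refreshes only one connection, so the next traversal can still land on a stale one, whereas with a pool of 1 the test and the traversal necessarily share the same connection.

```python
import itertools

IDLE_TIMEOUT = 60  # seconds the (simulated) server lets a connection sit idle

class Connection:
    def __init__(self, now):
        self.last_used = now

    def is_open(self, now):
        return (now - self.last_used) < IDLE_TIMEOUT

class Pool:
    """Toy pool that hands out connections round-robin (an assumption;
    the real driver draws connections from an internal queue)."""
    def __init__(self, size, now=0):
        self._conns = [Connection(now) for _ in range(size)]
        self._rr = itertools.cycle(range(size))

    def submit(self, now):
        conn = self._conns[next(self._rr)]
        if not conn.is_open(now):
            raise ConnectionError("connection was closed by the server")
        conn.last_used = now
        return "ok"

# pool_size=4: the keep-alive test only refreshes one of four connections.
pool = Pool(4, now=0)
pool.submit(now=59)          # "connection test" passes (touches connection 0)
try:
    pool.submit(now=61)      # real traversal lands on connection 1 -> stale
    failed = False
except ConnectionError:
    failed = True
print(failed)                # True: the test passed but the traversal failed

# pool_size=1: the test and the traversal share the single connection.
pool1 = Pool(1, now=0)
pool1.submit(now=59)         # "connection test" refreshes the only connection
print(pool1.submit(now=61))  # "ok"
```

This matches the report: the intermittent `WebSocketClosed`-style errors depend on which pooled connection each request happens to draw, which the application cannot observe or control.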



--
This message was sent by Atlassian Jira
(v8.3.4#803005)