Posted to dev@tinkerpop.apache.org by "Guilherme Quentel Melo (Jira)" <ji...@apache.org> on 2020/08/20 16:33:00 UTC
[jira] [Updated] (TINKERPOP-2405) gremlinpython: traversal hangs when the connection is established but the server stops responding later

     [ https://issues.apache.org/jira/browse/TINKERPOP-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Guilherme Quentel Melo updated TINKERPOP-2405:
----------------------------------------------
    Description: 
On an HTTP server that connects to Amazon Neptune, I've seen situations where a request just hangs and never returns a response. While investigating, I found that it hangs exactly at the point where it queries Neptune.

The problem is that if the connection to Gremlin/Neptune is established but the server then stops responding, the Gremlin connection never times out, so the process/thread waits forever for a response that will never come.
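The underlying failure mode is not specific to gremlinpython: a blocking read on an established connection has no deadline unless the caller sets one. A minimal standard-library sketch, using {{socket.socketpair()}} as a stand-in for the Neptune connection (an illustrative assumption; {{read_from_silent_peer}} is a hypothetical helper): the peer never writes, so {{recv()}} would block forever with {{timeout=None}}, while a numeric timeout turns the hang into an exception.

{code:python}
import socket


def read_from_silent_peer(timeout):
    """Read from a connected socket whose peer never sends anything.

    With timeout=None, recv() blocks forever; with a number of seconds,
    it raises socket.timeout once that deadline passes.
    """
    # socketpair() returns two already-connected sockets: an "established connection".
    client, server = socket.socketpair()
    try:
        client.settimeout(timeout)
        return client.recv(1)  # the server end never writes
    finally:
        client.close()
        server.close()


try:
    read_from_silent_peer(timeout=0.5)
except socket.timeout:
    print("no response within 0.5 s")
{code}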

h1. How to reproduce

# Start a local Gremlin Server on the default port 8182
# In a terminal, run {{nc -lk 8183}} to listen on port 8183
# Run the following Python code to connect to port **8183**:
{code:python}
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.process.anonymous_traversal import traversal

remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g")
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()
{code}
# You will see the connection request in the {{nc}} output. The first time, don't do anything: the client times out, reporting that the connection couldn't be established.
# Now repeat the steps, but make {{nc}} respond so the connection is established. The quickest way I found is to manually relay the handshake to and from the real Gremlin Server:
## Copy the whole request (the WebSocket upgrade request) from the {{nc -l}} output
## In another terminal, open a connection to the Gremlin Server with {{nc 127.0.0.1 8182}}
## Paste the copied request into the {{nc 127.0.0.1 8182}} terminal
## Copy the Gremlin Server response and paste it into the {{nc -l}} terminal
## The connection is now established, and {{nc -l}} receives some unprintable characters corresponding to {{g.V().limit(1).toList()}}
## If the {{nc -l}} process never responds, the Python code hangs forever.
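The manual relay in the reproduction steps can also be automated with a stand-in server that completes the WebSocket handshake and then never answers. A minimal standard-library sketch (Python 3.8+ for {{socket.create_server}}), assuming the client only validates the RFC 6455 handshake headers and negotiates no subprotocol or extension; {{accept_key}}, {{handle}} and {{serve}} are hypothetical helpers, not part of gremlinpython:

{code:python}
import base64
import hashlib
import socket
import threading

# Fixed GUID that RFC 6455 (section 1.3) appends to the client's key.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def accept_key(client_key):
    """Compute Sec-WebSocket-Accept for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((client_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def handle(conn):
    """Complete the WebSocket handshake on conn, then never send another byte."""
    request = b""
    while b"\r\n\r\n" not in request:  # read until the end of the HTTP headers
        chunk = conn.recv(4096)
        if not chunk:
            conn.close()
            return
        request += chunk
    headers = {}
    for line in request.decode("latin-1").split("\r\n")[1:]:
        if ":" in line:
            name, value = line.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    key = accept_key(headers.get("sec-websocket-key", ""))
    conn.sendall(
        (
            "HTTP/1.1 101 Switching Protocols\r\n"
            "Upgrade: websocket\r\n"
            "Connection: Upgrade\r\n"
            f"Sec-WebSocket-Accept: {key}\r\n"
            "\r\n"
        ).encode("ascii")
    )
    # Swallow whatever the client sends (e.g. the traversal) and never reply.
    while conn.recv(4096):
        pass
    conn.close()


def serve(host="127.0.0.1", port=8183):
    """Accept connections forever, handling each one in a daemon thread."""
    with socket.create_server((host, port)) as server:
        while True:
            conn, _ = server.accept()
            threading.Thread(target=handle, args=(conn,), daemon=True).start()
{code}

Calling {{serve()}} in one terminal takes the place of {{nc -lk 8183}} and the manual relay; the client code from step 3 then connects successfully and, with the default transport, should hang at {{toList()}}.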

h1. Possible solution

Looking into it, the problem seems to be that the {{TornadoTransport}} implementation does not pass any timeout when reading (or writing) messages. Passing a timeout to {{self._loop.run_sync}} could solve the issue, or at least raise an exception when the server does not respond.

If I change the example above to use a transport with a read timeout:

{code:python}
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
from gremlin_python.driver.tornado.transport import TornadoTransport
from gremlin_python.process.anonymous_traversal import traversal

class CustomTornadoTransport(TornadoTransport):
    def read(self):
        # Wait at most 5 seconds for the next message; IOLoop.run_sync raises
        # tornado.util.TimeoutError if none arrives in time.
        return self._loop.run_sync(lambda: self._ws.read_message(), timeout=5)

remote_connection = DriverRemoteConnection("ws://127.0.0.1:8183/gremlin", "g", transport_factory=CustomTornadoTransport)
g = traversal().withRemote(remote_connection)
g.V().limit(1).toList()
{code}

and repeat the same steps, {{g.V().limit(1).toList()}} times out after 5 seconds without a response from the server instead of hanging.

I'm not sure whether writes also need a timeout, but reads definitely should have one.
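Until the driver exposes a read timeout, one caller-side workaround (not the fix proposed above) is to bound how long the application waits for a traversal. A sketch using {{concurrent.futures}}; {{call_with_deadline}} is a hypothetical helper, and it assumes the stuck worker thread can simply be abandoned: the thread stays blocked, and the connection it was using should be treated as broken and discarded.

{code:python}
import concurrent.futures

# One shared pool (the size of 4 is an arbitrary assumption). A call that never
# returns keeps its worker busy, so size the pool for the stuck calls you can tolerate.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_deadline(fn, timeout):
    """Run fn() on a worker thread and wait at most `timeout` seconds for it.

    Raises concurrent.futures.TimeoutError if fn() has not returned in time.
    The worker thread is NOT interrupted; it stays blocked until fn() returns.
    """
    future = _executor.submit(fn)
    return future.result(timeout=timeout)
{code}

For example, {{call_with_deadline(lambda: g.V().limit(1).toList(), timeout=5)}} raises {{concurrent.futures.TimeoutError}} after 5 seconds instead of hanging the request handler. Note that {{ThreadPoolExecutor}} worker threads are joined at interpreter exit, so a permanently stuck call can also delay shutdown.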



> gremlinpython: traversal hangs when the connection is established but the server stops responding later
> --------------------------------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2405
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2405
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: python
>    Affects Versions: 3.4.6
>         Environment:  Ubuntu 18.04, Flask 1.1.1, python 3.8.1, Amazon Neptune, Gremlin Server
>            Reporter: Guilherme Quentel Melo
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)