Posted to dev@tinkerpop.apache.org by Ian Robinson <ia...@gmail.com> on 2021/02/04 17:12:49 UTC

Javascript Gremlin driver and AWS Lambda connection issues

I've been working on an AWS Lambda/JavaScript Gremlin driver issue for the
last couple of weeks and have run out of ideas for how to fix it. I wonder
whether anyone on this list has any suggestions.

What I'm seeing is that long-lived WebSocket connections to Amazon Neptune
(connections that survive across multiple invocations of a Lambda function)
can sometimes terminate abruptly. When this happens, the Gremlin query
being executed by the JavaScript driver in a Lambda function fails silently:
it neither throws an exception nor rejects the promise returned for the
query. The Lambda function completes with 200 OK, but with an empty response
payload. From the perspective of the client invoking the function, the
function appears to have executed successfully, but with no results.

The problem of connections from a Lambda function terminating abruptly
isn't confined to the JavaScript Gremlin driver. The Java and Python
drivers, however, raise exceptions that allow the function to apply some
backoff-and-retry logic. The JavaScript driver, in contrast, doesn't raise
anything that can be used to trigger retry logic. The query just
'disappears'.
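
To illustrate the kind of function-side workaround I have in mind, here is a
minimal sketch that turns a promise that never settles into a rejection, so
that backoff-and-retry logic has something to react to. The helper names
(withTimeout, queryWithRetry) and the default values are hypothetical and not
part of the driver; this is an assumption about how one might work around the
problem, not a fix for the driver itself.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`query timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical helper: call `runQuery` up to `attempts` times, waiting
// baseDelayMs * 2^i between attempts (simple exponential backoff).
async function queryWithRetry(
  runQuery,
  { attempts = 3, timeoutMs = 5000, baseDelayMs = 100 } = {}
) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await withTimeout(runQuery(), timeoutMs);
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

A call might look like queryWithRetry(() => g.V().limit(1).toList()), where g
is an existing traversal source. Note that the timeout does not cancel the
underlying request, and a retry only helps if runQuery ends up using a healthy
connection, so this is probably only safe for idempotent queries.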

In the JavaScript driver I've traced this down to the underlying socket
being closed with an ECONNRESET error. The WebSocket is then closed, and
emits a 'ws close' event, but there's nothing in the driver to handle this
event in a way that would provoke an exception in the promise returned by a
query. All of this seems to happen _after_ the driver sends a query to the
server: I see the bytecode generated by the query being sent to the
WebSocket, and even a frame being submitted by the WebSocket to its sender,
but after that I see only the close event, and then nothing.
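
The behaviour I would have expected can be sketched roughly as follows: keep
track of in-flight requests and reject all of them when the socket closes or
errors. This is not the driver's actual code. TrackedConnection and its
methods are hypothetical names, and the only assumption about the socket is
that it has a send() method and emits 'close' and 'error' events, as a
WebSocket from the ws package does.

```javascript
// Hypothetical stand-in for a driver connection (not the real driver class).
class TrackedConnection {
  constructor(socket) {
    this.socket = socket;
    this.pending = new Map(); // requestId -> { resolve, reject }
    this.nextId = 0;
    // Fail every in-flight request as soon as the socket goes away.
    socket.on('close', (code) =>
      this.failAll(new Error(`connection closed (code ${code})`)));
    socket.on('error', (err) => this.failAll(err));
  }

  // Send a request and return a promise that settles on response or on close.
  submit(payload) {
    const id = ++this.nextId;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.socket.send(JSON.stringify({ id, payload }));
    });
  }

  // Called when a response for request `id` arrives.
  complete(id, result) {
    const entry = this.pending.get(id);
    if (entry) {
      this.pending.delete(id);
      entry.resolve(result);
    }
  }

  // Reject everything still waiting so that callers see an error.
  failAll(err) {
    for (const { reject } of this.pending.values()) reject(err);
    this.pending.clear();
  }
}
```

With something along these lines, a query that was sent just before the
ECONNRESET would surface as a rejected promise, which the Lambda function
could catch and retry.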

The frequency with which this happens is very low: it may only happen a
couple of times per day, even in situations where I'm invoking the function
several hundred times per second. The reproducer at the moment,
unfortunately, is tied to a suite of long-running Lambda/Neptune tests.

Have you seen anything like this before? I think further diagnosis will
benefit from someone who knows Node.js's network stack and callback/promise
model and the JavaScript driver better than I do.

Thanks

ian