You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "FlorianHockmann (GitHub)" <gi...@apache.org> on 2018/12/11 15:38:58 UTC

[GitHub] [tinkerpop] FlorianHockmann opened pull request #1016: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

https://issues.apache.org/jira/browse/TINKERPOP-1774
https://issues.apache.org/jira/browse/TINKERPOP-1775

This allows connections to be reused for different requests in parallel which should lead to better utilization of connections / lower resource usage. The ConnectionPool now also has a min and max size. A new connection is only created if all existing connections reached their max in-flight limit (which is also a new setting). If additionally the max size of the ConnectionPool is reached, then an exception is thrown which makes this a breaking change.

This change fixes both issues together as discussed in PR #903.

All tests pass with `./docker/build.sh -t -i` and I verified that the changed docs look as expected with `./docker/build.sh -d`.

VOTE +1

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Oh yes, good that you found this.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on issue #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "spmallette (GitHub)" <gi...@apache.org>.
ok - i have to start with 3.2.11 first so it will be a bit before i get to 3.4.0. you have some time to finish up. 

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on issue #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
The behaviour that motivated TINKERPOP-1774 was the unbounded number of connections that was being created, but it's true that references the java driver as the reference implementation, sorry about that.

I think having a fixed amount of connections leads to a predictable behaviour (latency / resource provisioning / etc) and a simpler implementation. With request pipelining, there are few cases where the user will need more than 1 connection (in that case, the user can configure a different fixed number).

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on issue #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Thanks for your comments, @jorgebay. I just pushed an updated version that uses a fixed size for the connection pool and that should also address the other issues you pointed out. The Travis build should now also succeed deterministically.

VOTE +1

edit: I had to remove one test that failed on Travis like 1/10 times. It tried to ensure that a slow request (in the form of a simple `sleep()`) doesn't block another fast request. That failed sometimes, probably because all worker threads of the Gremlin Server were already blocked from other tests that also used these sleeps. I can't think of another way to test this, so I simply removed the test.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
Suggestion: to simplify the implementation you can create an async method:

```C#
private async Task SendNextMessageFromQueue() {}
```

And use `await` for your dequeue-send flow. If you catch your exceptions, its safe to start the call for `SendNextMessageFromQueue()` as "fire and forget". The end result will be the same (as `ContinueWith()` does exactly that) but the code will be simpler.
You can look at a `Forget()` implementation here: https://github.com/datastax/csharp-driver/blob/46e8da823512f12cc4f109e6812ff611a9f14e20/src/Cassandra/Tasks/TaskHelper.cs#L380-L387

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
I think something like `t.Exception.Handle(_ => true);` is enough, the idea is to avoid triggering [`UnobservedTaskException `](https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler.unobservedtaskexception) that can be a pain for some users:

```c#
if (t.IsFaulted)
{
    t.Exception.Handle(_ => true);
    // Use the original exception here
    _callbackByRequestId[msg.RequestId].HandleFailure(t.Exception.InnerException);
    // ...
}
```

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
You mean that `_graphSONReader.ToObject()` shouldn't return `dynamic`?

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Good idea! I made a similar change like for the send part. All exceptions are caught (exceptions where the request ID is known are already caught at a lower level to only let that request fail) and the same failure handle function is called like for the send part.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
I think this loop (recursion) should be extracted out of this method, otherwise the first time the CAS will occur but the following sendings will never win the CAS op (`_writeInProgress` will already be `1`) .

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
Suggestion: to simplify the implementation you can create an async method:

```C#
private async Task SendNextMessageFromQueue() {}
```

And use `await` for your dequeue-send flow. If you catch your exceptions, its safe to start the initial call for `SendNextMessageFromQueue()` (from `BeginSendingMessages()`) as "fire and forget". The end result will be the same (as `ContinueWith()` does exactly that) but the code will be simpler.
You can look at a `Forget()` implementation here: https://github.com/datastax/csharp-driver/blob/46e8da823512f12cc4f109e6812ff611a9f14e20/src/Cassandra/Tasks/TaskHelper.cs#L380-L387

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Good catch, I think that was actually the cause of the time outs on Travis.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
I mean converting it to whatever you expect (`IEnumerable`?) to avoid making `_aggregator.Add(d)` and `_result.Add(d)` being dynamic calls.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on issue #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
> now that python issues are resolved I intend to start the release process soon. are you still expecting to merge this for 3.4.0?

Yes, from my side this is ready to be merged. I just wanted to wait for another review and not use lazy consensus as the change isn't exactly trivial and also because of the holidays where probably not so many contributors had the chance to review this. However, if you want to start the release process, then this PR shouldn't stop it in my opinion. So, I can merge this tomorrow morning or later today.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
The mix of async code (`ContinueWith()` on `SendNextMessageFromQueue()`) and sync code will cause a race condition, we should only change `_writeInProgress` to `0` when we finish dequeueing all items.
The easiest solution would be to move this set statement to the `SendNextMessageFromQueue()` method.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
It would be nice to remove the dynamic call here, i.e., cast to `object`.

That way the failure would be far more simple to understand if its static code.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann closed pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
[ pull request closed by FlorianHockmann ]

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
I think we can do something similar to the send portion.
Otherwise, faulted tasks at this level will be unhandled. We should invoke all the pending callbacks if there is a single receiving failure (network related).

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on issue #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Thanks for your comments, @jorgebay. I just pushed an updated version that uses a fixed size for the connection pool and that should also address the other issues you pointed out. The Travis build should now also succeed deterministically.

VOTE +1

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
OK, then I really misunderstood your suggestion. I tried your suggestion, but unfortunately it wasn't that easy as it would require a change to this line, too:
```cs
_result.Add(d);
```
which doesn't work if `d` is of type `object`. We would need a cast to `T` here, but that would require more handling in case that `d` is not of type `T`.
So, I prefer not to change this now and instead improve type safety of the deserialization in general some time later.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on issue #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "spmallette (GitHub)" <gi...@apache.org>.
@FlorianHockmann now that python issues are resolved I intend to start the release process soon.  are you still expecting to merge this for 3.4.0? 

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on issue #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
> I would recommend to remove the dynamic pool sizes (min/max), but I leave it up to you. In my experience, the use for dynamic pools is very limited. Usually when there is spike in requests, opening new connections can lead to a latency with no added benefit (there is a limited number of requests that a server can process at any given time).

You mean that the pool should simply have a fixed size instead? Wasn't that what you asked for in TINKERPOP-1774:

> Similar to the java connection pool, we should limit the maximum amount of connections and start with a minimum number.

or what do you mean exactly with dynamic pool sizes?

I don't really have a strong opinion on that. My assumptions was just that it's probably hard for users to set a good value for the size. Dynamic pool sizes basically avoid this as the pool will automatically grow to a size that is big enough to handle the number of parallel requests the application typically uses. Later on, we could also add a mechanism to remove idle connections if they were only created because of a spike in requests as too many idle connection seem to be problematic with certain providers (especially CosmosDB, judging from the tickets we currently have open). Although this mechanism would of course make the latency problem even worse for spikes when the connections were already closed again since the last spike.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
We should also use `TrySetResult()` here, as we might race between an exception and the result (which is OK).

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
I think that's a good idea in general, but it would require to change a few things about the deserialization in general. Here, we really expect an `IEnumerable` and should always get one based on the format of the data we're getting from the server, but the same method is called in other places for other formats. In some places, an `int` is expected as the return type for example. This isn't really nice and a bit confusing which is why we should probably clean this up, but I would rather do that with a separate PR as it's really not related to this PR.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
That definitely made the code easier to read :-)
I copied the method directly from there. I didn't add a copyright notice because I assume that this is already handled by your CLA. Is that correct or should I add a copyright header for DataStax somewhere to the `Util.cs` file?

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
There should be no concern regarding license, it's Apache licensed, I'm the author and I'm committer here :)

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Oh yes, kind of obvious. I just pushed a commit with this change.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
> AggregateException.Handle() should be called here, as the faulted task is not going to be yielded to the user.

You mean simply like this (+ the logic from your second remark here)?

```cs
t.Exception.Handle(e =>
    {
        _callbackByRequestId[msg.RequestId].HandleFailure(t.Exception);
        _callbackByRequestId.TryRemove(msg.RequestId, out _);
        return true;
    });
```

> Additionally, I think if the sending fails, the whole connection should be "trashed" (empty queues, respond with the errors back to the user) and close the connection.

That sounds like a good idea. I guess we have to throw the exception then for every request for which we have a callback registered at that time.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
To clarify, I meant using something like this:

```c#
IEnumerable receivedData = typeof(T) == typeof(JToken)
    ? new[] {received.Result.Data}
    : (IEnumerable) _graphSONReader.ToObject(received.Result.Data);
```

To avoid making the calls from below "dynamic".

In any case, it's a suggestion, it's ok otherwise.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: Add request pipelining and a fixed ConnectionPool size TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
NIT, we can avoid the second read access to the dictionary by using:

```C#
if (_callbackByRequestId.TryRemove(receivedMsg.RequestId, out var responseHandler))
{
    responseHandler.HandleFailure(e);
}
```

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
This check is nice to have, but given that the connection might switch to closed immediately after this call, I think there is no point in doing a CAS operation here.
A `Volatile.Read()` would do it and it's order of magnitude cheaper :) 

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on issue #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Since I don't really have a strong opinion on this and because I see your points about simpler code and predictable latencies, I'll change this to use a fixed size unless someone else chimes in and explains why we should use dynamic sizes.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] FlorianHockmann commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "FlorianHockmann (GitHub)" <gi...@apache.org>.
Ah, didn't know about this. I'll add this line like you suggested.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
I like this strict behaviour for throttling.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on pull request #1016: WIP: Add request pipelining and ConnectionPool sizes TINKERPOP-1775 and TINKERPOP-1774

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
`AggregateException.Handle()` should be called here, as the faulted task is not going to be yielded to the user.

Additionally, I think if the sending fails, the whole connection should be "trashed" (empty queues, respond with the errors back to the user) and close the connection.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1016 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org