You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "divijvaidya (GitHub)" <gi...@apache.org> on 2019/03/08 21:38:53 UTC

[GitHub] [tinkerpop] divijvaidya opened pull request #1081: Tinkerpop-2169:Responses exceeding maxContentLength cause subsequent queries to hang

https://issues.apache.org/jira/browse/TINKERPOP-2169
https://issues.apache.org/jira/browse/TINKERPOP-2173

**Problem**
1. On receiving a CorruptedFrameException, the current code does not decrement the borrowed count on the connection and marks the connection for destruction. However, the connection never gets destroyed since the it can only be destroyed when the borrowed count is 0. This leads to unnecessary connections waiting in the bin to be cleared.
2. If a connection is serving multiple requests and one of the requests gets a CorruptedFrameException, Netty closes the underlying channel and thus, other requests on the same channel receive a ChannelClosed exception. The current code marks the server host as unavailable (thus closing out other connections for the host as well) on a ChannelClosed exception whereas that is not necessarily true. Marking the host as unavailable is too aggressive in this scenario.

**Changes**
1. Connection.isDead() logic now consists of checking the underlying channel.
2. If a connection is dead, destroy the connection, irrespective of the number of borrowed items. Informing the inflight (pending) request futures about this is already done by the response handler.
3.  If a connection is returned to the pool and it is dead, replace the connection. Do not consider the host as unavailable since a single connection dead does not imply a dead host.
4.  Add a warning log whenever a host is marked as unavailable.
5.  Replace the connection on ClosedChannel exception.
6. Reset the logging level in the integration tests after each test execution.

**Testing**

- Added a test to reproduce the leak (Problem#1). The test fails before the fixes and passes after the fixes.
- `mvn verify -DskipIntegrationTests=false`  & `mvn verify install` is successful for gremlin-server & gremlin-driver


[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] divijvaidya commented on pull request #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "divijvaidya (GitHub)" <gi...@apache.org>.
Yes, instead of relying on specific exceptions to determine when to replace a channel vs. returning it to the pool, we explicitly check the channel itself for health status. If the channel is active, it will be returned to the pool, otherwise, it will be replaced.

Netty closes the channel on IOExceptions and certain other non-recoverable exceptions. The connection will be replaced as per the new logic which is the same behavior as earlier.

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
Everything about this pull request was well done - from the JIRA creation to analysis of the problem/solutions, to the actual code/tests. Thanks!

VOTE +1

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] divijvaidya commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "divijvaidya (GitHub)" <gi...@apache.org>.
1. As per developer documentation, _"Once either consensus position is reached, you are responsible for merging to the release branch and handling any merge conflicts."_
How do I merge the pull requests? Seems like I don't have write access to the repo.
2. As recommended by the developer documentation, I have created a separate pull request for the tp33 branches at https://github.com/apache/tinkerpop/pull/1082. Can you please approve that as well?
3. I don't have permissions to modify the JIRA https://issues.apache.org/jira/browse/TINKERPOP-2169. Please mark it as resolved with the correct affected versions. 



[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
I've merged #1082 to `tp33` and then merged `tp33` to `master` rather than merging this PR directly to master as that produces the git history we'd like to have. I oddly had to do this:

https://github.com/apache/tinkerpop/commit/0601c5ba593474535676b77ab9785956e6dcf6da

because tests were failing. I say "oddly" because Travis was fine with what what was there before and obviously it worked in your environment. So, I'm concluding it is something environmental in some way. If there are other thoughts on the right way to handle that, please let us know.  Also note that I added some CHANGELOG entries:

https://github.com/apache/tinkerpop/commit/46e6a4da3040a006e50e821678791b42c00ac5ea

I'm going to close this PR and the related issues in JIRA. @divijvaidya thanks again for a nice PR - looking forward to further contributions from you. 

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette closed pull request #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
[ pull request closed by spmallette ]

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] jorgebay commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "jorgebay (GitHub)" <gi...@apache.org>.
A committer will merge your pull requests and move the ticket to resolved (probably today).
As `tp33` is still being merged to `master` regularly, probably we will merge #1082 and then merge `tp33` into `master` (I don't think we need votes for merging #1082 that is identical to this one).

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on pull request #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
This is a better exception than `ClosedChannelException` - cool

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on pull request #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
i like that you got rid of the `isDead` boolean.  nice

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on pull request #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
So, just to align my understanding - you don't check 

```java
if (t instanceof IOException || t instanceof CodecException) {
``` 

anymore to replace the connection and rather just check `isDead()` because `isDead()` now encompasses situations where those two exceptions would be raised?  

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] dkuppitz commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "dkuppitz (GitHub)" <gi...@apache.org>.
VOTE +1

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org

[GitHub] [tinkerpop] spmallette commented on issue #1081: TINKERPOP-2169/2173 Responses exceeding maxContentLength cause subsequent queries to hang

Posted by "spmallette (GitHub)" <gi...@apache.org>.
I knew I'd made that change before somewhere for the error message and it was in my Java 11 branch:

https://github.com/apache/tinkerpop/commit/377bc2f1babe63b2c07c73a71c9c0e665121212a

So I assume, the difference has something to do with the JDK version/type being used. 

[ Full content available at: https://github.com/apache/tinkerpop/pull/1081 ]
This message was relayed via gitbox.apache.org for dev@tinkerpop.apache.org