You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2017/08/21 23:34:49 UTC

[kudu-CR] [java] fixed deadlock in client.Connection

Alexey Serbin has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/7765

Change subject: [java] fixed deadlock in client.Connection
......................................................................

[java] fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

This changelist addresses KUDU-1894.

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 11 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG
Commit Message:

Line 13: This changelist addresses KUDU-1894.
> Curious why you specified this here and didn't just prefix the commit summa
That's because KUDU-1894 was about some different issue in the beginning.

From the other side, it's better to stick to current interpretation: KUDU-1894 contains the exact information about  the deadlock this patch fixes.


http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS1, Line 485: Netty's
> Netty (here and below)
Done


PS1, Line 491: if
> of a
Done


Line 573:   @GuardedBy("lock")
> The annotation should be removed, right?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

Line 524:     sendCallToWire(msg);
> Yes, that can happen now.  But I don't see how it could break any assumptio
Even if Channels.write() is async, I think there is an underlying queue in Netty. However, I don't see how we are about to see a problem here in case of concurrent calls after this change.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 5: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> Did you consider trying something like this? I can prototype and see if it 
I did not.  The reason is that I was not sure it would not deadlock in that case, because I was not sure we would have a guarantee that no other event would happen.

I.e. I just wanted to get away from holding the Connection.lock() while doing any call down to the Netty pipelines.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7765

to look at the new patch set (#4).

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................

KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

The idea is to not hold the Connection.lock while invoking the
Connection.sendCallToWire() method and, overall, avoid doing any
calls to Netty while holding that lock.

To test the changes, I ran the ITClient via dist-test test apllying
Todd's WIP patch on top: https://gerrit.cloudera.org/#/c/7579/
The test passed 3572 out of 3572 times:
  http://dist-test.cloudera.org/job?job_id=aserbin.1503526685.24101
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527340.4787
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527848.6921

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 59 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG
Commit Message:

Line 13: This changelist addresses KUDU-1894.
> Curious why you specified this here and didn't just prefix the commit summa
That's because KUDU-1894 was about some different issue in the beginning.

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS4, Line 274:  while (true) {
             :           List<QueuedMessage> queued = Preconditions.checkNotNull(queuedMessages);
             :           if (queued.isEmpty()) {
             :             break;
> even though we'll lose the Preconditions check here, I think this would be 
Done


PS4, Line 290: lock.unlock();
             :           needs_unlock = false;
             :           // Send out the enqueued messages while not holding the lock. This is to avoid
             :           // deadlock if channelDisconnected/channelClosed event happens and cleanup() is called.
             :           for (final QueuedMessage qm : queued) {
             :             sendCallToWire(qm.message);
             :           }
             :           lock.lock();
             :           needs_unlock = true;
> I think it's slightly more idiomatic to do:
I like it better, thanks!


Line 304:         queuedMessages = null;
> can you add an 'assert queuedMessages.empty()' here?
Done


Line 507:       Preconditions.checkState(state == State.READY);
> seems more like an assert to me
Done


Line 524:     sendCallToWire(msg);
> does this now risk that calls come across the wire in non-ascending callId 
Yes, that can happen now.  But I don't see how it could break any assumptions in our code.  Also, since we were not controlling the ordering of the messages at the Netty level (Channels.write() sends the message asynchronously) even before this change, I don't expect anything else to emerge here.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> uh-oh: this needs to be guarded either by lock or the collection should pro
yea, I think the better bet is deferring the cleanup call back onto a new iteration of the netty event loop thread (asynchronously) after the disconnected event, if that's possible, so that the sslhandler lock is no longer held when we try to acquire our own lock?

(man I can't wait for netty 4 upgrade some day)


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7765

to look at the new patch set (#3).

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................

KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 59 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2: Code-Review-1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
uh-oh: this needs to be guarded either by lock or the collection should provide concurrency guarantees.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> I tried using:
Huh, that's interesting.

Maybe then just keep this work-around and put a TODO to resolve this once we transition to Netty 4.x or 5.x?  What exactly is wrong with the current fix?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 5:

> This looks good to me, but obviously this code is extremely
 > fragile.  We should figure out a way to make this more robust down
 > the line.  I took a quick look at the server implementation and it
 > seems like it would be OK with out of order call IDs.

Thank you for the review.

I think there will be a good opportunity to make the code more robust when switching to new version of Netty (4.x or even 5.0).  At least, with Netty 4.x a lot of multi-threading and locking woes should go away because of stronger guarantees on event handling:
 http://netty.io/wiki/new-and-noteworthy-in-4.0.html#well-defined-thread-model

-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

This looks good to me, but obviously this code is extremely fragile.  We should figure out a way to make this more robust down the line.  I took a quick look at the server implementation and it seems like it would be OK with out of order call IDs.

-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> Huh, that's interesting.
Nothing specifically wrong with it that I can see. It just makes the locking a bit more complicated/harder to follow, so was hoping to find a workaround that would allow us to keep our simpler coarse grained locking.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has submitted this change and it was merged.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring of the Connection.lock and some
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

The idea is to not hold the Connection.lock while invoking the
Connection.sendCallToWire() method and, overall, avoid doing any
calls to Netty while holding that lock.

To test the changes, I ran the ITClient test via dist-test apllying
Todd's WIP patch on top: https://gerrit.cloudera.org/#/c/7579/

The test passed 3572 out of 3572 times:
  http://dist-test.cloudera.org/job?job_id=aserbin.1503526685.24101
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527340.4787
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527848.6921

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Reviewed-on: http://gerrit.cloudera.org:8080/7765
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
Reviewed-by: Jean-Daniel Cryans <jd...@apache.org>
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 54 insertions(+), 21 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, but someone else must approve
  Jean-Daniel Cryans: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Jean-Daniel Cryans <jd...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has uploaded a new patch set (#5).

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................

KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring of the Connection.lock and some
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

The idea is to not hold the Connection.lock while invoking the
Connection.sendCallToWire() method and, overall, avoid doing any
calls to Netty while holding that lock.

To test the changes, I ran the ITClient test via dist-test apllying
Todd's WIP patch on top: https://gerrit.cloudera.org/#/c/7579/

The test passed 3572 out of 3572 times:
  http://dist-test.cloudera.org/job?job_id=aserbin.1503526685.24101
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527340.4787
  http://dist-test.cloudera.org/job?job_id=aserbin.1503527848.6921

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 54 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/5
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [java] fixed deadlock in client.Connection

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: [java] fixed deadlock in client.Connection
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/1//COMMIT_MSG
Commit Message:

Line 13: This changelist addresses KUDU-1894.
Curious why you specified this here and didn't just prefix the commit summary with the bug reference?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> yea, I think the better bet is deferring the cleanup call back onto a new i
Did you consider trying something like this? I can prototype and see if it fixes the issue.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/2/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS2, Line 578: inflightMessages.put
> Did you consider trying something like this? I can prototype and see if it 
I tried using:

    channel.getPipeline().execute(new Runnable() {
        @Override
        public void run() {
          cleanup("connection closed");
        }
      });

to try to defer the cleanup() call to the next loop of the IO thread... but it didn't work, because execute() tries to be smart and execute inline if the submitting thread is the IO thread itself.

It's funny because it seems there was some patch internal to SslHandler to fix https://github.com/netty/netty/issues/989#issuecomment-14854044 which tried to do the same thing, and then the commenter at the end points out that it shouldn't do anything due to exactly the above smartness.

So, we're back at square one and I guess something like your fix is required. Or some other way than the above to defer the cleanup() call out of the synchronous path back onto a different thread.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS4, Line 274:  while (true) {
             :           List<QueuedMessage> queued = Preconditions.checkNotNull(queuedMessages);
             :           if (queued.isEmpty()) {
             :             break;
even though we'll lose the Preconditions check here, I think this would be easier to follow with just a : while (!queuedMessages.empty()) { ...}

in terms of the exception, it'll throw an NPE either way, so I dont think it's much of a loss.


PS4, Line 290: lock.unlock();
             :           needs_unlock = false;
             :           // Send out the enqueued messages while not holding the lock. This is to avoid
             :           // deadlock if channelDisconnected/channelClosed event happens and cleanup() is called.
             :           for (final QueuedMessage qm : queued) {
             :             sendCallToWire(qm.message);
             :           }
             :           lock.lock();
             :           needs_unlock = true;
I think it's slightly more idiomatic to do:

lock.unlock();
try {
} finally {
  lock.lock();
}

for the 'unlocked' periodic here, even though in some rare circumstance in which there is an exception thrown, it would cause an extra lock/unlock


Line 304:         queuedMessages = null;
can you add an 'assert queuedMessages.empty()' here?


Line 507:       Preconditions.checkState(state == State.READY);
seems more like an assert to me


Line 524:     sendCallToWire(msg);
does this now risk that calls come across the wire in non-ascending callId order? Does that break any assumptions elsewhere in the code?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] [java] fixed deadlock in client.Connection

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: [java] fixed deadlock in client.Connection
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/7765/1/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

PS1, Line 485: Netty's
Netty (here and below)


PS1, Line 491: if
of a


Line 573:   @GuardedBy("lock")
The annotation should be removed, right?


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

Line 524:     sendCallToWire(msg);
> Even if Channels.write() is async, I think there is an underlying queue in 
Another thought is that the sequence anyway could go non-ascending order when the call_id counter overflows.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 5: Code-Review+1

Sounds good.

-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

Line 524:     sendCallToWire(msg);
> yea but call_id is an int64 so it would require a single TCP connection to 
As far as I could see, call_id is int32_t, so I would expect it would rotate much faster :)

Yes, sure -- I agree it's better to be careful here.  I took a look at the code and I didn't find any place where non-ascending call_id (or sparse sequence of those) could break the code.  However, I might be missing something -- it's better to be safe than sorry.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7765/4/java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
File java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java:

Line 524:     sendCallToWire(msg);
> Another thought is that the sequence anyway could go non-ascending order wh
yea but call_id is an int64 so it would require a single TCP connection to last for a few millennia for that to happen :) Let me take some time to ponder this patch a bit more.


-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................


Patch Set 4: Verified+1

unrelated flakes in

    MiniTabletServerTest.TestMultiDirServer
    KuduTsCliTest.TestDeleteTablet
    KuduTsCliTest.TestDumpTablet

-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1894 fixed deadlock in client.Connection

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has uploaded a new patch set (#2).

Change subject: KUDU-1894 fixed deadlock in client.Connection
......................................................................

KUDU-1894 fixed deadlock in client.Connection

Due to the reverse order of acquiring the Connection.lock and
lower-level Netty locks, Connection.enqueueMesage() could deadlock if
a ChannelDisconnected/ChannelClosed event arrived in the middle.

Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
---
M java/kudu-client/src/main/java/org/apache/kudu/client/Connection.java
1 file changed, 11 insertions(+), 7 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/65/7765/2
-- 
To view, visit http://gerrit.cloudera.org:8080/7765
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1a0edc3e3accbcff60f2cde641ee470312bbd27a
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>