You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2018/02/24 22:42:37 UTC

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9446


Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................

IMPALA-6554: Fix a race in DequeueDeferredRpc()

Previously, KrpcDataStreamRecvr::DequeueDeferredRpc() will call
RespondAndReleaseRpc() outside the of sender queue's lock. Another
thread can race ahead and call KrpcDataStreamRecvr::Close() before
MemTracker::Release() is called on the receiver's MemTracker. This
may lead to update to the receiver's MemTracker after it has been
closed. This change fixes the problem by updating the MemTracker
before dropping the lock in DequeueDeferredRpc(). This change also
changes RespondAndReleaseRpc() to update the MemTracker first before
responding to the RPC. The order doesn't really matter much because
the response of an RPC is handled asynchronously by reactor threads so
there is always a window between when the MemTracker is updated and when
the actual RPC payload is freed. The order is updated so it's consistent
with that of DequeueDeferredRpc().

Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
---
M be/src/rpc/impala-service-pool.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-recvr.cc
3 files changed, 9 insertions(+), 7 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/9446/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9446/2/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9446/2/be/src/runtime/krpc-data-stream-recvr.h@176
PS2, Line 176: exchange node
this receiver


http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc@341
PS3, Line 341: RespondAndReleaseDeferredRpc
Why have this? Why can't we just use KrpcDataStreamMgr::RespondAndReleaseRpc() ? We have all the necessary params in 'TransmitDataCtx'.

Or we could still keep this and call KrpcDataStreamMgr<TransmitDataCtx>::RespondAndReleaseRpc() instead of duplicating the code.


http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc@441
PS3, Line 441:     // Release to MemTracker while still holding the lock to prevent race with Close().
             :     recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             : 
             :   // Responds to the sender to ack the insertion of the row batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
We have a dedicated function to do this, so it's not ideal to duplicate this code here. Another option is to pass a 'bool should_release_mem_tracker' or something. But that's a bit shabby too.

I'm fine with either if we can't think of something better. Just wanted to consider another option.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 00:04:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 5:

Agreed. However, that probably require holding lock to make sure reading both the service queue's size and the mem-tracker is atomic.


-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 23:38:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc@231
PS4, Line 231: Deferred
in this case, the "deferred" doesn't make sense since it hasn't yet been differed, right? Since the mem_tracker is no longer baked in and we will use this for cases other than deferred I think we should not include that in this version.


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc@287
PS4, Line 287: Deferred
this case also not deferred


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-recvr.h@175
PS4, Line 175: mem_tracker_
I think it'd be clearer to also rename this field to deferred_rpc_tracker_ or similar, especially now that it's used as an argument to the shared routine that will do the responding.


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/service/data-stream-service.h
File be/src/service/data-stream-service.h:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/service/data-stream-service.h@62
PS4, Line 62: Deferred
as mentioned earlier, since this is used for non-deferred case as well, let's not put deferred given the new refactoring. and remove "deferred" from comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 17:39:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@438
PS1, Line 438:     recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             : 
             :   // Responds to the sender to ack the insertion of the row batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
> Not really. RespondSuccess() only enqueues a task for the reactor thread to
Okay, I was thinking the sidecars were freed sooner but I guess not.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2018 20:43:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................

IMPALA-6554: Fix a race in DequeueDeferredRpc()

Previously, KrpcDataStreamRecvr::DequeueDeferredRpc() will call
RespondAndReleaseRpc() outside the of sender queue's lock. Another
thread can race ahead and call KrpcDataStreamRecvr::Close() before
MemTracker::Release() is called on the receiver's MemTracker. This
may lead to update to the receiver's MemTracker after it has been
closed. This change fixes the problem by updating the MemTracker
before dropping the lock in DequeueDeferredRpc(). This change also
changes RespondAndReleaseRpc() to update the MemTracker first before
responding to the RPC. The order doesn't really matter much because
the response of an RPC is handled asynchronously by reactor threads so
there is always a window between when the MemTracker is updated and when
the actual RPC payload is freed. The order is updated so it's consistent
with that of DequeueDeferredRpc().

Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Reviewed-on: http://gerrit.cloudera.org:8080/9446
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/rpc/impala-service-pool.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
7 files changed, 60 insertions(+), 47 deletions(-)

Approvals:
  Dan Hecht: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@337
PS1, Line 337: RespondAndReleaseRpc
so in order for this to be correct w.r.t. Close(), the lock must be held while calling it, right?  Let's update the function comment.

Also, this function seems badly named. AFAICT, recvr_->mem_tracker() is only used to track deferred RPCs, and so this should only be called when responding to a deferred RPC, correct?  If so, how about we call it RespondAndReleaseDeferredRpc or something?

Also, should mem_tracker_ be labeled to indicate it is only for deferred RPCs?  Once the RPC is deseralized, it's tracked by the BP client's mem-tracker directly, not the recvr_->mem_tracker_ right?


http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@438
PS1, Line 438:     recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             : 
             :   // Responds to the sender to ack the insertion of the row batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
given that the other callsites are holding the lock, why is it not okay to do so here.

RespondSuccess() is where the payload is freed, right?  So not holding the lock means that Close() returns while the recvr resources are not yet released, right? That seems like it might cause RM trouble for us when we actually bring this under reservations.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2018 18:17:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9446/2/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9446/2/be/src/runtime/krpc-data-stream-recvr.h@176
PS2, Line 176: exchange node
> this receiver
Done


http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc@341
PS3, Line 341: RespondAndReleaseDeferredRpc
> Why have this? Why can't we just use KrpcDataStreamMgr::RespondAndReleaseRp
Good point. I refactored the code and moved it to DataStreamService instead.


http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc@441
PS3, Line 441:     // Release to MemTracker while still holding the lock to prevent race with Close().
             :     recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             : 
             :   // Responds to the sender to ack the insertion of the row batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
> We have a dedicated function to do this, so it's not ideal to duplicate thi
I prefer not to enqueue the reply under the lock if possible. Please see replies to previous comments for more details. Passing a bool seems more complicated IMHO. It's just two lines so don't think it's too bad to duplicate them here.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 06:12:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 02 Mar 2018 03:26:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9446

to look at the new patch set (#5).

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................

IMPALA-6554: Fix a race in DequeueDeferredRpc()

Previously, KrpcDataStreamRecvr::DequeueDeferredRpc() will call
RespondAndReleaseRpc() outside the of sender queue's lock. Another
thread can race ahead and call KrpcDataStreamRecvr::Close() before
MemTracker::Release() is called on the receiver's MemTracker. This
may lead to update to the receiver's MemTracker after it has been
closed. This change fixes the problem by updating the MemTracker
before dropping the lock in DequeueDeferredRpc(). This change also
changes RespondAndReleaseRpc() to update the MemTracker first before
responding to the RPC. The order doesn't really matter much because
the response of an RPC is handled asynchronously by reactor threads so
there is always a window between when the MemTracker is updated and when
the actual RPC payload is freed. The order is updated so it's consistent
with that of DequeueDeferredRpc().

Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
---
M be/src/rpc/impala-service-pool.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
7 files changed, 60 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/9446/5
-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9446

to look at the new patch set (#4).

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................

IMPALA-6554: Fix a race in DequeueDeferredRpc()

Previously, KrpcDataStreamRecvr::DequeueDeferredRpc() will call
RespondAndReleaseRpc() outside the of sender queue's lock. Another
thread can race ahead and call KrpcDataStreamRecvr::Close() before
MemTracker::Release() is called on the receiver's MemTracker. This
may lead to update to the receiver's MemTracker after it has been
closed. This change fixes the problem by updating the MemTracker
before dropping the lock in DequeueDeferredRpc(). This change also
changes RespondAndReleaseRpc() to update the MemTracker first before
responding to the RPC. The order doesn't really matter much because
the response of an RPC is handled asynchronously by reactor threads so
there is always a window between when the MemTracker is updated and when
the actual RPC payload is freed. The order is updated so it's consistent
with that of DequeueDeferredRpc().

Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
---
M be/src/rpc/impala-service-pool.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-mgr.h
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/service/data-stream-service.cc
M be/src/service/data-stream-service.h
7 files changed, 54 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/9446/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@337
PS1, Line 337: RespondAndReleaseRpc
> Good idea. Will update the patch.
PS2 updated the label of mem_tracker_ to "DeferredRPC". The new patch passes the exchange node's MemTracker when calling RowBatch::FromProtobuf() as this corresponds to the MemTracker of the bufferpool client.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 19:27:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(2 comments)

Responding to questions first. Will update the patch next.

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@337
PS1, Line 337: RespondAndReleaseRpc
> so in order for this to be correct w.r.t. Close(), the lock must be held wh
Good idea. Will update the patch.


http://gerrit.cloudera.org:8080/#/c/9446/1/be/src/runtime/krpc-data-stream-recvr.cc@438
PS1, Line 438:     recvr_->mem_tracker()->Release(ctx->rpc_context->GetTransferSize());
             :   }
             : 
             :   // Responds to the sender to ack the insertion of the row batches.
             :   status.ToProto(ctx->response->mutable_status());
             :   ctx->rpc_context->RespondSuccess();
> given that the other callsites are holding the lock, why is it not okay to 
Not really. RespondSuccess() only enqueues a task for the reactor thread to respond to an RPC. The Inbound call which owns the actual InboundTransfer buffer is freed only after the reactor thread finishes sending the response (https://github.com/apache/impala/blob/master/be/src/kudu/rpc/connection.cc#L404-L432).

InboundTransfer at certain stages of an RPC is currently untracked memory. For instance, the ReadHandler() called by the reactor thread is responsible for allocating the buffer for holding the entire RPC payload. We currently don't have the hook into the KRPC stack to count this towards a particular query. This definitely needs to be addressed once we get past KRPC milestone 1.

I avoid holding the lock to while the response buffer is being serialized. We call RespondAndReleaseRpc() with lock held but they are mostly for error conditions (e.g. cancelled stream, failure to unpack the RPC sidecars etc). Note that in the fast path AddRow(), we also serialize the response without holding the lock. Not that I have solid figure to prove that it matters a lot but this seems to take long enough that the race this patch is fixing can actually manifest with concurrent queries.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2018 20:27:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 5: Code-Review+2

This looks good.

Not for this change, but do we have any way to test for leaks of the service mem tracker? Given that's daemon wide, we probably don't get the benefit of the Close() check for that mem tracker, right?  Maybe we should add a DCHECK that whenever the service queue is empty, the service mem tracker has 0 consumption?


-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 22:06:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 4: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/3/be/src/runtime/krpc-data-stream-recvr.cc@441
PS3, Line 441:   int sender_id = ctx->request->sender_id();
             :   recvr_->mem_tracker()->Consume(ctx->rpc_context->GetTransferSize());
             :   COUNTER_ADD(recvr_->num_arrived_batches_, 1);
             :   {
             :     lock_guard<SpinLock> l(lock_);
             :     if (UNLIKELY(is_cancelled_)) {
             :       DataStreamService::RespondDefer
> I prefer not to enqueue the reply under the lock if possible. Please see re
Makes sense. Adding a brief comment pointing that out would be a good idea. I can see how a new reader of the code could get confused otherwise.



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 19:36:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2035/


-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 23:41:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9446 )

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc
File be/src/runtime/krpc-data-stream-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc@231
PS4, Line 231: AndRelea
> in this case, the "deferred" doesn't make sense since it hasn't yet been di
Agreed. Fixed in PS5.


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-mgr.cc@287
PS4, Line 287: AndRelea
> this case also not deferred
Done


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-recvr.h
File be/src/runtime/krpc-data-stream-recvr.h:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/runtime/krpc-data-stream-recvr.h@175
PS4, Line 175: deferred_rpc
> I think it'd be clearer to also rename this field to deferred_rpc_tracker_ 
Done


http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/service/data-stream-service.h
File be/src/service/data-stream-service.h:

http://gerrit.cloudera.org:8080/#/c/9446/4/be/src/service/data-stream-service.h@62
PS4, Line 62: AndRelea
> as mentioned earlier, since this is used for non-deferred case as well, let
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Mar 2018 21:53:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6554: Fix a race in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9446

to look at the new patch set (#2).

Change subject: IMPALA-6554: Fix a race in DequeueDeferredRpc()
......................................................................

IMPALA-6554: Fix a race in DequeueDeferredRpc()

Previously, KrpcDataStreamRecvr::DequeueDeferredRpc() will call
RespondAndReleaseRpc() outside the of sender queue's lock. Another
thread can race ahead and call KrpcDataStreamRecvr::Close() before
MemTracker::Release() is called on the receiver's MemTracker. This
may lead to update to the receiver's MemTracker after it has been
closed. This change fixes the problem by updating the MemTracker
before dropping the lock in DequeueDeferredRpc(). This change also
changes RespondAndReleaseRpc() to update the MemTracker first before
responding to the RPC. The order doesn't really matter much because
the response of an RPC is handled asynchronously by reactor threads so
there is always a window between when the MemTracker is updated and when
the actual RPC payload is freed. The order is updated so it's consistent
with that of DequeueDeferredRpc().

Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
---
M be/src/rpc/impala-service-pool.cc
M be/src/runtime/krpc-data-stream-mgr.cc
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
4 files changed, 29 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/46/9446/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9446
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4a3b0789633c7b8bc898381d509e2af769f0e069
Gerrit-Change-Number: 9446
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>