You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Michael Ho (Code Review)" <ge...@cloudera.org> on 2018/02/24 01:20:47 UTC

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9439


Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................

IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

The local variable 'status' was mistakenly defined twice in
DequeueDeferredRpc(): one in the outer scope and one in the
inner scope. This causes the error status of AddBatchWork()
to be dropped when the inner scope ends. As a result, the error
status from AddBatchWork() (e.g. MemLimitExceeded) will not be
propagated back to the sender aand the receiver will continue
to operate with some missing data, leading to wrong query
results.

This change fixes the problem by removing the redefinition of
the status local variable. It also adds some counters in the
profile to make diagnostics of failed RPCs or missing EOS easier.

Testing done: Stress test consistently reproduced the problem before.
With this fix, no wrong results have been seen after 2 iterations
of stress tests, which translates to about 20000 queries being run.

Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
---
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
4 files changed, 32 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9439/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................

IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

This change fixes a couple of issues found with stress test
(concurrent_select.py)

1. The local variable 'status' was mistakenly defined twice in
DequeueDeferredRpc(): one in the outer scope and one in the
inner scope. This causes the error status of AddBatchWork()
to be dropped when the inner scope ends. As a result, the error
status from AddBatchWork() (e.g. MemLimitExceeded) will not be
propagated back to the sender and the receiver will continue
to operate with some missing data, leading to wrong query
results. This change fixes the problem by removing the redefinition
of the status local variable. It also adds some counters in the
profile to make diagnostics of failed RPCs or missing EOS easier.

2. AddBatchWork() may return early without enqueuing a row
batch if it encounters a failure creating a row batch. The bug
is that it may return early without notifying threads waiting
on 'data_arrival_cv_', causing threads waiting for
'num_pending_enqueue_' to 0 to wait indefinitely. This may
cause some fragment instances to stick around after the query
has been cancelled.

3. This one is not mainfesting during stress test and it's benign.
There is a missing check for 'is_cancelled_' in TakeOverEarlySenders()
before enqueuing an early sender into the deferred_rpc_ queue.
The bug is benign because TakeOverEarlySenders() is called from
in fragment execution thread context so it cannot call close the
receiver and call CancelStream() on the receiver until it returns
from CreateRecvr() unless the query is cancelled. In which case,
the sender should also get the cancellation. That said, it should
be fixed. This change checks for the 'is_cancelled_' flag before
enqueuing an early sender into the 'deferred_rpc_' queue.

Testing done: Stress test consistently reproduced the problems before.
With this fix, no wrong results have been seen after 2 iterations
of stress tests, which translates to about 20000 queries being run.

Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Reviewed-on: http://gerrit.cloudera.org:8080/9439
Reviewed-by: Michael Ho <kw...@cloudera.com>
Tested-by: Impala Public Jenkins
---
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
4 files changed, 53 insertions(+), 11 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 19:13:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2024/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 20:06:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Sailesh Mukil (Code Review)" <ge...@cloudera.org>.
Sailesh Mukil has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9439/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9439/1//COMMIT_MSG@14
PS1, Line 14: aand
nit: typo


http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc@309
PS1, Line 309: if (UNLIKELY(!status.ok())) COUNTER_ADD(parent_->rpc_failure_counter_, 1);
Does this mean a RPC failed? Couldn't a bad RPC response also end up incrementing this counter?

Eg: L501 doesn't mean that the RPC itself failed, but that the RPC hit some error on the remote side.

Is this to include both? If yes, let's make that clear, else it might be confusing as to what exactly this counter is counting.


http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc@627
PS1, Line 627: eos_sent_counter_
What's the use case for counting this? In the good case this should be equal to number of channels right?

Is it to compare with the number of EOS RPCs received on the receiver side?



-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 02:17:54 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9439

to look at the new patch set (#4).

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................

IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

This change fixes a couple of issues found with stress test
(concurrent_select.py)

1. The local variable 'status' was mistakenly defined twice in
DequeueDeferredRpc(): one in the outer scope and one in the
inner scope. This causes the error status of AddBatchWork()
to be dropped when the inner scope ends. As a result, the error
status from AddBatchWork() (e.g. MemLimitExceeded) will not be
propagated back to the sender and the receiver will continue
to operate with some missing data, leading to wrong query
results. This change fixes the problem by removing the redefinition
of the status local variable. It also adds some counters in the
profile to make diagnostics of failed RPCs or missing EOS easier.

2. AddBatchWork() may return early without enqueuing a row
batch if it encounters a failure creating a row batch. The bug
is that it may return early without notifying threads waiting
on 'data_arrival_cv_', causing threads waiting for
'num_pending_enqueue_' to 0 to wait indefinitely. This may
cause some fragment instances to stick around after the query
has been cancelled.

3. This one is not mainfesting during stress test and it's benign.
There is a missing check for 'is_cancelled_' in TakeOverEarlySenders()
before enqueuing an early sender into the deferred_rpc_ queue.
The bug is benign because TakeOverEarlySenders() is called from
in fragment execution thread context so it cannot call close the
receiver and call CancelStream() on the receiver until it returns
from CreateRecvr() unless the query is cancelled. In which case,
the sender should also get the cancellation. That said, it should
be fixed. This change checks for the 'is_cancelled_' flag before
enqueuing an early sender into the 'deferred_rpc_' queue.

Testing done: Stress test consistently reproduced the problems before.
With this fix, no wrong results have been seen after 2 iterations
of stress tests, which translates to about 20000 queries being run.

Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
---
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
4 files changed, 53 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9439/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 5: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Feb 2018 01:17:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Feb 2018 17:24:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9439/2/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9439/2/be/src/runtime/krpc-data-stream-recvr.cc@346
PS2, Line 346:   // TODO: Add timers for time spent in this function and queue time in 'batch_queue_'.
> is there a counter to determine how many batches made it to this point? If 
Added a new counter called NumBatchesArrived. So a batch goes from "Arrived" (->"Deferred") -> "Received" -> "Enqueued"


http://gerrit.cloudera.org:8080/#/c/9439/2/be/src/runtime/krpc-data-stream-recvr.cc@446
PS2, Line 446:     lock_guard<SpinLock> l(lock_);
There seems to be a missing check for the flag "is_cancelled_" here. Fixed in PS3. This may be benign though as this function is called from CreateRecvr() from the main execution thread so it really cannot call Cancel() until after returning from CreateRecvr().

If the receiver is cancelled due to query cancellation, the sender should also get a cancellation too so it won't be waiting for a reply. That said, it's still nice to fix this to maintain the invariant that the deferred_rpcs_ queue is empty in Close().



-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 18:41:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2026/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 20:12:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, Dan Hecht, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9439

to look at the new patch set (#3).

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................

IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

The local variable 'status' was mistakenly defined twice in
DequeueDeferredRpc(): one in the outer scope and one in the
inner scope. This causes the error status of AddBatchWork()
to be dropped when the inner scope ends. As a result, the error
status from AddBatchWork() (e.g. MemLimitExceeded) will not be
propagated back to the sender and the receiver will continue
to operate with some missing data, leading to wrong query
results.

This change fixes the problem by removing the redefinition of
the status local variable. It also adds some counters in the
profile to make diagnostics of failed RPCs or missing EOS easier.

This change also fixes a missing check for 'is_cancelled_' flag
in TakeOverEarlySenders() before enqueuing an early sender into
the deferred_rpc_ queue. The missing check should be benign
because KrpcDataStreamRecvr::SenderQueue::TakeOverEarlySenders()
is called from KrpcDataStreamMgr::CreateRecvr() in the fragment
execution thread context so KrpcDataStreamRecvr::CancelStream()
can only be called due to query cancellation while CreateRecvr()
is still running. In which case, the sender should also be cancelled.
That said, this should be fixed by checking for the 'is_cancelled_'
flag before enqueuing an early sender into the 'deferred_rpc_' queue.

Testing done: Stress test consistently reproduced the problem before.
With this fix, no wrong results have been seen after 2 iterations
of stress tests, which translates to about 20000 queries being run.

Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
---
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
4 files changed, 52 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9439/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 1:

Pushing this fix out to unbreak the build for now. Will follow up with a separate change to implement a targeted test.


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 01:21:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Dan Hecht (Code Review)" <ge...@cloudera.org>.
Dan Hecht has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

Please let Sailesh finish his review too, of course.

http://gerrit.cloudera.org:8080/#/c/9439/2/be/src/runtime/krpc-data-stream-recvr.cc
File be/src/runtime/krpc-data-stream-recvr.cc:

http://gerrit.cloudera.org:8080/#/c/9439/2/be/src/runtime/krpc-data-stream-recvr.cc@346
PS2, Line 346:   // TODO: Add timers for time spent in this function and queue time in 'batch_queue_'.
is there a counter to determine how many batches made it to this point? If not, do you think that would be helpful, given the deferred batch path is non-trivial, and that there are error paths that can cause us to not get to AddBatchWork()?



-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 05:04:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2027/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 21:27:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2019/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 04:46:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2019/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 09:48:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2024/


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 16:23:20 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 5: Code-Review+2

Carry Dan's +2.


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 21:25:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

More infrastructure issue. IMPALA-6394. HDFS block replication failed.


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 20:11:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix some bugs in KprcDataStreamRecvr

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix some bugs in KprcDataStreamRecvr
......................................................................


Patch Set 4:

GVO failed due to flaky infrastructure (IMPALA-6593)


-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Feb 2018 16:22:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/9439 )

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9439/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9439/1//COMMIT_MSG@14
PS1, Line 14: aand
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc@309
PS1, Line 309: if (UNLIKELY(!status.ok())) COUNTER_ADD(parent_->rpc_failure_counter_, 1);
> Does this mean a RPC failed? Couldn't a bad RPC response also end up increm
Yes, this is the sum of RPC failure and number of times remote replies with a non-retryable error. Comments updated. This helps identify if the error from a receiver propagates correctly to the sender.


http://gerrit.cloudera.org:8080/#/c/9439/1/be/src/runtime/krpc-data-stream-sender.cc@627
PS1, Line 627: eos_sent_counter_
> What's the use case for counting this? In the good case this should be equa
To pinpoint senders which didn't send EOS, leading to hung receivers.



-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 24 Feb 2018 02:54:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

Posted by "Michael Ho (Code Review)" <ge...@cloudera.org>.
Hello Sailesh Mukil, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9439

to look at the new patch set (#2).

Change subject: IMPALA-6565: Fix dropped status in DequeueDeferredRpc()
......................................................................

IMPALA-6565: Fix dropped status in DequeueDeferredRpc()

The local variable 'status' was mistakenly defined twice in
DequeueDeferredRpc(): one in the outer scope and one in the
inner scope. This causes the error status of AddBatchWork()
to be dropped when the inner scope ends. As a result, the error
status from AddBatchWork() (e.g. MemLimitExceeded) will not be
propagated back to the sender and the receiver will continue
to operate with some missing data, leading to wrong query
results.

This change fixes the problem by removing the redefinition of
the status local variable. It also adds some counters in the
profile to make diagnostics of failed RPCs or missing EOS easier.

Testing done: Stress test consistently reproduced the problem before.
With this fix, no wrong results have been seen after 2 iterations
of stress tests, which translates to about 20000 queries being run.

Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
---
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/krpc-data-stream-recvr.h
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
4 files changed, 32 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/9439/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9439
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6b2985a47021ebd4a970861040e7474aca7941b5
Gerrit-Change-Number: 9439
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Michael Ho <kw...@cloudera.com>
Gerrit-Reviewer: Sailesh Mukil <sa...@cloudera.com>