You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Tim Armstrong (Code Review)" <ge...@cloudera.org> on 2018/06/06 20:50:07 UTC

[Impala-ASF-CR](2.x) IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

Hello Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/10619

to review the following change.


Change subject: IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang
......................................................................

IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

If we hit EOS, we'll wait for all the backends to report status (to try
to get a complete profile). But if the query is closed after this point,
then we can get stuck waiting since once the query is closed,
ImpalaServer won't know about this coordinator and so it will stop
forwarding on the ReportStatus RPCs.

The real fix for this is IMPALA-6984, but in the mean time, add another
special case for this JIRA (see the other TODO IMPALA-6984 in
coordinator.cc).

The cancellation test only finds this race once in a while (several
hours) indirectly in a COMPUTE STATS query because the
ChildQueryExecutor will do a CloseOperation() while the execution thread
is inside Fetch(). To make this more reproducible, modify the
cancellation test to allow the close and fetch rpcs to execute
concurrently (don't join the test's fetch thread until after
close). This makes the race reproducible in a few iterations and a few
minutes.

Testing:
- Loop test_cancellation.py

Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Reviewed-on: http://gerrit.cloudera.org:8080/10601
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/runtime/coordinator.cc
M tests/query_test/test_cancellation.py
2 files changed, 47 insertions(+), 9 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/19/10619/1
-- 
To view, visit http://gerrit.cloudera.org:8080/10619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Gerrit-Change-Number: 10619
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR](2.x) IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10619 )

Change subject: IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang
......................................................................


Patch Set 1: Verified+1

Verified with https://gerrit.cloudera.org/#/c/10623/


-- 
To view, visit http://gerrit.cloudera.org:8080/10619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Gerrit-Change-Number: 10619
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Thu, 07 Jun 2018 00:23:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10619 )

Change subject: IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang
......................................................................


Patch Set 1: Code-Review+2

Clean pick


-- 
To view, visit http://gerrit.cloudera.org:8080/10619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Gerrit-Change-Number: 10619
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Comment-Date: Wed, 06 Jun 2018 21:05:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10619 )

Change subject: IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang
......................................................................

IMPALA-7101: Fix race between Fetch and Close RPCs that can lead to hang

If we hit EOS, we'll wait for all the backends to report status (to try
to get a complete profile). But if the query is closed after this point,
then we can get stuck waiting since once the query is closed,
ImpalaServer won't know about this coordinator and so it will stop
forwarding on the ReportStatus RPCs.

The real fix for this is IMPALA-6984, but in the mean time, add another
special case for this JIRA (see the other TODO IMPALA-6984 in
coordinator.cc).

The cancellation test only finds this race once in a while (several
hours) indirectly in a COMPUTE STATS query because the
ChildQueryExecutor will do a CloseOperation() while the execution thread
is inside Fetch(). To make this more reproducible, modify the
cancellation test to allow the close and fetch rpcs to execute
concurrently (don't join the test's fetch thread until after
close). This makes the race reproducible in a few iterations and a few
minutes.

Testing:
- Loop test_cancellation.py

Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Reviewed-on: http://gerrit.cloudera.org:8080/10601
Reviewed-by: Tim Armstrong <ta...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/10619
Tested-by: Tim Armstrong <ta...@cloudera.com>
---
M be/src/runtime/coordinator.cc
M tests/query_test/test_cancellation.py
2 files changed, 47 insertions(+), 9 deletions(-)

Approvals:
  Tim Armstrong: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/10619
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: merged
Gerrit-Change-Id: I7c147550f86d81b818ecbdd34cf2919ced7ff8c5
Gerrit-Change-Number: 10619
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Dan Hecht <dh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>