Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2020/08/06 15:11:00 UTC

[jira] [Commented] (IMPALA-10047) Performance regression on short queries due to IMPALA-6984 fix

    [ https://issues.apache.org/jira/browse/IMPALA-10047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17172453#comment-17172453 ] 

ASF subversion and git services commented on IMPALA-10047:
----------------------------------------------------------

Commit c413f9b558d51de877f497590baf14139ad5cf99 in impala's branch refs/heads/master from Joe McDonnell
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=c413f9b ]

IMPALA-10047: Revert core piece of IMPALA-6984

Performance testing on TPC-DS found a performance regression
on short queries due to delayed exec status reports. Further
testing traced this back to IMPALA-6984's behavior of
cancelling backends on EOS. The coordinator logs show the
CancelBackends() call intermittently taking 10 seconds because
it times out in the RPC layer.

As a temporary workaround, this reverts the core part of
IMPALA-6984 that added that CancelBackends() call for EOS.
It leaves the rest of IMPALA-6984 intact, as other code has built
on top of it.

Testing:
 - Core job
 - Performance tests

Change-Id: Ibf00a56e91f0376eaaa552e3bb4763501bfb49e8
(cherry picked from commit b91f3c0e064d592f3cdf2a2e089ca6546133ba55)
Reviewed-on: http://gerrit.cloudera.org:8080/16288
Reviewed-by: Joe McDonnell <jo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Performance regression on short queries due to IMPALA-6984 fix
> --------------------------------------------------------------
>
>                 Key: IMPALA-10047
>                 URL: https://issues.apache.org/jira/browse/IMPALA-10047
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 4.0
>            Reporter: Joe McDonnell
>            Assignee: Joe McDonnell
>            Priority: Blocker
>
> When doing some TPC-DS benchmarking with mt_dop, we encountered intermittent performance regressions on short queries. Some query executions seem to be taking an extra 10 seconds in exec status reports due to delays in sending a cancel RPC. From the coordinator logs:
>  
> {noformat}
> W0804 02:52:33.922088   108 rpcz_store.cc:253] Call impala.ControlService.CancelQueryFInstances from 127.0.0.1:46738 (request call id 3134) took 10007 ms (10 s). Client timeout 10000 ms (10 s)
> W0804 02:52:33.922143   108 rpcz_store.cc:259] Trace:
> 0804 02:52:23.914291 (+     0us) impala-service-pool.cc:170] Inserting onto call queue
> 0804 02:52:33.922079 (+10007788us) impala-service-pool.cc:255] Skipping call since client already timed out
> 0804 02:52:33.922086 (+     7us) inbound_call.cc:162] Queueing failure response
> Metrics: {}
> I0804 02:52:33.922214   101 connection.cc:730] Got response to call id 3134 after client already timed out or cancelled
> I0804 02:52:33.923286 20276 coordinator-backend-state.cc:889] query_id=f442e73a0d35c136:c9993d7700000000 target backend=xx.xx.xx.xx:27000: Sending CancelQueryFInstances rpc
> {noformat}
> The rpcz page also shows that some ReportExecStatus RPCs are taking 10 seconds:
>  
> {noformat}
> "incoming_queue_time": "Count: 671901, min / max: 1000.000ns / 10s347ms, 25th %-ile: 12.000us, 50th %-ile: 18.000us, 75th %-ile: 28.000us, 90th %-ile: 67.000us, 95th %-ile: 456.000us, 99.9th %-ile: 10s133ms",
> {
> 		"method_name": "ReportExecStatus",
> 		"handler_latency": "Count: 169653, min / max: 38.000us / 10s173ms, 25th %-ile: 9.024ms, 50th %-ile: 20.352ms, 75th %-ile: 35.840ms, 90th %-ile: 94.720ms, 95th %-ile: 177.152ms, 99.9th %-ile: 10s027ms",
> 		"payload_size": "Count: 169653, min / max: 5.81 KB / 3.81 MB, 25th %-ile: 425.00 KB, 50th %-ile: 760.00 KB, 75th %-ile: 1.47 MB, 90th %-ile: 1.96 MB, 95th %-ile: 2.31 MB, 99.9th %-ile: 3.73 MB"
> 	}]
> {noformat}
>  
> IMPALA-6984 added a Coordinator::CancelBackends() call to Coordinator::HandleExecStateTransition() for the ExecState::RETURNED_RESULTS case:
> {noformat}
>   if (new_state == ExecState::RETURNED_RESULTS) {
>     // Cancel all backends, but wait for the final status reports to be received so that
>     // we have a complete profile for this successful query.
>     CancelBackends(/*fire_and_forget=*/ false);
>     WaitForBackends();
>   } else {
>     CancelBackends(/*fire_and_forget=*/ true);
>   }
> {noformat}
> Removing the CancelBackends() call from the RETURNED_RESULTS case eliminates the performance regression, so why that call stalls in the RPC layer needs more investigation.
>  
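The effect described above can be illustrated with a toy timing model. This is only a sketch and not Impala code: BackendModel, BlockedOnRpcMs and EosTransitionMs are hypothetical names, and the model assumes that cancel RPCs are issued concurrently, that a blocking cancel step lasts as long as its slowest RPC (capped by the client timeout), and that final status reports are timed from the start of the transition.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Toy model only: none of these names come from the Impala code base.
struct BackendModel {
  int64_t cancel_rpc_ms;    // time the cancel RPC would need to get a reply
  int64_t final_report_ms;  // time, from the start of the transition, until
                            // this backend's final status report arrives
};

// Time a caller is blocked on one RPC: the reply time, capped by the
// client-side timeout.
int64_t BlockedOnRpcMs(int64_t reply_ms, int64_t client_timeout_ms) {
  return std::min(reply_ms, client_timeout_ms);
}

// Models how long the end-of-stream transition takes.
// Assumption: cancel RPCs are sent concurrently, so a blocking cancel step
// costs as much as its slowest RPC. The transition finishes once both the
// cancel step (if any) and the last final report are done.
int64_t EosTransitionMs(const std::vector<BackendModel>& backends,
                        bool blocking_cancel, int64_t client_timeout_ms) {
  int64_t cancel_wait_ms = 0;
  int64_t report_wait_ms = 0;
  for (const BackendModel& b : backends) {
    if (blocking_cancel) {
      cancel_wait_ms = std::max(
          cancel_wait_ms, BlockedOnRpcMs(b.cancel_rpc_ms, client_timeout_ms));
    }
    report_wait_ms = std::max(report_wait_ms, b.final_report_ms);
  }
  return std::max(cancel_wait_ms, report_wait_ms);
}
```

With a 10000 ms client timeout and three backends {2, 20}, {3, 35} and {10007, 25} (the last one standing in for the cancel RPC that sat in the call queue for 10007 ms in the log above), the model gives 10000 ms with the blocking cancel step and 35 ms without it. In this model a single timed-out cancel RPC is enough to account for the extra 10 seconds on an otherwise short query.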


