Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/05 08:24:00 UTC
[jira] [Created] (IMPALA-10844) Retry queries that failed by bad plans
Quanlong Huang created IMPALA-10844:
---------------------------------------
Summary: Retry queries that failed by bad plans
Key: IMPALA-10844
URL: https://issues.apache.org/jira/browse/IMPALA-10844
Project: IMPALA
Issue Type: New Feature
Reporter: Quanlong Huang
IMPALA-9124 adds support for transparent query retry. It would be nice if we could also retry queries that failed because of bad plans, re-creating better plans based on the exec summary of the failed run.
For instance, a query joining two large tables with many predicates may have underestimated cardinalities, which can lead the planner to pick a broadcast join instead of a partitioned join, and the query eventually fails with OOM. This usually happens when the data distribution is skewed.
Here is the exec summary of a failed query:
{code:java}
Operator         #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
----------------------------------------------------------------------------------------------------------------------------------------------------------
10:EXCHANGE           1    0.000ns    0.000ns        0           1          0              0  UNPARTITIONED
09:AGGREGATE         73    3.734ms  225.541ms        0           1   59.12 KB       10.00 MB  FINALIZE
08:EXCHANGE          73    0.000ns    0.000ns        0           1          0              0  HASH(a11.aggregation_date,a11.marca,a11.operator)
04:AGGREGATE         73  340.895us    2.278ms        0           1   59.12 KB       10.00 MB  STREAMING
07:AGGREGATE         73    3.692ms  226.687ms        0           1   76.12 KB       10.00 MB
06:EXCHANGE          73    0.000ns    0.000ns        0           1          0              0  HASH(a11.aggregation_date,a11.marca,a11.operator,a11.imsi)
03:AGGREGATE         73    2.109ms  131.902ms        0           1   76.12 KB       10.00 MB  STREAMING
02:HASH JOIN         73   13s238ms      16m6s        0           1  199.65 GB        2.88 MB  INNER JOIN, BROADCAST
|--05:EXCHANGE       73     59m56s       1h4m  960.39M       7.51K          0              0  BROADCAST
|  00:SCAN HDFS      99         1m      7m42s  960.28M       7.51K  604.48 MB        1.29 GB  large_table_a
01:SCAN HDFS         73  673.533us    4.572ms        0       4.94M    4.00 KB       96.00 MB  large_table_b
{code}
We can correct the cardinality of the scan nodes and then re-plan with a partitioned join (or broadcast the other side instead, which actually looks small).
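To illustrate why a corrected cardinality would flip the join strategy, here is a minimal sketch with a toy cost model. It is not Impala's actual planner logic: the function `choose_join_strategy`, the `JoinInput` type, and the 100-byte row width are all hypothetical. The row counts are taken from the exec summary above.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    name: str
    rows: int        # cardinality, either estimated or observed
    row_bytes: int   # assumed average row width (hypothetical value)


def choose_join_strategy(probe: JoinInput, build: JoinInput, num_hosts: int) -> str:
    """Toy cost model, NOT Impala's real planner.

    Broadcast ships the whole build side to every host; a partitioned
    join ships each row of both sides across the network once.
    """
    broadcast_cost = build.rows * build.row_bytes * num_hosts
    partitioned_cost = probe.rows * probe.row_bytes + build.rows * build.row_bytes
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"


HOSTS = 73
ROW_BYTES = 100  # assumption, for illustration only

# Planner estimates from the failed query: 7.51K build rows, 4.94M probe rows.
table_a_est = JoinInput("large_table_a", 7_510, ROW_BYTES)
table_b_est = JoinInput("large_table_b", 4_940_000, ROW_BYTES)
original = choose_join_strategy(probe=table_b_est, build=table_a_est, num_hosts=HOSTS)

# Build side corrected with the observed 960.28M rows from the exec summary.
table_a_obs = JoinInput("large_table_a", 960_280_000, ROW_BYTES)
replanned = choose_join_strategy(probe=table_b_est, build=table_a_obs, num_hosts=HOSTS)

# Alternative: swap the sides and broadcast the smaller table instead.
swapped = choose_join_strategy(probe=table_a_obs, build=table_b_est, num_hosts=HOSTS)

print(original, replanned, swapped)
```

Under these assumptions the original estimates favor a broadcast join, the corrected build-side cardinality favors a partitioned join, and swapping the sides makes broadcasting the smaller table the cheaper option again.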
The output cardinality of JOINs is also hard to estimate. We can use the original query's exec summary to correct those estimates too.
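A rough sketch of how badly underestimated nodes could be picked out of an exec summary follows. The helper names `parse_count` and `misestimated`, the 10x threshold, and the K/M/B suffix table are assumptions for illustration. The input rows are hand-copied from the summary above rather than parsed from the raw text.

```python
# Suffix multipliers for abbreviated row counts (assumed K/M/B convention).
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(text: str) -> int:
    """Convert a row count such as '960.28M' or '7.51K' to an integer."""
    text = text.strip()
    if text and text[-1] in SUFFIXES:
        return int(round(float(text[:-1]) * SUFFIXES[text[-1]]))
    return int(text)


def misestimated(nodes, threshold=10.0):
    """Return (operator, actual, estimate, ratio) for each node whose
    observed row count exceeds its estimate by at least `threshold` times."""
    flagged = []
    for operator, actual_text, est_text in nodes:
        actual = parse_count(actual_text)
        estimate = max(parse_count(est_text), 1)  # avoid division by zero
        ratio = actual / estimate
        if ratio >= threshold:
            flagged.append((operator, actual, estimate, ratio))
    return flagged


# (operator, #Rows, Est. #Rows) copied from the exec summary above.
nodes = [
    ("02:HASH JOIN", "0", "1"),
    ("05:EXCHANGE", "960.39M", "7.51K"),
    ("00:SCAN HDFS", "960.28M", "7.51K"),
    ("01:SCAN HDFS", "0", "4.94M"),
]

flagged = misestimated(nodes)
for operator, actual, estimate, ratio in flagged:
    print(f"{operator}: observed {actual} rows, estimated {estimate} ({ratio:.0f}x)")
```

This sketch only flags underestimates. Nodes that report zero rows are skipped, since in a failed query a zero count may only mean the operator never produced output before the failure.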