Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/08/05 08:24:00 UTC
[jira] [Created] (IMPALA-10844) Retry queries that failed by bad plans

Quanlong Huang created IMPALA-10844:
---------------------------------------

             Summary: Retry queries that failed by bad plans
                 Key: IMPALA-10844
                 URL: https://issues.apache.org/jira/browse/IMPALA-10844
             Project: IMPALA
          Issue Type: New Feature
            Reporter: Quanlong Huang


IMPALA-9124 adds support for transparent query retry. It would be nice if we could also retry queries that failed because of bad plans, and re-create better plans based on the exec summary of the failed attempt.

For instance, a query joining two large tables with lots of predicates may have underestimated cardinalities, which may lead the planner to choose a broadcast join instead of a partitioned join, and the query finally fails with OOM. This usually happens when the data distribution is skewed.
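
To illustrate why an underestimated build side flips the join strategy, here is a minimal sketch of a toy cost comparison. It is not Impala's planner code: the function name `choose_join_strategy`, the `row_bytes` parameter, and the value 100 are assumptions made for illustration. The row counts and the host count are the figures from the exec summary quoted in this issue.

```python
def choose_join_strategy(probe_rows, build_rows, num_hosts, row_bytes=100):
    """Toy model: pick the strategy that sends fewer bytes over the network.

    row_bytes is a hypothetical average row size, not a value from the issue.
    """
    # Broadcast: every host receives a full copy of the build side.
    broadcast_cost = build_rows * row_bytes * num_hosts
    # Partitioned: both sides are hash-partitioned, so each row is sent once.
    partitioned_cost = (probe_rows + build_rows) * row_bytes
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"


# Estimated cardinalities: build side 7.51K rows, probe side 4.94M rows, 73 hosts.
estimated = choose_join_strategy(probe_rows=4_940_000, build_rows=7_510, num_hosts=73)
# Same plan, but with the build-side cardinality observed at run time (960.39M rows).
observed = choose_join_strategy(probe_rows=4_940_000, build_rows=960_390_000, num_hosts=73)
print(estimated, observed)
```

Under these assumptions the estimated cardinalities favor a broadcast join, while the observed build-side cardinality favors a partitioned join, which is the kind of correction a retry could apply.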

Here is the exec summary of a failed query:
{code:java}
Operator          #Hosts   Avg Time   Max Time    #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail                                                     
----------------------------------------------------------------------------------------------------------------------------------------------------------
10:EXCHANGE            1    0.000ns    0.000ns        0           1          0              0  UNPARTITIONED                                              
09:AGGREGATE          73    3.734ms  225.541ms        0           1   59.12 KB       10.00 MB  FINALIZE                                                   
08:EXCHANGE           73    0.000ns    0.000ns        0           1          0              0  HASH(a11.aggregation_date,a11.marca,a11.operator)          
04:AGGREGATE          73  340.895us    2.278ms        0           1   59.12 KB       10.00 MB  STREAMING                                                  
07:AGGREGATE          73    3.692ms  226.687ms        0           1   76.12 KB       10.00 MB                                                             
06:EXCHANGE           73    0.000ns    0.000ns        0           1          0              0  HASH(a11.aggregation_date,a11.marca,a11.operator,a11.imsi) 
03:AGGREGATE          73    2.109ms  131.902ms        0           1   76.12 KB       10.00 MB  STREAMING 
02:HASH JOIN          73   13s238ms      16m6s        0           1  199.65 GB        2.88 MB  INNER JOIN, BROADCAST                                      
|--05:EXCHANGE        73     59m56s       1h4m  960.39M       7.51K          0              0  BROADCAST                                                  
|  00:SCAN HDFS       99         1m      7m42s  960.28M       7.51K  604.48 MB        1.29 GB  large_table_a                                         
01:SCAN HDFS          73  673.533us    4.572ms        0       4.94M    4.00 KB       96.00 MB  large_table_b
{code}
We can correct the cardinality of the scan nodes and then re-plan with a partitioned join (or broadcast the other side instead, which actually looks small: 01:SCAN HDFS on large_table_b returned 0 rows against an estimate of 4.94M).
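
As a rough sketch of how the corrections could be derived, the snippet below compares the #Rows and Est. #Rows columns of the exec summary above and collects the operators whose estimates are far off. The helper names `parse_count` and `misestimated`, the 10x threshold, and the assumption that counts use the K/M/B suffixes are all illustrative choices, not part of Impala.

```python
# Suffixes assumed for pretty-printed counts in the exec summary.
UNITS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_count(text):
    """Convert a count such as '7.51K' or '960.39M' to a number."""
    text = text.strip()
    if text and text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def misestimated(nodes, threshold=10.0):
    """Return {operator: observed_rows} for nodes whose estimate is off
    by more than `threshold` times in either direction (hypothetical rule)."""
    corrections = {}
    for operator, rows, est_rows in nodes:
        actual, estimate = parse_count(rows), parse_count(est_rows)
        # Clamp both values to 1 so a zero-row result does not divide by zero.
        ratio = max(actual, 1.0) / max(estimate, 1.0)
        if ratio > threshold or ratio < 1.0 / threshold:
            corrections[operator] = actual
    return corrections


# (operator, #Rows, Est. #Rows) copied from the exec summary above.
summary = [
    ("02:HASH JOIN", "0", "1"),
    ("05:EXCHANGE", "960.39M", "7.51K"),
    ("00:SCAN HDFS", "960.28M", "7.51K"),
    ("01:SCAN HDFS", "0", "4.94M"),
]
corrections = misestimated(summary)
for operator, rows in corrections.items():
    print(operator, int(round(rows)))
```

Both scan nodes and the broadcast exchange are flagged here. A retry could feed the observed row counts back to the planner in place of the original estimates.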

The output cardinality of JOINs is also hard to estimate. We can use the original query's exec summary to correct those estimates too.


