You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/06/11 22:27:00 UTC

[jira] [Created] (IMPALA-9850) Avoid doing expensive join at the coordinator

Aman Sinha created IMPALA-9850:
----------------------------------

             Summary: Avoid doing expensive join at the coordinator
                 Key: IMPALA-9850
                 URL: https://issues.apache.org/jira/browse/IMPALA-9850
             Project: IMPALA
          Issue Type: Improvement
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


When we join 2 subqueries each of which has its root fragment executing on the coordinator, currently the join is also executed on the coordinator. Here's an example query:
{noformat}
select count(*) from 
  ( select rank() over (order by l_quantity) x from lineitem) dt1 
     inner join 
  (select rank() over (order by l_shipdate) y from lineitem) dt2
     on dt1.x = dt2.y
{noformat}
This can result in poor performance since the join inputs can be quite large.  Ideally, we want to re-distribute both intermediate results after the rank() has been computed and do the join on executor nodes.  

Another similar scenario is joining of subqueries which have ORDER BY LIMIT. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)