Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2020/06/11 22:27:00 UTC
[jira] [Created] (IMPALA-9850) Avoid doing expensive join at the coordinator
Aman Sinha created IMPALA-9850:
----------------------------------
Summary: Avoid doing expensive join at the coordinator
Key: IMPALA-9850
URL: https://issues.apache.org/jira/browse/IMPALA-9850
Project: IMPALA
Issue Type: Improvement
Components: Frontend
Affects Versions: Impala 3.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
When a query joins two subqueries, each of which has its root fragment executing on the coordinator, the join itself is currently also executed on the coordinator. Here's an example query:
{noformat}
select count(*) from
(select rank() over (order by l_quantity) x from lineitem) dt1
inner join
(select rank() over (order by l_shipdate) y from lineitem) dt2
on dt1.x = dt2.y
{noformat}
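Each rank() here has no PARTITION BY clause, so it is evaluated over the whole input in a single unpartitioned fragment on the coordinator, and the join is then placed in that same fragment. This can result in poor performance since the join inputs can be quite large. Ideally, we want to re-distribute both intermediate results after the rank() has been computed and do the join on executor nodes.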
A similar scenario is a join of subqueries that each have ORDER BY with LIMIT.
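A sketch of what such a query might look like, assuming the same TPC-H style lineitem table as above. The join column l_orderkey and the limit values are illustrative assumptions, not taken from a reported case:
{noformat}
-- Hypothetical example: each subquery ends in ORDER BY ... LIMIT, so its
-- final result is merged in a single fragment before the join consumes it.
select count(*) from
(select l_orderkey from lineitem order by l_quantity limit 1000000) dt1
inner join
(select l_orderkey from lineitem order by l_shipdate limit 1000000) dt2
on dt1.l_orderkey = dt2.l_orderkey
{noformat}
Prefixing either query with EXPLAIN should show which fragment the join node is assigned to.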
--
This message was sent by Atlassian Jira
(v8.3.4#803005)