Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2017/04/04 17:32:41 UTC

[jira] [Resolved] (IMPALA-1391) TPC-DS query 17 very slow

     [ https://issues.apache.org/jira/browse/IMPALA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar resolved IMPALA-1391.
-------------------------------------
    Resolution: Fixed

Runtime filters speed up this query by more than 10x.

> TPC-DS query 17 very slow
> -------------------------
>
>                 Key: IMPALA-1391
>                 URL: https://issues.apache.org/jira/browse/IMPALA-1391
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Perf Investigation
>    Affects Versions: Impala 2.0
>            Reporter: David Rorke
>            Assignee: Mostafa Mokhtar
>            Priority: Minor
>              Labels: performance
>             Fix For: Impala 2.5.0
>
>         Attachments: q17.profile, q17_shuffle_hint.plan, tpc_ds_q17.pdf
>
>
> TPC-DS query 17 takes 56 minutes on a 15 TB scale factor data set (20-node cluster). This is with explicit partition filters added on each of the large fact tables. A few points I noticed in the plan/profile:
> (1) The bulk of the time is spent in the joins and aggregation. There is some spilling (mostly in one of the joins).
> (2) The plan uses broadcast joins in all cases, even when joining large tables/result sets.
> (3) I rewrote the query to use SQL-92-style joins and added "shuffle" hints on what should be the larger joins. The resulting plan uses a partitioned join for one of the two cases where I added a shuffle hint, but still uses a broadcast join for the other large join.
> The profile for the original query and the explain plan output for the modified (hinted) query are attached.
> Published results from Hortonworks (HWX) claim that this query runs in 300 seconds with Hive/Tez at a 30 TB scale factor (we haven't independently verified this Hive time).
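
For readers unfamiliar with the hinted rewrite described in point (3), the following is a rough sketch of what SQL-92-style joins with Impala shuffle hints can look like. It is not the reporter's actual query (that is in the attached q17_shuffle_hint.plan) and not the full TPC-DS Q17 text: the date_dim joins, the partition filters, and most of the aggregates are omitted, and the choice of which joins to hint is an assumption made for illustration.

```sql
-- Abbreviated, illustrative sketch only; not the query from the attachment.
-- Table and column names follow the standard TPC-DS schema.
-- [SHUFFLE] asks Impala for a partitioned (hash-exchange) join rather than
-- a broadcast of the right-hand input. It is a hint, not a guarantee.
SELECT i.i_item_id,
       s.s_state,
       COUNT(ss.ss_quantity) AS store_sales_quantitycount
FROM store_sales ss
  JOIN [SHUFFLE] store_returns sr
    ON  ss.ss_customer_sk   = sr.sr_customer_sk
    AND ss.ss_item_sk       = sr.sr_item_sk
    AND ss.ss_ticket_number = sr.sr_ticket_number
  JOIN [SHUFFLE] catalog_sales cs
    ON  sr.sr_customer_sk = cs.cs_bill_customer_sk
    AND sr.sr_item_sk     = cs.cs_item_sk
  JOIN item i  ON i.i_item_sk  = ss.ss_item_sk
  JOIN store s ON s.s_store_sk = ss.ss_store_sk
GROUP BY i.i_item_id, s.s_state
ORDER BY i.i_item_id, s.s_state
LIMIT 100;
```

Prefixing the statement with EXPLAIN shows whether each hinted join was planned as a partitioned join or as a broadcast, which is how the mismatch described in point (3) would show up.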



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)