Posted to issues@impala.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2017/04/04 17:32:41 UTC
[jira] [Resolved] (IMPALA-1391) TPC-DS query 17 very slow
[ https://issues.apache.org/jira/browse/IMPALA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar resolved IMPALA-1391.
-------------------------------------
Resolution: Fixed
Runtime filters speed up the query by more than 10x.
> TPC-DS query 17 very slow
> -------------------------
>
> Key: IMPALA-1391
> URL: https://issues.apache.org/jira/browse/IMPALA-1391
> Project: IMPALA
> Issue Type: Bug
> Components: Perf Investigation
> Affects Versions: Impala 2.0
> Reporter: David Rorke
> Assignee: Mostafa Mokhtar
> Priority: Minor
> Labels: performance
> Fix For: Impala 2.5.0
>
> Attachments: q17.profile, q17_shuffle_hint.plan, tpc_ds_q17.pdf
>
>
> TPC-DS query 17 takes 56 minutes on a 15 TB scale factor data set (20-node cluster). This is with explicit partition filters added on each of the large fact tables. A few points I noticed in the plan/profile:
> (1) The bulk of the time is spent in the joins and the aggregation. There is some spilling (mostly in one of the joins).
> (2) The plan is using broadcast joins in all cases, even when joining large tables/result sets.
> (3) I rewrote the query to use SQL-92 style joins and added "shuffle" hints on what should be the larger joins. The resulting plan uses a partitioned join for one of the two joins where I added a shuffle hint, but continues to use a broadcast join for the other.
> The profile for the original query and the explain plan output for the modified (hinted) query are attached.
> Published results from Hortonworks (HWX) claim that this query runs in 300 seconds with Hive/Tez at a 30 TB scale factor (we haven't independently verified this Hive time).
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)