You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/01/05 00:57:52 UTC

[jira] [Updated] (DRILL-1762) Apply different levels of parallelization to different phases of the query

     [ https://issues.apache.org/jira/browse/DRILL-1762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-1762:
----------------------------------
    Fix Version/s:     (was: 0.8.0)
                   Future

> Apply different levels of parallelization to different phases of the query 
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1762
>                 URL: https://issues.apache.org/jira/browse/DRILL-1762
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Query Planning & Optimization
>    Affects Versions: 0.6.0
>         Environment: 10-node Drill cluster with TPCH schema (SF100)
> OS: RHEL 6.4
> cores/node: 32
> RAM: 256GB
>            Reporter: Kunal Khatua
>             Fix For: Future
>
>
> When running TPCH queries, we found that setting the planner.max.width_per_node to the maximum number of cores available didn't necessarily improve their runtimes. In almost all queries, setting the property to as low as 12 (on eacch 32-core node), resulted in faster runtimes... which is counter-intuitive.
> It is highly possible that the leaf fragments have a larger overhead than many other fragments due to I/O operations, and the high level of parallelization might be causing the entire phase of data scan to be sub-optimal. Based on further investigation, qe might want to split this property  in two... one for leaf fragments (SCANs) and one for the rest of the class of fragments.
> e.g TPCH 08 (SF100)
> MaxWidth	Runtime (msec)
> 12		34,650
> 16		35,216
> 20		39,482
> 24		41,694
> 28		46,579
> 32		57,207



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)