You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Jacques Nadeau (JIRA)" <ji...@apache.org> on 2015/05/05 15:37:06 UTC

[jira] [Updated] (DRILL-2735) Broadcast plan get's "lost" when the same query is used in UNION ALL

     [ https://issues.apache.org/jira/browse/DRILL-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacques Nadeau updated DRILL-2735:
----------------------------------
    Fix Version/s: 1.2.0

> Broadcast plan get's "lost" when the same query is used in UNION ALL
> --------------------------------------------------------------------
>
>                 Key: DRILL-2735
>                 URL: https://issues.apache.org/jira/browse/DRILL-2735
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>             Fix For: 1.2.0
>
>
> I get a broadcast plan for simple inner join query.
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionExchange
> 01-01        Project(c_integer=[$0])
> 01-02          HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
> 01-03            Project(c_integer0=[$0])
> 01-05              BroadcastExchange
> 02-01                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ {
>       "name" : "planner.broadcast_factor",
>       "kind" : "DOUBLE",
>       "type" : "SESSION",
>       "float_val" : 0.0
>     }, {
>       "name" : "planner.slice_target",
>       "kind" : "LONG",
>       "type" : "SESSION",
>       "num_val" : 1
>     } ],
> {code}
> Create table succeeds and multiple fragments are executed:
> {code}
> 0: jdbc:drill:schema=dfs> create table test(a1) as  select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
> +------------+---------------------------+
> |  Fragment  | Number of records written |
> +------------+---------------------------+
> | 1_1        | 0                         |
> | 1_3        | 0                         |
> | 1_31       | 0                         |
> | 1_43       | 0                         |
> | 1_35       | 0                         |
> | 1_21       | 0                         |
> | 1_19       | 0                         |
> | 1_27       | 1                         |
> | 1_17       | 1                         |
> | 1_13       | 0                         |
> | 1_29       | 0                         |
> | 1_33       | 0                         |
> | 1_25       | 0                         |
> | 1_7        | 0                         |
> | 1_11       | 0                         |
> | 1_37       | 0                         |
> | 1_45       | 0                         |
> | 1_9        | 0                         |
> | 1_23       | 1                         |
> | 1_15       | 0                         |
> | 1_41       | 0                         |
> | 1_39       | 0                         |
> | 1_5        | 0                         |
> | 1_10       | 0                         |
> | 1_14       | 0                         |
> | 1_24       | 0                         |
> | 1_16       | 0                         |
> | 1_12       | 0                         |
> | 1_36       | 0                         |
> | 1_20       | 0                         |
> | 1_34       | 1                         |
> | 1_40       | 0                         |
> | 1_22       | 0                         |
> | 1_26       | 0                         |
> | 1_32       | 1                         |
> | 1_8        | 0                         |
> | 1_18       | 0                         |
> | 1_42       | 0                         |
> | 1_44       | 0                         |
> | 1_38       | 0                         |
> | 1_30       | 0                         |
> | 1_28       | 1                         |
> | 1_4        | 10                        |
> | 1_2        | 1                         |
> | 1_6        | 0                         |
> | 1_0        | 0                         |
> +------------+---------------------------+
> 46 rows selected (2.337 seconds)
> {code}
> 8 parquet files are written:
> {code}
> [Wed Apr 08 11:41:10 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/test ] # ls -ltr
> total 4
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_17_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_27_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_23_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_34_0.parquet
> -rwxr-xr-x 1 mapr mapr 185 Apr  8 11:40 1_4_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_32_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_28_0.parquet
> -rwxr-xr-x 1 mapr mapr 146 Apr  8 11:40 1_2_0.parquet
> {code}
> However:
> 1. broadcast join is not planned when the same join query used in legs of the union all oprartor:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer union all select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionAll(all=[true])
> 00-03        Project(c_integer=[$0])
> 00-05          HashJoin(condition=[=($0, $1)], joinType=[inner])
> 00-09            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
> 00-08            Project(c_integer0=[$0])
> 00-11              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
> 00-02        Project(c_integer=[$0])
> 00-04          HashJoin(condition=[=($0, $1)], joinType=[inner])
> 00-07            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
> 00-06            Project(c_integer0=[$0])
> 00-10              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
>  | {
>   "head" : {
>     "version" : 1,
>     "generator" : {
>       "type" : "ExplainHandler",
>       "info" : ""
>     },
>     "type" : "APACHE_DRILL_PHYSICAL",
>     "options" : [ {
>       "name" : "planner.broadcast_factor",
>       "kind" : "DOUBLE",
>       "type" : "SESSION",
>       "float_val" : 0.0
>     }, {
>       "name" : "planner.slice_target",
>       "kind" : "LONG",
>       "type" : "SESSION",
>       "num_val" : 1
>     } ],
> {code}
> 2. CTAS with this query produces 1 parquet file:
> {code}
> [Thu Apr 09 09:13:26 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/yyy ] # ls
> 0_0_0.parquet
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)