You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/04/09 18:22:12 UTC
[jira] [Created] (DRILL-2735) Broadcast plan get's "lost" when the
same query is used in UNION ALL
Victoria Markman created DRILL-2735:
---------------------------------------
Summary: Broadcast plan get's "lost" when the same query is used in UNION ALL
Key: DRILL-2735
URL: https://issues.apache.org/jira/browse/DRILL-2735
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 0.9.0
Reporter: Victoria Markman
Assignee: Jinfeng Ni
I get a broadcast plan for simple inner join query.
{code}
0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 UnionExchange
01-01 Project(c_integer=[$0])
01-02 HashJoin(condition=[=($0, $1)], joinType=[inner])
01-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
01-03 Project(c_integer0=[$0])
01-05 BroadcastExchange
02-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
"name" : "planner.broadcast_factor",
"kind" : "DOUBLE",
"type" : "SESSION",
"float_val" : 0.0
}, {
"name" : "planner.slice_target",
"kind" : "LONG",
"type" : "SESSION",
"num_val" : 1
} ],
{code}
Create table succeeds and multiple fragments are executed:
{code}
0: jdbc:drill:schema=dfs> create table test(a1) as select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
+------------+---------------------------+
| Fragment | Number of records written |
+------------+---------------------------+
| 1_1 | 0 |
| 1_3 | 0 |
| 1_31 | 0 |
| 1_43 | 0 |
| 1_35 | 0 |
| 1_21 | 0 |
| 1_19 | 0 |
| 1_27 | 1 |
| 1_17 | 1 |
| 1_13 | 0 |
| 1_29 | 0 |
| 1_33 | 0 |
| 1_25 | 0 |
| 1_7 | 0 |
| 1_11 | 0 |
| 1_37 | 0 |
| 1_45 | 0 |
| 1_9 | 0 |
| 1_23 | 1 |
| 1_15 | 0 |
| 1_41 | 0 |
| 1_39 | 0 |
| 1_5 | 0 |
| 1_10 | 0 |
| 1_14 | 0 |
| 1_24 | 0 |
| 1_16 | 0 |
| 1_12 | 0 |
| 1_36 | 0 |
| 1_20 | 0 |
| 1_34 | 1 |
| 1_40 | 0 |
| 1_22 | 0 |
| 1_26 | 0 |
| 1_32 | 1 |
| 1_8 | 0 |
| 1_18 | 0 |
| 1_42 | 0 |
| 1_44 | 0 |
| 1_38 | 0 |
| 1_30 | 0 |
| 1_28 | 1 |
| 1_4 | 10 |
| 1_2 | 1 |
| 1_6 | 0 |
| 1_0 | 0 |
+------------+---------------------------+
46 rows selected (2.337 seconds)
{code}
8 parquet files are written:
{code}
[Wed Apr 08 11:41:10 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/test ] # ls -ltr
total 4
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_17_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_27_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_23_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_34_0.parquet
-rwxr-xr-x 1 mapr mapr 185 Apr 8 11:40 1_4_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_32_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_28_0.parquet
-rwxr-xr-x 1 mapr mapr 146 Apr 8 11:40 1_2_0.parquet
{code}
However:
1. broadcast join is not planned when the same join query used in legs of the union all oprartor:
{code}
0: jdbc:drill:schema=dfs> explain plan for select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer union all select j1.c_integer from j1, j2 where j1.c_integer = j2.c_integer;
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 UnionAll(all=[true])
00-03 Project(c_integer=[$0])
00-05 HashJoin(condition=[=($0, $1)], joinType=[inner])
00-09 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
00-08 Project(c_integer0=[$0])
00-11 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
00-02 Project(c_integer=[$0])
00-04 HashJoin(condition=[=($0, $1)], joinType=[inner])
00-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j1]], selectionRoot=/drill/testdata/ctas/j1, numFiles=1, columns=[`c_integer`]]])
00-06 Project(c_integer0=[$0])
00-10 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/ctas/j2]], selectionRoot=/drill/testdata/ctas/j2, numFiles=1, columns=[`c_integer`]]])
| {
"head" : {
"version" : 1,
"generator" : {
"type" : "ExplainHandler",
"info" : ""
},
"type" : "APACHE_DRILL_PHYSICAL",
"options" : [ {
"name" : "planner.broadcast_factor",
"kind" : "DOUBLE",
"type" : "SESSION",
"float_val" : 0.0
}, {
"name" : "planner.slice_target",
"kind" : "LONG",
"type" : "SESSION",
"num_val" : 1
} ],
{code}
2. CTAS with this query produces 1 parquet file:
{code}
[Thu Apr 09 09:13:26 root@/mapr/vmarkman.cluster.com/drill/testdata/ctas/yyy ] # ls
0_0_0.parquet
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)