Posted to dev@hive.apache.org by "LongShangRen (JIRA)" <ji...@apache.org> on 2016/11/03 14:54:58 UTC
[jira] [Created] (HIVE-15117) Partition filters are not pushed down with lateral view and undeterministic UDF
LongShangRen created HIVE-15117:
-----------------------------------
Summary: Partition filters are not pushed down with lateral view and undeterministic UDF
Key: HIVE-15117
URL: https://issues.apache.org/jira/browse/HIVE-15117
Project: Hive
Issue Type: Bug
Components: Hive
Affects Versions: 1.2.1
Reporter: LongShangRen
Fix For: 1.2.1
A query with a lateral view does not push the partition-column filter down as expected.
Here is how to reproduce it:
1. *Create the test table*
{quote}
create table test_lateral_view (id bigint,json_cont string) partitioned by (vt string);
{quote}
2. *Run EXPLAIN on the query below*
{quote}
select *
from test_lateral_view a
lateral view json_tuple(json_cont, 'iids', 'indexs') b as iids,indexs
where a.vt = '2016-10-27'
and rand()>0.5;
{quote}
Here is the resulting plan:
{quote}
STAGE DEPENDENCIES:
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: a
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          Lateral View Forward
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            Select Operator
              expressions: id (type: bigint), json_cont (type: string), vt (type: string)
              outputColumnNames: id, json_cont, vt
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              Lateral View Join Operator
                outputColumnNames: _col0, _col1, _col2, _col6, _col7
                Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                Filter Operator
                  {color:red}predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean){color}
                  Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Select Operator
                    expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
                    outputColumnNames: _col0, _col1, _col2, _col3, _col4
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    ListSink
            Select Operator
              expressions: json_cont (type: string), 'iids' (type: string), 'indexs' (type: string)
              outputColumnNames: _col0, _col1, _col2
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
              UDTF Operator
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                function name: json_tuple
                Lateral View Join Operator
                  outputColumnNames: _col0, _col1, _col2, _col6, _col7
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                  Filter Operator
                    predicate: ((_col2 = '2016-10-27') and (rand() > 0.5)) (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                    Select Operator
                      expressions: _col0 (type: bigint), _col1 (type: string), '2016-10-27' (type: string), _col6 (type: string), _col7 (type: string)
                      outputColumnNames: _col0, _col1, _col2, _col3, _col4
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                      ListSink
{quote}
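As you can see, the partition predicate (_col2 = '2016-10-27', i.e. vt = '2016-10-27') is evaluated only in the Filter Operators after the Lateral View Join Operators, together with rand() > 0.5, and no filter reaches the TableScan. This means the query scans the whole table instead of only the vt = '2016-10-27' partition.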
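To make the expected behavior concrete, here is a minimal Python sketch of splitting the AND-ed WHERE clause term by term. It is a toy model under stated assumptions, not Hive's optimizer code: Conjunct and split_for_partition_pruning are hypothetical names. In this model, a deterministic term that references only partition columns could be used to prune partitions at the TableScan, while a nondeterministic term such as rand() > 0.5 stays in the Filter Operator.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Conjunct:
    """Hypothetical model of one AND-ed term of a WHERE clause (not a Hive API)."""
    text: str                 # SQL text, for display only
    columns: FrozenSet[str]   # columns the term references
    deterministic: bool       # False for terms such as rand() > 0.5


def split_for_partition_pruning(
    conjuncts: List[Conjunct], partition_cols: FrozenSet[str]
) -> Tuple[List[Conjunct], List[Conjunct]]:
    """Split an AND-ed predicate into (pushable, residual) terms.

    In this toy model a term is pushable to the TableScan only if it is
    deterministic and references nothing but partition columns; every other
    term stays in the Filter Operator. Each term is judged on its own, so a
    nondeterministic term does not block a different, deterministic one.
    """
    pushable: List[Conjunct] = []
    residual: List[Conjunct] = []
    for c in conjuncts:
        if c.deterministic and c.columns <= partition_cols:
            pushable.append(c)
        else:
            residual.append(c)
    return pushable, residual


# WHERE clause of the query in step 2: a.vt = '2016-10-27' and rand() > 0.5
conjuncts = [
    Conjunct("vt = '2016-10-27'", frozenset({"vt"}), True),
    Conjunct("rand() > 0.5", frozenset(), False),
]
pushable, residual = split_for_partition_pruning(conjuncts, frozenset({"vt"}))
print([c.text for c in pushable])  # ["vt = '2016-10-27'"]
print([c.text for c in residual])  # ['rand() > 0.5']
```

In this model the vt term is pushable even though rand() > 0.5 is not; that is the split one would expect the plan to reflect, with the vt filter applied at the TableScan instead of after the Lateral View Join Operator.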
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)