Posted to dev@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/12/22 00:05:58 UTC
[jira] [Created] (HIVE-15492) Nested column pruning: be more aggressive on RS value expressions
Chao Sun created HIVE-15492:
-------------------------------
Summary: Nested column pruning: be more aggressive on RS value expressions
Key: HIVE-15492
URL: https://issues.apache.org/jira/browse/HIVE-15492
Project: Hive
Issue Type: Sub-task
Components: Query Planning
Affects Versions: 2.2.0
Reporter: Chao Sun
Assignee: Chao Sun
Currently, nested column pruning can still process unnecessary data when handling ReduceSink (RS) operators. For instance, consider the following query (the source tables are defined in {{nested_column_pruning.q}}):
{code}
SELECT t1.s1.f3.f4
FROM nested_tbl_1 t1 JOIN nested_tbl_2 t2
ON t1.s1.f3.f4 = t2.s1.f6
{code}
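The query touches only two nested fields: {{t1.s1.f3.f4}} and {{t2.s1.f6}}. The toy Python model below illustrates which nested paths each table actually needs. It is not Hive's planner code. The function name {{required_paths}} and the dotted-string reference format are hypothetical. The model groups references by table alias and drops any deeper path that is already covered by a shorter referenced prefix.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def required_paths(refs: Iterable[str]) -> Dict[str, Set[str]]:
    """Hypothetical helper (not a Hive API): group dotted column references
    ("alias.col.field...") by alias, dropping paths covered by a shorter one."""
    by_alias: Dict[str, Set[str]] = defaultdict(set)
    for ref in refs:
        # Split "t1.s1.f3.f4" into alias "t1" and nested path "s1.f3.f4".
        alias, _, path = ref.partition(".")
        by_alias[alias].add(path)
    result: Dict[str, Set[str]] = {}
    for alias, paths in by_alias.items():
        result[alias] = {
            p for p in paths
            # Drop p if another referenced path is a strict prefix of it;
            # the trailing "." prevents "s1" from matching "s10".
            if not any(q != p and p.startswith(q + ".") for q in paths)
        }
    return result


# References made by the example query: the SELECT list and both join keys.
refs = ["t1.s1.f3.f4", "t1.s1.f3.f4", "t2.s1.f6"]
print(required_paths(refs))
```

If the query also referenced {{t1.s1}} as a whole, the model would keep only {{s1}} for {{t1}}, because the entire struct would then be needed.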
The generated plan is:
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f3.f4 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f3.f4 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f6 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f6 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0.f3.f4 (type: int)
            1 _col0.f6 (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0.f3.f4 (type: int)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}
In particular, for table {{t1}} the whole {{s1}} struct has to be scanned and shipped to the reducers, because {{_col0}} (that is, all of {{s1}}) appears in the value expressions of the associated RS. This can be optimized further, since only {{s1.f3.f4}} is actually needed.
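The minimal Python model below makes the desired behaviour concrete. It is a sketch under stated assumptions, not Hive's Java implementation. Struct types are represented as nested dicts, and the hypothetical helpers {{prune}} and {{to_hive}} keep only the fields reachable through the required nested paths. For {{t1}}, whose only required path is {{s1.f3.f4}}, the full type of {{s1}} shrinks to {{struct<f3:struct<f4:int>>}}:

```python
from typing import Dict, List, Union

# Hypothetical model (not a Hive API): a struct type is a dict mapping field
# name -> primitive type name (str) or nested struct (dict). Dicts keep order.
HiveType = Union[str, Dict[str, "HiveType"]]


def prune(struct: Dict[str, HiveType], paths: List[List[str]]) -> Dict[str, HiveType]:
    """Keep only the fields reachable through the given nested paths.

    A path that stops at a struct field keeps that whole sub-struct.
    """
    result: Dict[str, HiveType] = {}
    for name, ftype in struct.items():
        # Paths whose first component selects this field.
        matching = [p for p in paths if p and p[0] == name]
        if not matching:
            continue  # field is never referenced, so it is pruned
        rest = [p[1:] for p in matching]
        if any(not r for r in rest) or not isinstance(ftype, dict):
            # Some path ends at this field (or it is primitive): keep it whole.
            result[name] = ftype
        else:
            # Recurse into the sub-struct with the remaining path components.
            result[name] = prune(ftype, rest)
    return result


def to_hive(t: HiveType) -> str:
    """Render the model back into Hive's struct<...> type syntax."""
    if isinstance(t, str):
        return t
    return "struct<" + ",".join(f"{n}:{to_hive(ft)}" for n, ft in t.items()) + ">"


# Type of s1 as shown in the plan above.
S1: Dict[str, HiveType] = {
    "f1": "boolean",
    "f2": "string",
    "f3": {"f4": "int", "f5": "double"},
    "f6": "int",
}

print(to_hive(S1))                        # full struct currently read for t1
print(to_hive(prune(S1, [["f3", "f4"]])))  # only what s1.f3.f4 requires
```

A path that ends at a struct field (for example {{f3}}) keeps the whole sub-struct. This mirrors a query that references {{s1.f3}} directly.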