Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/02/10 00:33:59 UTC
[jira] Commented: (HIVE-284) Column pruning after join+group-by query
[ https://issues.apache.org/jira/browse/HIVE-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672052#action_12672052 ]
Zheng Shao commented on HIVE-284:
---------------------------------
{code}
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF t1) (TOK_TABREF t2) (= (TOK_COLREF t1 c) (TOK_COLREF t2 r)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF t1 r)) (TOK_SELEXPR (TOK_COLREF t2 c)) (TOK_SELEXPR (TOK_FUNCTION sum (* (TOK_COLREF t1 v) (TOK_COLREF t2 v))))) (TOK_GROUPBY (TOK_COLREF t1 r) (TOK_COLREF t2 c))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t2
          Reduce Output Operator
            key expressions:
              expr: r
              type: string
            sort order: +
            Map-reduce partition columns:
              expr: r
              type: string
            tag: 1
            value expressions:
              expr: r
              type: string
              expr: c
              type: string
              expr: v
              type: string
        t1
          Reduce Output Operator
            key expressions:
              expr: c
              type: string
            sort order: +
            Map-reduce partition columns:
              expr: c
              type: string
            tag: 0
            value expressions:
              expr: r
              type: string
              expr: c
              type: string
              expr: v
              type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
            Inner Join 0 to 1
          condition expressions:
            0 {VALUE.0} {VALUE.1} {VALUE.2}
            1 {VALUE.0} {VALUE.1} {VALUE.2}
          File Output Operator
            compressed: true
            table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
              name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/739192425/307037973.10001
          Reduce Output Operator
            key expressions:
              expr: 0
              type: string
              expr: 4
              type: string
            sort order: ++
            Map-reduce partition columns:
              expr: rand()
              type: double
            tag: -1
            value expressions:
              expr: (UDFToDouble(2) * UDFToDouble(5))
              type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
            expr: sum(VALUE.0)
          keys:
            expr: KEY.0
            type: string
            expr: KEY.1
            type: string
          mode: partial1
          File Output Operator
            compressed: true
            table:
              input format: org.apache.hadoop.mapred.SequenceFileInputFormat
              output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
              name: binary_table

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/739192425/307037973.10002
          Reduce Output Operator
            key expressions:
              expr: 0
              type: string
              expr: 1
              type: string
            sort order: ++
            Map-reduce partition columns:
              expr: 0
              type: string
              expr: 1
              type: string
            tag: -1
            value expressions:
              expr: 2
              type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
            expr: sum(VALUE.0)
          keys:
            expr: KEY.0
            type: string
            expr: KEY.1
            type: string
          mode: final
          Select Operator
            expressions:
              expr: 0
              type: string
              expr: 1
              type: string
              expr: 2
              type: double
            File Output Operator
              compressed: true
              table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
                name: t

  Stage: Stage-0
    Move Operator
      tables:
        replace: true
        table:
          input format: org.apache.hadoop.mapred.SequenceFileInputFormat
          output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
          serde: org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
          name: t
{code}
> Column pruning after join+group-by query
> ----------------------------------------
>
> Key: HIVE-284
> URL: https://issues.apache.org/jira/browse/HIVE-284
> Project: Hadoop Hive
> Issue Type: Improvement
> Affects Versions: 0.2.0, 0.3.0
> Reporter: Zheng Shao
>
> The query is:
> explain INSERT OVERWRITE TABLE t
> SELECT t1.r, t2.c, sum(t1.v * t2.v)
> FROM t1 join t2 on t1.c = t2.r
> GROUP BY t1.r, t2.c;
> The FileSinkOperator after the join is serializing all 6 columns from the 2 tables (both have 3 columns: r, c, v) instead of only the 4 that are needed later (t1.r, t2.c, t1.v, t2.v).
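
To make the 6-versus-4 count concrete, here is a rough Python sketch of the bookkeeping a column pruner would need to do for this query. It is not Hive code: the data structures and the function names (columns_needed_after_join, all_join_output_columns) are made up for illustration, and the column references are simply transcribed by hand from the query text. The position numbering assumes the join emits t1's columns first (tag 0) and then t2's (tag 1), each in r, c, v order.

```python
# Illustrative only: a hand-rolled model of the query, not Hive's optimizer.

# Both tables have the same three columns, per the issue description.
TABLE_COLUMNS = {"t1": ["r", "c", "v"], "t2": ["r", "c", "v"]}

# Column references that appear *above* the join in the query:
#   SELECT t1.r, t2.c, sum(t1.v * t2.v) ... GROUP BY t1.r, t2.c
SELECT_REFS = [("t1", "r"), ("t2", "c")]
AGGREGATE_REFS = [("t1", "v"), ("t2", "v")]
GROUP_BY_REFS = [("t1", "r"), ("t2", "c")]


def columns_needed_after_join(*ref_lists):
    """Collect the distinct (table, column) pairs used downstream of the join."""
    needed = []
    for refs in ref_lists:
        for ref in refs:
            if ref not in needed:
                needed.append(ref)
    return needed


def all_join_output_columns(table_columns):
    """What an unpruned join emits: every column of every input table."""
    return [(table, col) for table, cols in table_columns.items() for col in cols]


emitted = all_join_output_columns(TABLE_COLUMNS)
needed = columns_needed_after_join(SELECT_REFS, AGGREGATE_REFS, GROUP_BY_REFS)
prunable = [col for col in emitted if col not in needed]

# Positions of the needed columns within the unpruned join output.
needed_positions = sorted(emitted.index(col) for col in needed)

print("emitted :", emitted)
print("needed  :", needed)
print("prunable:", prunable)
print("needed positions:", needed_positions)
```

Under those assumptions the needed positions come out as 0, 2, 4 and 5, which are the same column indexes Stage-2 reads in the plan (key expressions 0 and 4, value expression UDFToDouble(2) * UDFToDouble(5)). The two prunable columns, t1.c and t2.r, are the join keys, which nothing after the join refers to.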