Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/02/10 00:33:59 UTC
[jira] Commented: (HIVE-284) Column pruning after join+group-by query

    [ https://issues.apache.org/jira/browse/HIVE-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12672052#action_12672052 ] 

Zheng Shao commented on HIVE-284:
---------------------------------

{code}

ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_JOIN (TOK_TABREF t1) (TOK_TABREF t2) (= (TOK_COLREF t1 c) (TOK_COLREF t2 r)))) (TOK_INSERT (TOK_DESTINATION (TOK_TAB t)) (TOK_SELECT (TOK_SELEXPR (TOK_COLREF t1 r)) (TOK_SELEXPR (TOK_COLREF t2 c)) (TOK_SELEXPR (TOK_FUNCTION sum (* (TOK_COLREF t1 v) (TOK_COLREF t2 v))))) (TOK_GROUPBY (TOK_COLREF t1 r) (TOK_COLREF t2 c))))

STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-2 depends on stages: Stage-1
  Stage-3 depends on stages: Stage-2
  Stage-0 depends on stages: Stage-3

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        t2
            Reduce Output Operator
              key expressions:
                    expr: r
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: r
                    type: string
              tag: 1
              value expressions:
                    expr: r
                    type: string
                    expr: c
                    type: string
                    expr: v
                    type: string
        t1
            Reduce Output Operator
              key expressions:
                    expr: c
                    type: string
              sort order: +
              Map-reduce partition columns:
                    expr: c
                    type: string
              tag: 0
              value expressions:
                    expr: r
                    type: string
                    expr: c
                    type: string
                    expr: v
                    type: string
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          condition expressions:
            0 {VALUE.0} {VALUE.1} {VALUE.2}
            1 {VALUE.0} {VALUE.1} {VALUE.2}
          File Output Operator
            compressed: true
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-2
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/739192425/307037973.10001
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
                  expr: 4
                  type: string
            sort order: ++
            Map-reduce partition columns:
                  expr: rand()
                  type: double
            tag: -1
            value expressions:
                  expr: (UDFToDouble(2) * UDFToDouble(5))
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
                expr: KEY.1
                type: string
          mode: partial1
          File Output Operator
            compressed: true
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                name: binary_table

  Stage: Stage-3
    Map Reduce
      Alias -> Map Operator Tree:
        /tmp/hive-zshao/739192425/307037973.10002
          Reduce Output Operator
            key expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: string
            sort order: ++
            Map-reduce partition columns:
                  expr: 0
                  type: string
                  expr: 1
                  type: string
            tag: -1
            value expressions:
                  expr: 2
                  type: double
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: sum(VALUE.0)
          keys:
                expr: KEY.0
                type: string
                expr: KEY.1
                type: string
          mode: final
          Select Operator
            expressions:
                  expr: 0
                  type: string
                  expr: 1
                  type: string
                  expr: 2
                  type: double
            File Output Operator
              compressed: true
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
                  name: t

  Stage: Stage-0
    Move Operator
      tables:
            replace: true
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.MetadataTypedColumnsetSerDe
                name: t

{code}

> Column pruning after join+group-by query
> ----------------------------------------
>
>                 Key: HIVE-284
>                 URL: https://issues.apache.org/jira/browse/HIVE-284
>             Project: Hadoop Hive
>          Issue Type: Improvement
>    Affects Versions: 0.2.0, 0.3.0
>            Reporter: Zheng Shao
>
> The query is:
> explain INSERT OVERWRITE TABLE t
>     SELECT t1.r, t2.c, sum(t1.v * t2.v)
>     FROM t1 join t2 on t1.c = t2.r
> GROUP BY t1.r, t2.c;
> The FileSinkOperator after the join serializes all 6 columns from the 2 tables (each has 3 columns: r, c, v) instead of only the 4 that are needed later (t1.r, t2.c, t1.v, t2.v).
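
For illustration only, here is a minimal Python sketch of the pruning decision the issue asks for. It is not Hive code: the names JOIN_OUTPUT, GROUP_BY_KEYS, AGG_INPUTS and needed_positions are hypothetical, and the column order (t1's r, c, v followed by t2's r, c, v) is an assumption read off the plan above, where Stage-2 refers to the join output by positions 0, 4, 2 and 5.

```python
# Hypothetical model of the join's output row: t1 (tag 0) contributes r, c, v,
# then t2 (tag 1) contributes r, c, v. This ordering is an assumption taken
# from the EXPLAIN output, not from Hive's source.
JOIN_OUTPUT = [
    ("t1", "r"), ("t1", "c"), ("t1", "v"),
    ("t2", "r"), ("t2", "c"), ("t2", "v"),
]

# Columns the query still references after the join.
GROUP_BY_KEYS = [("t1", "r"), ("t2", "c")]   # GROUP BY t1.r, t2.c
AGG_INPUTS = [("t1", "v"), ("t2", "v")]      # sum(t1.v * t2.v)


def needed_positions(join_output, *referenced_lists):
    """Return positions in the join output that a later operator reads."""
    needed = set()
    for refs in referenced_lists:
        needed.update(refs)
    return [i for i, col in enumerate(join_output) if col in needed]


kept = needed_positions(JOIN_OUTPUT, GROUP_BY_KEYS, AGG_INPUTS)
pruned = [col for i, col in enumerate(JOIN_OUTPUT) if i not in kept]

print(kept)    # [0, 2, 4, 5]
print(pruned)  # [('t1', 'c'), ('t2', 'r')]
```

The two pruned columns are the join keys (t1.c and t2.r), which are needed to perform the join but not by anything after it, so in principle they would not have to be written by the FileSinkOperator.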
