Posted to dev@hive.apache.org by "Chao Sun (JIRA)" <ji...@apache.org> on 2016/12/22 00:05:58 UTC
[jira] [Created] (HIVE-15492) Nested column pruning: be more aggressive on RS value expressions

Chao Sun created HIVE-15492:
-------------------------------

             Summary: Nested column pruning: be more aggressive on RS value expressions
                 Key: HIVE-15492
                 URL: https://issues.apache.org/jira/browse/HIVE-15492
             Project: Hive
          Issue Type: Sub-task
          Components: Query Planning
    Affects Versions: 2.2.0
            Reporter: Chao Sun
            Assignee: Chao Sun


Currently, nested column pruning can still process unnecessary data when handling ReduceSink (RS) operators. For instance, consider the following query (the source tables can be found in {{nested_column_pruning.q}}):
{code}
SELECT t1.s1.f3.f4
FROM nested_tbl_1 t1 JOIN nested_tbl_2 t2
ON t1.s1.f3.f4 = t2.s1.f6
{code}

The generated plan is:
{code}
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: t1
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f3.f4 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f3.f4 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
          TableScan
            alias: t2
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: s1 (type: struct<f1:boolean,f2:string,f3:struct<f4:int,f5:double>,f6:int>)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator
                key expressions: _col0.f6 (type: int)
                sort order: +
                Map-reduce partition columns: _col0.f6 (type: int)
                Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0.f3.f4 (type: int)
            1 _col0.f6 (type: int)
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: _col0.f3.f4 (type: int)
            outputColumnNames: _col0
            Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
            File Output Operator
              compressed: false
              Statistics: Num rows: 1 Data size: 4 Basic stats: COMPLETE Column stats: NONE
              table:
                  input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                  output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                  serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink
{code}

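In particular, for table {{t1}} the whole {{s1}} struct must be scanned, because {{_col0}} (i.e. {{s1}}) appears in the value expressions of the associated RS. This can be optimized further, since only {{s1.f3.f4}} is actually needed downstream: the final Select reads just {{_col0.f3.f4}} from the join output.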
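The pruning described above can be sketched as computing, for each RS value column, the minimal set of nested field paths that downstream operators actually read. The Java below is a hypothetical illustration only: {{RsValuePruningSketch}} and {{minimalPaths}} are not part of Hive, and field paths are modeled as plain dotted strings rather than Hive's expression descriptors.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not Hive's API: reduce the nested field paths read
// downstream of an RS value column to a minimal, non-redundant set.
class RsValuePruningSketch {

    // Drops any path that is a descendant of another required path:
    // if "s1.f3" is required, "s1.f3.f4" is already covered by it.
    static List<String> minimalPaths(Set<String> required) {
        List<String> result = new ArrayList<>();
        // A strict prefix sorts first lexicographically, so every ancestor
        // is visited before its descendants.
        for (String path : new TreeSet<>(required)) {
            boolean covered = false;
            for (String kept : result) {
                // Require a "." boundary so "a.b" does not cover "a.bc".
                if (path.equals(kept) || path.startsWith(kept + ".")) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                result.add(path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Example from this issue: after the join, only _col0.f3.f4 is read
        // from t1's RS value column, so emitting all of _col0 is unnecessary.
        System.out.println(minimalPaths(Set.of("_col0.f3.f4")));
        // An ancestor path subsumes its descendants.
        System.out.println(minimalPaths(Set.of("s1.f3", "s1.f3.f4", "s1.f6")));
    }
}
{code}

Running {{main}} prints {{[_col0.f3.f4]}} and then {{[s1.f3, s1.f6]}}. Under this sketch's assumptions, the RS for {{t1}} would emit {{_col0.f3.f4}} (an {{int}}) as its value instead of the full {{s1}} struct.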



