Posted to dev@hive.apache.org by Chao Sun <ch...@cloudera.com> on 2014/11/21 01:57:14 UTC

Review Request 28307: HIVE-8908 - Investigate test failure on join34.q [Spark Branch]

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/28307/
-----------------------------------------------------------

Review request for hive, Jimmy Xiang and Szehon Ho.


Bugs: HIVE-8908
    https://issues.apache.org/jira/browse/HIVE-8908


Repository: hive-git


Description
-------

For the query in join34.q, the plan doesn't look correct:
OK
STAGE DEPENDENCIES:
  Stage-4 is a root stage
  Stage-1 depends on stages: Stage-5, Stage-4
  Stage-2 depends on stages: Stage-1
  Stage-0 depends on stages: Stage-2
  Stage-3 depends on stages: Stage-0
  Stage-5 is a root stage

STAGE PLANS:
  Stage: Stage-4
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: x
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0 {_col1}
                        1 {value}
                      keys:
                        0 _col0 (type: string)
                        1 key (type: string)
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                      value expressions: value (type: string)
            Local Work:
              Map Reduce Local Work

  Stage: Stage-1
    Spark
      Edges:
        Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0)
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4
      Vertices:
        Map 1 
            Map Operator Tree:
                TableScan
                  alias: x
                  Filter Operator
                    predicate: (key < 20) (type: boolean)
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1}
                          1 {key} {value}
                        keys:
                          0 _col0 (type: string)
                          1 key (type: string)
                        outputColumnNames: _col1, _col2, _col3
                        input vertices:
                          1 Map 4
                        Select Operator
                          expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                          outputColumnNames: _col0, _col1, _col2
                          File Output Operator
                            compressed: false
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                name: default.dest_j1
            Local Work:
              Map Reduce Local Work
        Map 3 
            Map Operator Tree:
                TableScan
                  alias: x1
                  Filter Operator
                    predicate: (key > 100) (type: boolean)
                    Select Operator
                      expressions: key (type: string), value (type: string)
                      outputColumnNames: _col0, _col1
                      Map Join Operator
                        condition map:
                             Inner Join 0 to 1
                        condition expressions:
                          0 {_col1}
                          1 {key} {value}
                        keys:
                          0 _col0 (type: string)
                          1 key (type: string)
                        outputColumnNames: _col1, _col2, _col3
                        input vertices:
                          1 Map 4
                        Select Operator
                          expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
                          outputColumnNames: _col0, _col1, _col2
                          File Output Operator
                            compressed: false
                            table:
                                input format: org.apache.hadoop.mapred.TextInputFormat
                                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                                name: default.dest_j1
            Local Work:
              Map Reduce Local Work
        Union 2 
            Vertex: Union 2

  Stage: Stage-2
    Dependency Collection

  Stage: Stage-0
    Move Operator
      tables:
          replace: true
          table:
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.dest_j1

  Stage: Stage-3
    Stats-Aggr Operator

  Stage: Stage-5
    Spark
      DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5
      Vertices:
        Map 4 
            Map Operator Tree:
                TableScan
                  alias: x
                  Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                  Filter Operator
                    predicate: key is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                    Spark HashTable Sink Operator
                      condition expressions:
                        0 {_col1}
                        1 {value}
                      keys:
                        0 _col0 (type: string)
                        1 key (type: string)
                    Reduce Output Operator
                      key expressions: key (type: string)
                      sort order: +
                      Map-reduce partition columns: key (type: string)
                      Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
                      value expressions: value (type: string)
            Local Work:
              Map Reduce Local Work

Time taken: 0.127 seconds, Fetched: 156 row(s)
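
Note that Stage-4 and Stage-5 are identical apart from the DagName suffix. Also, in Stage-4 the Reduce Output Operator (RS) sits in parallel with the Spark HashTable Sink (HTS) operator under the same Filter Operator, which is strange.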
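
To check for duplicated stages without reading the whole plan by eye, something like the sketch below could be run over the saved EXPLAIN output. It is not part of the patch or of Hive; the function names split_stages and find_identical_stages are made up for illustration, and it assumes the text layout shown above ("STAGE PLANS:", then "Stage: Stage-N" headers). DagName lines are skipped because they carry a per-stage suffix.

```python
import re
from collections import defaultdict


def split_stages(plan_text):
    """Map each stage name to its body lines from the STAGE PLANS section.

    Hypothetical helper: assumes the EXPLAIN text layout shown in this mail.
    """
    stages = {}
    current = None
    in_plans = False
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped == "STAGE PLANS:":
            in_plans = True
            continue
        if not in_plans:
            continue
        header = re.match(r"\s*Stage: (Stage-\d+)\s*$", line)
        if header:
            current = header.group(1)
            stages[current] = []
            continue
        if current is None or not stripped:
            continue
        # The CLI appends a timing line after the plan; stop there.
        if stripped.startswith("Time taken:"):
            break
        # DagName differs per stage even when the operator trees match.
        if stripped.startswith("DagName:"):
            continue
        # Keep leading indentation so the operator nesting is compared too.
        stages[current].append(line.rstrip())
    return stages


def find_identical_stages(plan_text):
    """Return groups of stage names whose bodies are line-for-line equal."""
    groups = defaultdict(list)
    for name, body in split_stages(plan_text).items():
        groups[tuple(body)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

Calling find_identical_stages on the plan text returns a list of groups of stage names whose bodies match once DagName is ignored. Comparing whole lines, indentation included, is deliberately strict: two stages only match when their operator trees are nested the same way.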


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java 4bfc26f 

Diff: https://reviews.apache.org/r/28307/diff/


Testing
-------


Thanks,

Chao Sun