Posted to dev@hive.apache.org by "Chao (JIRA)" <ji...@apache.org> on 2014/11/19 02:31:33 UTC
[jira] [Commented] (HIVE-8908) Investigate test failure on join34.q

    [ https://issues.apache.org/jira/browse/HIVE-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14217227#comment-14217227 ] 

Chao commented on HIVE-8908:
----------------------------

The plan for this query looks like this:

{noformat}
  TS_1    TS_2
  |       |
  FIL     FIL  TS_3
  |       |    |
  SEL     SEL  FIL
    \    /     |
     UNION     RS
        \     /
         \   /
          MJ
          |
          SEL
          |
          FS
{noformat}

The interesting part is that it has a UNION operator followed by an MJ operator. Currently, for each branch of the UNION operator, we include the whole path from the root TS operator to the bottom FS operator in its associated MapWork, as the plan above shows. This shouldn't happen. Instead, we should have a separate work for the MapJoinOperator.
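
To make the duplication concrete, here is a minimal sketch in Python (not Hive code). It models the operator plan above as a small graph and lists which operators end up in the work for each UNION branch. The numbered suffixes on FIL/SEL, the helper name operators_below, and the choice to cut right before MJ are assumptions made only for illustration.

```python
# Hypothetical model of the operator plan above; edges go from parent to child.
# Operator names with suffixes (FIL_1, SEL_4, ...) are made up for this sketch.
EDGES = {
    "TS_1": ["FIL_1"], "FIL_1": ["SEL_1"], "SEL_1": ["UNION"],
    "TS_2": ["FIL_2"], "FIL_2": ["SEL_2"], "SEL_2": ["UNION"],
    "TS_3": ["FIL_3"], "FIL_3": ["RS"], "RS": ["MJ"],
    "UNION": ["MJ"], "MJ": ["SEL_4"], "SEL_4": ["FS"], "FS": [],
}


def operators_below(root, cut=None):
    """Return the operators reachable from root, stopping before `cut`."""
    ops, stack = [], [root]
    while stack:
        op = stack.pop()
        if op == cut or op in ops:
            continue
        ops.append(op)
        stack.extend(EDGES[op])
    return ops


# Behaviour described above: each UNION branch carries the whole path down to FS.
current = {root: operators_below(root) for root in ("TS_1", "TS_2")}

# Assumed alternative: branches stop before MJ, and MJ gets a work of its own.
proposed = {root: operators_below(root, cut="MJ") for root in ("TS_1", "TS_2")}
proposed["MJ"] = operators_below("MJ")

for name, ops in current.items():
    print("current ", name, ops)
for name, ops in proposed.items():
    print("proposed", name, ops)
```

In the "current" mapping, MJ, SEL_4 and FS show up under both TS_1 and TS_2. In the "proposed" mapping they appear exactly once, in the separate work rooted at MJ.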

Also, because of the above, when the small-table work is processed, the RS is cloned to match the number of downstream works. I think that once the above issue is resolved, the issue with the parallel HTS/RS will go away.
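
A rough way to see why the clone count should drop, again as a Python sketch and not Hive code: count how many works contain the MJ operator, since that is the number of downstream works the small-table side has to feed. The function name rs_clone_count, the "Join work" label, and the pairing of TS_1/TS_2 with Map 1/Map 3 are assumptions for illustration.

```python
# Hypothetical illustration; work names and operator lists are assumptions.
def rs_clone_count(works, join_op="MJ"):
    """Count the works containing the join operator, i.e. the number of
    downstream works the small-table side would have to feed."""
    return sum(1 for ops in works.values() if join_op in ops)


# Today: both UNION branches include the map join in their own work.
current_works = {
    "Map 1": ["TS_1", "FIL", "SEL", "UNION", "MJ", "SEL", "FS"],
    "Map 3": ["TS_2", "FIL", "SEL", "UNION", "MJ", "SEL", "FS"],
}

# Assumed layout after the fix: the map join lives in a single separate work.
split_works = {
    "Map 1": ["TS_1", "FIL", "SEL", "UNION"],
    "Map 3": ["TS_2", "FIL", "SEL", "UNION"],
    "Join work": ["MJ", "SEL", "FS"],
}

print(rs_clone_count(current_works))  # 2
print(rs_clone_count(split_works))    # 1
```

With a single downstream work there would be nothing to clone for, which is consistent with the expectation that the parallel HTS/RS goes away.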

> Investigate test failure on join34.q
> ------------------------------------
>
>                 Key: HIVE-8908
>                 URL: https://issues.apache.org/jira/browse/HIVE-8908
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>    Affects Versions: spark-branch
>            Reporter: Chao
>            Assignee: Chao
>
> For this query, the plan doesn't look correct:
> {noformat}
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-1 depends on stages: Stage-5, Stage-4
>   Stage-2 depends on stages: Stage-1
>   Stage-0 depends on stages: Stage-2
>   Stage-3 depends on stages: Stage-0
>   Stage-5 is a root stage
> STAGE PLANS:
>   Stage: Stage-4
>     Spark
>       DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:6
>       Vertices:
>         Map 4 
>             Map Operator Tree:
>                 TableScan
>                   alias: x
>                   Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                     Spark HashTable Sink Operator
>                       condition expressions:
>                         0 {_col1}
>                         1 {value}
>                       keys:
>                         0 _col0 (type: string)
>                         1 key (type: string)
>                     Reduce Output Operator
>                       key expressions: key (type: string)
>                       sort order: +
>                       Map-reduce partition columns: key (type: string)
>                       Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                       value expressions: value (type: string)
>             Local Work:
>               Map Reduce Local Work
>   Stage: Stage-1
>     Spark
>       Edges:
>         Union 2 <- Map 1 (NONE, 0), Map 3 (NONE, 0)
>       DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:4
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: x
>                   Filter Operator
>                     predicate: (key < 20) (type: boolean)
>                     Select Operator
>                       expressions: key (type: string), value (type: string)
>                       outputColumnNames: _col0, _col1
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col1}
>                           1 {key} {value}
>                         keys:
>                           0 _col0 (type: string)
>                           1 key (type: string)
>                         outputColumnNames: _col1, _col2, _col3
>                         input vertices:
>                           1 Map 4
>                         Select Operator
>                           expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
>                           outputColumnNames: _col0, _col1, _col2
>                           File Output Operator
>                             compressed: false
>                             table:
>                                 input format: org.apache.hadoop.mapred.TextInputFormat
>                                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                                 name: default.dest_j1
>             Local Work:
>               Map Reduce Local Work
>         Map 3 
>             Map Operator Tree:
>                 TableScan
>                   alias: x1
>                   Filter Operator
>                     predicate: (key > 100) (type: boolean)
>                     Select Operator
>                       expressions: key (type: string), value (type: string)
>                       outputColumnNames: _col0, _col1
>                       Map Join Operator
>                         condition map:
>                              Inner Join 0 to 1
>                         condition expressions:
>                           0 {_col1}
>                           1 {key} {value}
>                         keys:
>                           0 _col0 (type: string)
>                           1 key (type: string)
>                         outputColumnNames: _col1, _col2, _col3
>                         input vertices:
>                           1 Map 4
>                         Select Operator
>                           expressions: _col2 (type: string), _col3 (type: string), _col1 (type: string)
>                           outputColumnNames: _col0, _col1, _col2
>                           File Output Operator
>                             compressed: false
>                             table:
>                                 input format: org.apache.hadoop.mapred.TextInputFormat
>                                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>                                 name: default.dest_j1
>             Local Work:
>               Map Reduce Local Work
>         Union 2 
>             Vertex: Union 2
>   Stage: Stage-2
>     Dependency Collection
>   Stage: Stage-0
>     Move Operator
>       tables:
>           replace: true
>           table:
>               input format: org.apache.hadoop.mapred.TextInputFormat
>               output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>               serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>               name: default.dest_j1
>   Stage: Stage-3
>     Stats-Aggr Operator
>   Stage: Stage-5
>     Spark
>       DagName: chao_20141118150101_a47a2d7b-e750-4764-be66-5ba95ebbe433:5
>       Vertices:
>         Map 4 
>             Map Operator Tree:
>                 TableScan
>                   alias: x
>                   Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                   Filter Operator
>                     predicate: key is not null (type: boolean)
>                     Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                     Spark HashTable Sink Operator
>                       condition expressions:
>                         0 {_col1}
>                         1 {value}
>                       keys:
>                         0 _col0 (type: string)
>                         1 key (type: string)
>                     Reduce Output Operator
>                       key expressions: key (type: string)
>                       sort order: +
>                       Map-reduce partition columns: key (type: string)
>                       Statistics: Num rows: 1 Data size: 216 Basic stats: COMPLETE Column stats: NONE
>                       value expressions: value (type: string)
>             Local Work:
>               Map Reduce Local Work
> Time taken: 0.127 seconds, Fetched: 156 row(s)
> {noformat}
> Note that Stage-4 and Stage-5 are identical. Also, in Stage-4 there's an RS operator in parallel with the HTS operator, which is strange.


