Posted to dev@hive.apache.org by "Sahil Takiar (JIRA)" <ji...@apache.org> on 2017/04/21 22:11:04 UTC
[jira] [Created] (HIVE-16507) Hive Explain User-Level may print out "Vertex dependency in root stage"

Sahil Takiar created HIVE-16507:
-----------------------------------

             Summary: Hive Explain User-Level may print out "Vertex dependency in root stage"
                 Key: HIVE-16507
                 URL: https://issues.apache.org/jira/browse/HIVE-16507
             Project: Hive
          Issue Type: Bug
            Reporter: Sahil Takiar
            Assignee: Sahil Takiar


User-level explain plans have a section titled {{Vertex dependency in root stage}}, which (as the name suggests) prints the dependencies between all vertices in the root stage.

This logic lives in {{DagJsonParser#print}}, which may print the {{Vertex dependency in root stage}} header more than once (twice in the example below).

The method first extracts all stages and plans. It then iterates over every stage and, if the stage contains any edges, prints them under that same header.
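To make the failure mode concrete, here is a minimal, self-contained Java sketch of the loop described above. It does not use Hive's classes: the `Stage` type, its `edges` map, and the `render`/`sampleStages` helpers are hypothetical stand-ins for the data that {{DagJsonParser}} reads from the JSON plan. Any stage with edges gets the same fixed header, so two such stages produce two "root stage" sections.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; the real DagJsonParser reads stages and
// edges from the JSON explain output rather than from these classes.
final class CurrentVertexDependencyPrinter {

  static final class Stage {
    final String id;
    // child vertex -> its parent edges, e.g. "Reducer 3" -> "Reducer 2 (GROUP)"
    final Map<String, String> edges;

    Stage(String id, Map<String, String> edges) {
      this.id = id;
      this.edges = edges;
    }
  }

  // Approximates the loop described in the issue: every stage that has
  // edges is printed under the same fixed header.
  static String render(List<Stage> stages) {
    StringBuilder out = new StringBuilder();
    for (Stage stage : stages) {
      if (stage.edges.isEmpty()) {
        continue; // stages without edges print nothing
      }
      out.append("Vertex dependency in root stage\n");
      for (Map.Entry<String, String> edge : stage.edges.entrySet()) {
        out.append(edge.getKey()).append(" <- ").append(edge.getValue()).append('\n');
      }
      out.append('\n');
    }
    return out.toString();
  }

  // Two stages with edges plus one without, loosely based on the HoS example.
  static List<Stage> sampleStages() {
    Map<String, String> stage2 = new LinkedHashMap<>();
    stage2.put("Reducer 10", "Map 9 (GROUP)");
    stage2.put("Reducer 11", "Reducer 10 (GROUP), Reducer 13 (GROUP)");

    Map<String, String> stage1 = new LinkedHashMap<>();
    stage1.put("Reducer 2", "Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)");
    stage1.put("Reducer 3", "Reducer 2 (GROUP)");

    List<Stage> stages = new ArrayList<>();
    stages.add(new Stage("Stage-2", stage2));
    stages.add(new Stage("Stage-1", stage1));
    stages.add(new Stage("Stage-0", new LinkedHashMap<>()));
    return stages;
  }

  public static void main(String[] args) {
    System.out.print(render(sampleStages()));
  }
}
```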

To stay consistent with the header text {{Vertex dependency in root stage}}, we should check whether the stage being processed in the loop is actually the root stage.
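A minimal sketch of this first option, again using hypothetical classes rather than Hive's own. It assumes each stage carries an `isRoot` flag; how the real parser would derive that flag from the plan's stage dependencies is not shown and is an assumption, as is marking Stage-2 as the root in the sample data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: only stages flagged as root get the "root stage" section.
final class RootOnlyVertexDependencyPrinter {

  static final class Stage {
    final String id;
    final boolean isRoot; // assumed to be derived from the plan's stage dependencies
    final Map<String, String> edges; // child vertex -> parent edges

    Stage(String id, boolean isRoot, Map<String, String> edges) {
      this.id = id;
      this.isRoot = isRoot;
      this.edges = edges;
    }
  }

  static String render(List<Stage> stages) {
    StringBuilder out = new StringBuilder();
    for (Stage stage : stages) {
      // Skip non-root stages so the header text stays accurate.
      if (!stage.isRoot || stage.edges.isEmpty()) {
        continue;
      }
      out.append("Vertex dependency in root stage\n");
      for (Map.Entry<String, String> edge : stage.edges.entrySet()) {
        out.append(edge.getKey()).append(" <- ").append(edge.getValue()).append('\n');
      }
      out.append('\n');
    }
    return out.toString();
  }

  // Which stage is the root here is chosen purely for illustration.
  static List<Stage> sampleStages() {
    Map<String, String> stage2 = new LinkedHashMap<>();
    stage2.put("Reducer 10", "Map 9 (GROUP)");
    stage2.put("Reducer 11", "Reducer 10 (GROUP), Reducer 13 (GROUP)");

    Map<String, String> stage1 = new LinkedHashMap<>();
    stage1.put("Reducer 2", "Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)");
    stage1.put("Reducer 3", "Reducer 2 (GROUP)");

    List<Stage> stages = new ArrayList<>();
    stages.add(new Stage("Stage-2", true, stage2));
    stages.add(new Stage("Stage-1", false, stage1));
    stages.add(new Stage("Stage-0", false, new LinkedHashMap<>()));
    return stages;
  }

  public static void main(String[] args) {
    System.out.print(render(sampleStages()));
  }
}
```

The trade-off is that edges belonging to non-root stages (Stage-1 in the sample) are no longer printed at all, which is what motivates the alternative.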

Alternatively, we could print the edges for every stage and change the header from {{Vertex dependency in root stage}} to {{Vertex dependency in [stage-id]}}.
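A minimal sketch of the second option, with the same hypothetical classes: every stage that has edges is still printed, but the header names the stage instead of claiming it is the root. The `Stage` type and helper methods are illustrative only, not Hive's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: one section per stage with edges, headed by the stage id.
final class PerStageVertexDependencyPrinter {

  static final class Stage {
    final String id;
    final Map<String, String> edges; // child vertex -> parent edges

    Stage(String id, Map<String, String> edges) {
      this.id = id;
      this.edges = edges;
    }
  }

  static String render(List<Stage> stages) {
    StringBuilder out = new StringBuilder();
    for (Stage stage : stages) {
      if (stage.edges.isEmpty()) {
        continue; // nothing to report for this stage
      }
      // Name the stage explicitly instead of calling it the root stage.
      out.append("Vertex dependency in ").append(stage.id).append('\n');
      for (Map.Entry<String, String> edge : stage.edges.entrySet()) {
        out.append(edge.getKey()).append(" <- ").append(edge.getValue()).append('\n');
      }
      out.append('\n');
    }
    return out.toString();
  }

  static List<Stage> sampleStages() {
    Map<String, String> stage2 = new LinkedHashMap<>();
    stage2.put("Reducer 10", "Map 9 (GROUP)");
    stage2.put("Reducer 11", "Reducer 10 (GROUP), Reducer 13 (GROUP)");

    Map<String, String> stage1 = new LinkedHashMap<>();
    stage1.put("Reducer 2", "Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)");
    stage1.put("Reducer 3", "Reducer 2 (GROUP)");

    List<Stage> stages = new ArrayList<>();
    stages.add(new Stage("Stage-2", stage2));
    stages.add(new Stage("Stage-1", stage1));
    stages.add(new Stage("Stage-0", new LinkedHashMap<>()));
    return stages;
  }

  public static void main(String[] args) {
    System.out.print(render(sampleStages()));
  }
}
```

This keeps every edge visible and makes the header accurate even when, as with Hive-on-Spark, more than one stage contains edges.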

I'm not sure whether Hive-on-Tez can create a plan with a non-root stage that contains edges, but Hive-on-Spark can (HoS support was added in HIVE-11133).

Example for HoS:

{code}
set hive.optimize.ppd=true;
set hive.ppd.remove.duplicatefilters=true;
set hive.spark.dynamic.partition.pruning=true;
set hive.optimize.metadataonly=false;
set hive.optimize.index.filter=true;
set hive.strict.checks.cartesian.product=false;
set hive.spark.explain.user=true;

EXPLAIN select count(*) from srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
{code}

This prints:

{code}
Plan optimized by CBO.

Vertex dependency in root stage
Reducer 10 <- Map 9 (GROUP)
Reducer 11 <- Reducer 10 (GROUP), Reducer 13 (GROUP)
Reducer 13 <- Map 12 (GROUP)

Vertex dependency in root stage
Reducer 2 <- Map 1 (PARTITION-LEVEL SORT), Reducer 6 (PARTITION-LEVEL SORT)
Reducer 3 <- Reducer 2 (GROUP)
Reducer 5 <- Map 4 (GROUP)
Reducer 6 <- Reducer 5 (GROUP), Reducer 8 (GROUP)
Reducer 8 <- Map 7 (GROUP)

Stage-0
  Fetch Operator
    limit:-1
    Stage-1
      Reducer 3
      File Output Operator [FS_34]
        Group By Operator [GBY_32] (rows=1 width=8)
          Output:["_col0"],aggregations:["count(VALUE._col0)"]
        <-Reducer 2 [GROUP]
          GROUP [RS_31]
            Group By Operator [GBY_30] (rows=1 width=8)
              Output:["_col0"],aggregations:["count()"]
              Join Operator [JOIN_28] (rows=2200 width=10)
                condition map:[{"":"{\"type\":\"Inner\",\"left\":0,\"right\":1}"}],keys:{"0":"_col0","1":"_col0"}
              <-Map 1 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_26]
                  PartitionCols:_col0
                  Select Operator [SEL_2] (rows=2000 width=10)
                    Output:["_col0"]
                    TableScan [TS_0] (rows=2000 width=10)
                      default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
              <-Reducer 6 [PARTITION-LEVEL SORT]
                PARTITION-LEVEL SORT [RS_27]
                  PartitionCols:_col0
                  Group By Operator [GBY_24] (rows=1 width=184)
                    Output:["_col0"],keys:KEY._col0
                  <-Reducer 5 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_9] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_7] (rows=1 width=184)
                            Output:["_col0"],aggregations:["max(VALUE._col0)"]
                          <-Map 4 [GROUP]
                            GROUP [RS_6]
                              Group By Operator [GBY_5] (rows=1 width=184)
                                Output:["_col0"],aggregations:["max(ds)"]
                                Select Operator [SEL_4] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_3] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
                  <-Reducer 8 [GROUP]
                    GROUP [RS_23]
                      PartitionCols:_col0
                      Group By Operator [GBY_22] (rows=2 width=184)
                        Output:["_col0"],keys:_col0
                        Filter Operator [FIL_17] (rows=1 width=184)
                          predicate:_col0 is not null
                          Group By Operator [GBY_15] (rows=1 width=184)
                            Output:["_col0"],aggregations:["min(VALUE._col0)"]
                          <-Map 7 [GROUP]
                            GROUP [RS_14]
                              Group By Operator [GBY_13] (rows=1 width=184)
                                Output:["_col0"],aggregations:["min(ds)"]
                                Select Operator [SEL_12] (rows=2000 width=10)
                                  Output:["ds"]
                                  TableScan [TS_11] (rows=2000 width=10)
                                    default@srcpart,srcpart,Tbl:COMPLETE,Col:NONE
        Stage-2
          Reducer 11
{code}

So the output contains two sections titled {{Vertex dependency in root stage}}, one for each stage that has edges.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)