Posted to issues@hive.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/06/21 02:10:00 UTC

[jira] [Comment Edited] (HIVE-11297) Combine op trees for partition info generating tasks [Spark branch]

    [ https://issues.apache.org/jira/browse/HIVE-11297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16056837#comment-16056837 ] 

liyunzhang_intel edited comment on HIVE-11297 at 6/21/17 2:09 AM:
------------------------------------------------------------------

[~csun]: I applied HIVE-11297.6.patch to the latest master branch (8c5f55e), ran the query I posted above, and printed the operator tree.

SplitOpTreeForDPP#process
{code}
.....
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}
....
{code}

The operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}

Can you retest it in your environment? If the operator tree is like the one you mentioned, I think all the operator trees in spark_dynamic_partition_pruning.q.out will differ from the ones I generated in my environment.
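
For anyone following along, the upward walk from a pruning sink to the branching operator can be reproduced outside Hive with a small toy model. This is only a sketch under stated assumptions: `BranchPointSketch`, `Node`, and `findBranchingOp` are hypothetical names invented here, not Hive classes, and the tree is hand-built from the three chains printed above. Unlike the snippet in SplitOpTreeForDPP#process, the sketch stops at a node without parents instead of assuming a parent always exists.

```java
import java.util.ArrayList;
import java.util.List;

public class BranchPointSketch {

  /** Toy stand-in for an operator; this is NOT Hive's Operator class. */
  static final class Node {
    final String name;
    final List<Node> parents = new ArrayList<>();
    final List<Node> children = new ArrayList<>();

    Node(String name) {
      this.name = name;
    }

    /** Creates a child, links it in both directions, and returns the child. */
    Node child(String childName) {
      Node c = new Node(childName);
      children.add(c);
      c.parents.add(this);
      return c;
    }
  }

  /** Walks up first-parent links until a node with more than one child is found. */
  static Node findBranchingOp(Node start) {
    Node op = start;
    while (op != null) {
      if (op.children.size() > 1) {
        return op;
      }
      // Stop at the root (no parents) rather than assuming a parent exists.
      op = op.parents.isEmpty() ? null : op.parents.get(0);
    }
    return null;
  }

  public static void main(String[] args) {
    // Hand-built copy of the three chains printed in the comment.
    Node ts = new Node("TS[1]");
    Node fil = ts.child("FIL[17]");
    fil.child("RS[4]").child("JOIN[5]").child("GBY[8]")
        .child("RS[9]").child("GBY[10]").child("FS[12]");
    Node sink20 = fil.child("SEL[18]").child("GBY[19]").child("SPARKPRUNINGSINK[20]");
    Node sink23 = fil.child("SEL[21]").child("GBY[22]").child("SPARKPRUNINGSINK[23]");

    System.out.println(findBranchingOp(sink20).name);
    System.out.println(findBranchingOp(sink23).name);
  }
}
```

In this toy tree both sinks resolve to FIL[17], the node where the three chains diverge, which matches the shared TS[1]-FIL[17] prefix in the printed output.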



was (Author: kellyzly):
[~csun]: I applied HIVE-11297.6.patch to the latest master branch (8c5f55e), ran the query I posted above, and printed the operator tree of filterOp.

SplitOpTreeForDPP#process
{code}
.....
/** print the operator tree **/
ArrayList<TableScanOperator> tableScanList = new ArrayList<>();
tableScanList.add((TableScanOperator) stack.get(0));
LOG.debug("operator tree:" + Operator.toString(tableScanList));
/** print the operator tree **/
Operator<?> filterOp = pruningSinkOp;
while (filterOp != null) {
  if (filterOp.getNumChild() > 1) {
    break;
  } else {
    filterOp = filterOp.getParentOperators().get(0);
  }
}
....
{code}

The operator tree is:
{code}
TS[1]-FIL[17]-RS[4]-JOIN[5]-GBY[8]-RS[9]-GBY[10]-FS[12]
TS[1]-FIL[17]-SEL[18]-GBY[19]-SPARKPRUNINGSINK[20]
TS[1]-FIL[17]-SEL[21]-GBY[22]-SPARKPRUNINGSINK[23]
{code}


> Combine op trees for partition info generating tasks [Spark branch]
> -------------------------------------------------------------------
>
>                 Key: HIVE-11297
>                 URL: https://issues.apache.org/jira/browse/HIVE-11297
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: spark-branch
>            Reporter: Chao Sun
>            Assignee: liyunzhang_intel
>         Attachments: HIVE-11297.1.patch, HIVE-11297.2.patch, HIVE-11297.3.patch, HIVE-11297.4.patch, HIVE-11297.5.patch, HIVE-11297.6.patch
>
>
> Currently, for dynamic partition pruning in Spark, if a small table generates partition info for more than one partition column, multiple operator trees are created. They all start from the same table scan op but have different Spark partition pruning sinks.
> As an optimization, we can combine these op trees so that the table scan does not have to be done multiple times.


