Posted to dev@hive.apache.org by Szehon Ho <sz...@cloudera.com> on 2014/03/21 22:00:06 UTC

Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
-----------------------------------------------------------

Review request for hive.


Repository: hive-git


Description
-------

In this scenario, PPD on the script (transform) operator did the following wrong predicate pushdown:

script --> filter (state=1)
           --> select, insert into test1
       --> filter (state=2)
           --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
         --> select, insert into test1
         --> select, insert into test2


The bug was a combination of two things: first, these filters were chosen by FilterPPD, and second, ScriptPPD called the sequence "mergeWithChildrenPred / createFilters (pred)", which performed the above transformation.  ScriptPPD was one of the few simple operators that did this. I tried some other combinations such as extract (see my added test in transform_ppr2.q) and also just a select operator, and could not reproduce the issue with those.
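
As an illustration only, the effect of the wrong plan can be sketched in plain Java. This is not Hive code: the class name, the helper method, and the sample state values are assumptions made up for the example. It shows why AND-ing the two branch predicates into a single filter under the script operator leaves both inserts with no rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class MergedFilterDemo {
    // Hypothetical stand-ins for the two branch predicates in the diagram.
    static final Predicate<Integer> STATE_IS_1 = s -> s == 1; // branch feeding test1
    static final Predicate<Integer> STATE_IS_2 = s -> s == 2; // branch feeding test2

    // Counts the rows (represented only by their state value) that pass a predicate.
    static long countMatching(List<Integer> states, Predicate<Integer> p) {
        return states.stream().filter(p).count();
    }

    public static void main(String[] args) {
        // Made-up output of the script (transform) operator: three rows.
        List<Integer> scriptOutput = Arrays.asList(1, 2, 1);

        // Intended plan: each branch applies only its own filter.
        System.out.println("test1 rows: " + countMatching(scriptOutput, STATE_IS_1)); // 2
        System.out.println("test2 rows: " + countMatching(scriptOutput, STATE_IS_2)); // 1

        // Faulty plan: one merged filter (state=1 and state=2) sits above both inserts.
        Predicate<Integer> merged = STATE_IS_1.and(STATE_IS_2);
        System.out.println("rows after merged filter: " + countMatching(scriptOutput, merged)); // 0
    }
}
```

No single row can have state=1 and state=2 at once, so the merged filter is never satisfied regardless of the input.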

The fix is to skip marking a predicate as a 'candidate' for the pushdown if it is a sibling of another filter.  We still want to push down children of select transform with grandchildren, etc.
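
A rough sketch of the sibling check described above, under stated assumptions: the Op class, hasFilterSibling, and isPushdownCandidate are hypothetical names invented for this example, and the operator tree is simplified to a single parent per node. The actual change in OpProcFactory.java may look quite different; see the diff for the real code.

```java
import java.util.ArrayList;
import java.util.List;

public class SiblingFilterCheck {
    // Hypothetical, simplified operator node. Hive's real Operator class is richer
    // (for example, an operator there can have several parents).
    static class Op {
        final String name;
        final boolean isFilter;
        Op parent;
        final List<Op> children = new ArrayList<>();

        Op(String name, boolean isFilter) {
            this.name = name;
            this.isFilter = isFilter;
        }

        // Attaches a child to this operator and returns the child for chaining.
        Op addChild(Op child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    // True if some other filter shares this operator's parent.
    static boolean hasFilterSibling(Op filterOp) {
        if (filterOp.parent == null) {
            return false;
        }
        for (Op sibling : filterOp.parent.children) {
            if (sibling != filterOp && sibling.isFilter) {
                return true;
            }
        }
        return false;
    }

    // A filter's predicate is marked as a pushdown candidate only if it has no filter sibling.
    static boolean isPushdownCandidate(Op filterOp) {
        return filterOp.isFilter && !hasFilterSibling(filterOp);
    }

    public static void main(String[] args) {
        // Multi-insert case from the diagram: two sibling filters under the script operator.
        Op script = new Op("script", false);
        Op f1 = script.addChild(new Op("filter(state=1)", true));
        Op f2 = script.addChild(new Op("filter(state=2)", true));
        f1.addChild(new Op("select, insert into test1", false));
        f2.addChild(new Op("select, insert into test2", false));
        System.out.println(isPushdownCandidate(f1)); // false
        System.out.println(isPushdownCandidate(f2)); // false

        // Single-branch case: the lone filter is still a candidate.
        Op script2 = new Op("script", false);
        Op only = script2.addChild(new Op("filter(state=1)", true));
        only.addChild(new Op("select, insert into test1", false));
        System.out.println(isPushdownCandidate(only)); // true
    }
}
```

The check is made where the predicate is marked rather than where the merge happens, so a lone filter under the script operator keeps its pushdown behaviour.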


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
-------

Reproduced the issue in transform_ppd_multi.q, and also tested a similar scenario with an extract (cluster) operator in transform_ppr2.q.  Ran the other transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

Posted by Szehon Ho <sz...@cloudera.com>.

> On March 21, 2014, 9:50 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 180
> > <https://reviews.apache.org/r/19549/diff/1/?file=531817#file531817line180>
> >
> >     Just for my understanding, for the given example, what's the filterOp, what's the parent, and what are the siblings?

Hi Xuefu, thanks for looking.  As in my ASCII diagram above, the filter op is the filter operator.  The parent is the script operator.


- Szehon


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
-----------------------------------------------------------




Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

Posted by Xuefu Zhang <xz...@cloudera.com>.

> On March 21, 2014, 9:50 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java, line 180
> > <https://reviews.apache.org/r/19549/diff/1/?file=531817#file531817line180>
> >
> >     Just for my understanding, for the given example, what's the filterOp, what's the parent, and what are the siblings?
> 
> Szehon Ho wrote:
>     Hi Xuefu, thanks for looking.  As in my ASCII diagram above, the filter op is the filter operator.  The parent is the script operator.

I guess "script" is the parent, based on your comments.


- Xuefu


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
-----------------------------------------------------------




Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

Posted by Xuefu Zhang <xz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/#review38215
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
<https://reviews.apache.org/r/19549/#comment70234>

    Just for my understanding, for the given example, what's the filterOp, what's the parent, and what are the siblings?


- Xuefu Zhang




Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
-----------------------------------------------------------

(Updated March 21, 2014, 9:05 p.m.)


Review request for hive.


Repository: hive-git


Description (updated)
-------

In this scenario, PPD on the script (transform) operator did the following wrong predicate pushdown:

script --> filter (state=1)
           --> select, insert into test1
       --> filter (state=2)
           --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
         --> select, insert into test1
         --> select, insert into test2


The bug was a combination of two things: first, these filters were chosen by FilterPPD as 'candidate' pushdown predicates, and second, ScriptPPD called "mergeWithChildrenPred + createFilters", which performed the above transformation because they were marked.

ScriptPPD was one of the few simple operators that did this. I tried some other parent operators such as extract (see my added test in transform_ppr2.q) and also just a select operator, and could not reproduce the issue with those.

The fix is to skip marking a predicate as a 'candidate' for the pushdown if it is a sibling of another filter.  We still want to push down children of transform-operator with grandchildren, etc.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
-------

Reproduced the issue in transform_ppd_multi.q, and also tested a similar scenario with an extract (cluster) operator in transform_ppr2.q.  Ran the other transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho


Re: Review Request 19549: HIVE-6395 multi-table insert from select transform fails if optimize.ppd enabled

Posted by Szehon Ho <sz...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/19549/
-----------------------------------------------------------

(Updated March 21, 2014, 9:01 p.m.)


Review request for hive.


Changes
-------

Cleaned up comments.


Repository: hive-git


Description (updated)
-------

In this scenario, PPD on the script (transform) operator did the following wrong predicate pushdown:

script --> filter (state=1)
           --> select, insert into test1
       --> filter (state=2)
           --> select, insert into test2

into:

script --> filter (state=1 and state=2)   //not possible.
         --> select, insert into test1
         --> select, insert into test2


The bug was a combination of two things: first, these filters were chosen by FilterPPD as 'candidate' pushdown predicates, and second, ScriptPPD called "mergeWithChildrenPred + createFilters", which performed the above transformation because they were marked.

ScriptPPD was one of the few simple operators that did this. I tried some other combinations such as extract (see my added test in transform_ppr2.q) and also just a select operator, and could not reproduce the issue with those.

The fix is to skip marking a predicate as a 'candidate' for the pushdown if it is a sibling of another filter.  We still want to push down children of select transform with grandchildren, etc.


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java 40298e1 
  ql/src/test/queries/clientpositive/transform_ppd_multi.q PRE-CREATION 
  ql/src/test/queries/clientpositive/transform_ppr2.q 85ef3ac 
  ql/src/test/results/clientpositive/transform_ppd_multi.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/transform_ppr2.q.out 4bddc69 

Diff: https://reviews.apache.org/r/19549/diff/


Testing
-------

Reproduced the issue in transform_ppd_multi.q, and also tested a similar scenario with an extract (cluster) operator in transform_ppr2.q.  Ran the other transform_ppd and general ppd tests to ensure no regression.


Thanks,

Szehon Ho