You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by pengcheng xiong <px...@hortonworks.com> on 2017/06/01 19:46:37 UTC

Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/
-----------------------------------------------------------

(Updated June 1, 2017, 7:46 p.m.)


Review request for hive and Ashutosh Chauhan.


Repository: hive-git


Description
-------

hive-16775


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
  ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
  ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
  ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
  ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 


Diff: https://reviews.apache.org/r/59697/diff/2/

Changes: https://reviews.apache.org/r/59697/diff/1-2/


Testing
-------


Thanks,

pengcheng xiong


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On June 1, 2017, 9 p.m., pengcheng xiong wrote:
> > ql/src/test/results/clientpositive/filter_aggr.q.out
> > Lines 266 (patched)
> > <https://reviews.apache.org/r/59697/diff/2/?file=1738885#file1738885line266>
> >
> >     This shows null scan optimization is in.

yup.. but it doesnt show up for tpcds query 74 and query 4. I ran those with user.explain=false and explain extended to test.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176661
-----------------------------------------------------------


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176661
-----------------------------------------------------------




ql/src/test/results/clientpositive/filter_aggr.q.out
Lines 266 (patched)
<https://reviews.apache.org/r/59697/#comment250076>

    This shows null scan optimization is in.


- pengcheng xiong


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by Ashutosh Chauhan <ha...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176656
-----------------------------------------------------------



I tested latest and previous patch. On both cases, nullscan optimizer is not kicking in for tpcds queries. Is that expected?

- Ashutosh Chauhan


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by pengcheng xiong <px...@hortonworks.com>.

> On June 1, 2017, 9:08 p.m., pengcheng xiong wrote:
> > ql/src/test/results/clientpositive/perf/query4.q.out
> > Lines 436 (patched)
> > <https://reviews.apache.org/r/59697/diff/2/?file=1738886#file1738886line436>
> >
> >     The pattern TS-FIL(false) is here. I will take another look to see why it does not fire.

The reason is because of SharedScan. As all the branches in the union share the same 3 table scan, although only one of them will participate in the join, we still need to scan the table and then branch out. You will see lots of filters out of the same scan in the explain extended. If you turn off the shared scan, you will see the null scan optimization. This is interesting....


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176665
-----------------------------------------------------------


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by Jesús Camacho Rodríguez <jc...@hortonworks.com>.

> On June 1, 2017, 9:08 p.m., pengcheng xiong wrote:
> > ql/src/test/results/clientpositive/perf/query4.q.out
> > Lines 436 (patched)
> > <https://reviews.apache.org/r/59697/diff/2/?file=1738886#file1738886line436>
> >
> >     The pattern TS-FIL(false) is here. I will take another look to see why it does not fire.
> 
> pengcheng xiong wrote:
>     The reason is because of SharedScan. As all the branches in the union share the same 3 table scan, although only one of them will participate in the join, we still need to scan the table and then branch out. You will see lots of filters out of the same scan in the explain extended. If you turn off the shared scan, you will see the null scan optimization. This is interesting....

When SharedScan optimizer is executed, it seems we still do not have the information about the branches that will be removed. Is there a chance to extend NullScanOptimizer to remove those branches that will not contribute to result? Or is it even possible to move NullScanOptimizer to be executed before SharedScanOptimizer?


- Jesús


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176665
-----------------------------------------------------------


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by Ashutosh Chauhan <ha...@apache.org>.

> On June 1, 2017, 9:08 p.m., pengcheng xiong wrote:
> > ql/src/test/results/clientpositive/perf/query4.q.out
> > Lines 436 (patched)
> > <https://reviews.apache.org/r/59697/diff/2/?file=1738886#file1738886line436>
> >
> >     The pattern TS-FIL(false) is here. I will take another look to see why it does not fire.
> 
> pengcheng xiong wrote:
>     The reason is because of SharedScan. As all the branches in the union share the same 3 table scan, although only one of them will participate in the join, we still need to scan the table and then branch out. You will see lots of filters out of the same scan in the explain extended. If you turn off the shared scan, you will see the null scan optimization. This is interesting....
> 
> Jesús Camacho Rodríguez wrote:
>     When SharedScan optimizer is executed, it seems we still do not have the information about the branches that will be removed. Is there a chance to extend NullScanOptimizer to remove those branches that will not contribute to result? Or is it even possible to move NullScanOptimizer to be executed before SharedScanOptimizer?

Ideal thing to do here is to remove branches of tree corresponding to TS-FIL(false) But that removal is non-trivial in general case since it may make tree and result schema invalid e.g., in case of joins. Thats why NullScan instead of removing branch just replaces TS with ZeroRowInputFormat. We can remove them for union though which doesnt alter result schema and which is what HIVE-16797 is doing. Since that will happen early in logical planning, tree will be optimal by the time Shared Scan or NullScan sees it. So, I think with the rules we have, plan currently generated is optimal one we can generate for non-union cases like join like here. There is one minor extension which can be done is to enhance Shared Scan to simplify generated OR expression tree to drop false filter (and potentially other expression simplifications) after generating combining OR filter but that as I said is minor.


- Ashutosh


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176665
-----------------------------------------------------------


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by pengcheng xiong <px...@hortonworks.com>.

> On June 1, 2017, 9:08 p.m., pengcheng xiong wrote:
> > ql/src/test/results/clientpositive/perf/query4.q.out
> > Lines 436 (patched)
> > <https://reviews.apache.org/r/59697/diff/2/?file=1738886#file1738886line436>
> >
> >     The pattern TS-FIL(false) is here. I will take another look to see why it does not fire.
> 
> pengcheng xiong wrote:
>     The reason is because of SharedScan. As all the branches in the union share the same 3 table scan, although only one of them will participate in the join, we still need to scan the table and then branch out. You will see lots of filters out of the same scan in the explain extended. If you turn off the shared scan, you will see the null scan optimization. This is interesting....
> 
> Jesús Camacho Rodríguez wrote:
>     When SharedScan optimizer is executed, it seems we still do not have the information about the branches that will be removed. Is there a chance to extend NullScanOptimizer to remove those branches that will not contribute to result? Or is it even possible to move NullScanOptimizer to be executed before SharedScanOptimizer?
> 
> Ashutosh Chauhan wrote:
>     Ideal thing to do here is to remove branches of tree corresponding to TS-FIL(false) But that removal is non-trivial in general case since it may make tree and result schema invalid e.g., in case of joins. Thats why NullScan instead of removing branch just replaces TS with ZeroRowInputFormat. We can remove them for union though which doesnt alter result schema and which is what HIVE-16797 is doing. Since that will happen early in logical planning, tree will be optimal by the time Shared Scan or NullScan sees it. So, I think with the rules we have, plan currently generated is optimal one we can generate for non-union cases like join like here. There is one minor extension which can be done is to enhance Shared Scan to simplify generated OR expression tree to drop false filter (and potentially other expression simplifications) after generating combining OR filter but that as I said is minor.

Actually, did you guys think in this way? Shared scan is a special case of reusing of materialized view as this "materialized" view is truly materialized? :) I just would like to connect shared scan with jesus' work on materialized view. :)


- pengcheng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176665
-----------------------------------------------------------


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Re: Review Request 59697: Fix HiveFilterAggregateTransposeRule when filter is always false

Posted by pengcheng xiong <px...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/59697/#review176665
-----------------------------------------------------------




ql/src/test/results/clientpositive/perf/query4.q.out
Lines 436 (patched)
<https://reviews.apache.org/r/59697/#comment250080>

    The pattern TS-FIL(false) is here. I will take another look to see why it does not fire.


- pengcheng xiong


On June 1, 2017, 7:46 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/59697/
> -----------------------------------------------------------
> 
> (Updated June 1, 2017, 7:46 p.m.)
> 
> 
> Review request for hive and Ashutosh Chauhan.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> hive-16775
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/calcite/rules/HiveFilterAggregateTransposeRule.java 0e5c7313b6 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/CalcitePlanner.java 7f583ed075 
>   ql/src/test/queries/clientpositive/filter_aggr.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query4.q PRE-CREATION 
>   ql/src/test/queries/clientpositive/perf/query74.q PRE-CREATION 
>   ql/src/test/results/clientpositive/filter_aggr.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query4.q.out PRE-CREATION 
>   ql/src/test/results/clientpositive/perf/query74.q.out PRE-CREATION 
> 
> 
> Diff: https://reviews.apache.org/r/59697/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>