You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Aman Sinha <as...@maprtech.com> on 2015/06/18 01:59:13 UTC

Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/
-----------------------------------------------------------

Review request for drill and Jinfeng Ni.


Bugs: DRILL-3298
    https://issues.apache.org/jira/browse/DRILL-3298


Repository: drill-git


Description
-------

The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 

Diff: https://reviews.apache.org/r/35584/diff/


Testing
-------

Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.


Thanks,

Aman Sinha


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88432
-----------------------------------------------------------

Ship it!


Ship It!

- Jinfeng Ni


On June 18, 2015, 12:07 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 18, 2015, 12:07 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
>   exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 2ec2481 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/
-----------------------------------------------------------

(Updated June 18, 2015, 7:07 p.m.)


Review request for drill and Jinfeng Ni.


Changes
-------

New patch after incorporating review comments and adding unit test.


Bugs: DRILL-3298
    https://issues.apache.org/jira/browse/DRILL-3298


Repository: drill-git


Description
-------

The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.


Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
  exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 2ec2481 

Diff: https://reviews.apache.org/r/35584/diff/


Testing
-------

Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.


Thanks,

Aman Sinha


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Aman Sinha <as...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88333
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 64)
<https://reviews.apache.org/r/35584/#comment140805>

    Currently your query will fail with an unsupported operation error: 
    
    Error: UNSUPPORTED_OPERATION ERROR: Multiple window definitions in a single SELECT list is not currently supported
    See Apache Drill JIRA: DRILL-3196
    
    I am not sure if resetting the traits in the loop would cause incorrect results; it would be suboptimal. There is a comment at the beginning: 
          // TODO: Order window based on existing partition by
        //input.getTraitSet().subsumes()



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 99)
<https://reviews.apache.org/r/35584/#comment140806>

    Yes, I will use window.collation() to be consistent.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 100)
<https://reviews.apache.org/r/35584/#comment140807>

    Agree..the convert() is not necessary in this case. Will remove and assign convertedInput = SingleMergeExchange.


- Aman Sinha


On June 17, 2015, 11:59 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 17, 2015, 11:59 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Aman Sinha <as...@maprtech.com>.

> On June 18, 2015, 1:11 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java, line 64
> > <https://reviews.apache.org/r/35584/diff/1/?file=986627#file986627line64>
> >
> >     This might not be related to this JIRA. But I feel this "for" loop is problemtic.  Are we going to support the following:
> >     
> >     avg(c_1) over (partition by c_2 order by c_3), sum(c_2) over (partition by c_4 order by c_5)
> >     
> >     Seems each different group will re-set the traits based on different "partition"/"order" keys, which would cause incorrect problem.
> >     
> >     If we do not support, then we probably should add assertion check to make sure there is only one group in window.

Currently your query will fail with an unsupported operation error: 

Error: UNSUPPORTED_OPERATION ERROR: Multiple window definitions in a single SELECT list is not currently supported
See Apache Drill JIRA: DRILL-3196

I am not sure if resetting the traits in the loop would cause incorrect results; it would be suboptimal. There is a comment at the beginning: 
      // TODO: Order window based on existing partition by
    //input.getTraitSet().subsumes()


> On June 18, 2015, 1:11 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java, line 99
> > <https://reviews.apache.org/r/35584/diff/1/?file=986627#file986627line99>
> >
> >     Line 99 uses windowBase.orderKeys, while line 181 uses window.collation(). It took me a while to figure out they refer to the same thing. Is it better to use one form, to avoid confusion?

Yes, I will use window.collation() to be consistent.


> On June 18, 2015, 1:11 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java, line 100
> > <https://reviews.apache.org/r/35584/diff/1/?file=986627#file986627line100>
> >
> >     Why do u need call convert() over exch? the input has been converted, and the SingleMergeExchangePrel is explicitly created in this rule. Sounds it's not necessary to call convert() again.

Agree..the convert() is not necessary in this case. Will remove and assign convertedInput = SingleMergeExchange.


On June 18, 2015, 1:11 a.m., Aman Sinha wrote:
> > Also, you may consider adding one unit test case for the failed query reported in the JIRA.

I am trying to create a simpler repro than the one in the JIRA (that one has many parquet files) and will upload a new patch with a unit test and above changes.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88321
-----------------------------------------------------------


On June 17, 2015, 11:59 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 17, 2015, 11:59 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Jinfeng Ni <jn...@maprtech.com>.

> On June 17, 2015, 6:11 p.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java, line 64
> > <https://reviews.apache.org/r/35584/diff/1/?file=986627#file986627line64>
> >
> >     This might not be related to this JIRA. But I feel this "for" loop is problemtic.  Are we going to support the following:
> >     
> >     avg(c_1) over (partition by c_2 order by c_3), sum(c_2) over (partition by c_4 order by c_5)
> >     
> >     Seems each different group will re-set the traits based on different "partition"/"order" keys, which would cause incorrect problem.
> >     
> >     If we do not support, then we probably should add assertion check to make sure there is only one group in window.
> 
> Aman Sinha wrote:
>     Currently your query will fail with an unsupported operation error: 
>     
>     Error: UNSUPPORTED_OPERATION ERROR: Multiple window definitions in a single SELECT list is not currently supported
>     See Apache Drill JIRA: DRILL-3196
>     
>     I am not sure if resetting the traits in the loop would cause incorrect results; it would be suboptimal. There is a comment at the beginning: 
>           // TODO: Order window based on existing partition by
>         //input.getTraitSet().subsumes()

Right. The UNSUPPORTED_OPERATION ERROR is what I expect. If we do not allow multiple window definitions, then seems this for loop in WindowPrule is not necessary. In stead, we could simply check if window.groups has size = 1. Something like:

    Preconditions.checkArgument(window.groups.size() == 1, "Only one window definition is allowed.");

    Window.Group windowBase = window.groups.get(0);


- Jinfeng


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88321
-----------------------------------------------------------


On June 17, 2015, 4:59 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 17, 2015, 4:59 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Aman Sinha <as...@maprtech.com>.

> On June 18, 2015, 1:11 a.m., Jinfeng Ni wrote:
> > exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java, line 64
> > <https://reviews.apache.org/r/35584/diff/1/?file=986627#file986627line64>
> >
> >     This might not be related to this JIRA. But I feel this "for" loop is problemtic.  Are we going to support the following:
> >     
> >     avg(c_1) over (partition by c_2 order by c_3), sum(c_2) over (partition by c_4 order by c_5)
> >     
> >     Seems each different group will re-set the traits based on different "partition"/"order" keys, which would cause incorrect problem.
> >     
> >     If we do not support, then we probably should add assertion check to make sure there is only one group in window.
> 
> Aman Sinha wrote:
>     Currently your query will fail with an unsupported operation error: 
>     
>     Error: UNSUPPORTED_OPERATION ERROR: Multiple window definitions in a single SELECT list is not currently supported
>     See Apache Drill JIRA: DRILL-3196
>     
>     I am not sure if resetting the traits in the loop would cause incorrect results; it would be suboptimal. There is a comment at the beginning: 
>           // TODO: Order window based on existing partition by
>         //input.getTraitSet().subsumes()
> 
> Jinfeng Ni wrote:
>     Right. The UNSUPPORTED_OPERATION ERROR is what I expect. If we do not allow multiple window definitions, then seems this for loop in WindowPrule is not necessary. In stead, we could simply check if window.groups has size = 1. Something like:
>     
>         Preconditions.checkArgument(window.groups.size() == 1, "Only one window definition is allowed.");
>     
>         Window.Group windowBase = window.groups.get(0);

I discussed this offline with Jinfeng and we agreed that the current logic in the for loop will produce valid plans.  In fact, please see a discussion of this in DRILL-3196 where both the plan and results were validated for multiple window partitions.  The reason we are currently blocking this functionality is because of lack of sufficient testing, not due to implementation.


- Aman


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88321
-----------------------------------------------------------


On June 18, 2015, 7:07 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 18, 2015, 7:07 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
>   exec/java-exec/src/test/java/org/apache/drill/exec/TestWindowFunctions.java 2ec2481 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>


Re: Review Request 35584: DRILL-3298: fix wrong result for window function query when no partition-by is present

Posted by Jinfeng Ni <jn...@maprtech.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/35584/#review88321
-----------------------------------------------------------



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 64)
<https://reviews.apache.org/r/35584/#comment140783>

    This might not be related to this JIRA. But I feel this "for" loop is problemtic.  Are we going to support the following:
    
    avg(c_1) over (partition by c_2 order by c_3), sum(c_2) over (partition by c_4 order by c_5)
    
    Seems each different group will re-set the traits based on different "partition"/"order" keys, which would cause incorrect problem.
    
    If we do not support, then we probably should add assertion check to make sure there is only one group in window.



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 99)
<https://reviews.apache.org/r/35584/#comment140778>

    Line 99 uses windowBase.orderKeys, while line 181 uses window.collation(). It took me a while to figure out they refer to the same thing. Is it better to use one form, to avoid confusion?



exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java (line 100)
<https://reviews.apache.org/r/35584/#comment140780>

    Why do u need call convert() over exch? the input has been converted, and the SingleMergeExchangePrel is explicitly created in this rule. Sounds it's not necessary to call convert() again.


Also, you may consider adding one unit test case for the failed query reported in the JIRA.

- Jinfeng Ni


On June 17, 2015, 4:59 p.m., Aman Sinha wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/35584/
> -----------------------------------------------------------
> 
> (Updated June 17, 2015, 4:59 p.m.)
> 
> 
> Review request for drill and Jinfeng Ni.
> 
> 
> Bugs: DRILL-3298
>     https://issues.apache.org/jira/browse/DRILL-3298
> 
> 
> Repository: drill-git
> 
> 
> Description
> -------
> 
> The JIRA DRILL-3298 has relevant discussion on this.  The fix involves creating a single stream as input to the Window by inserting a SingleMergeExchange if only the ORDER-BY clause is present in a Window.  This ensures that the Sort below the Window is done in parallel followed by a Merge.
> 
> 
> Diffs
> -----
> 
>   exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/WindowPrule.java f7728c8 
> 
> Diff: https://reviews.apache.org/r/35584/diff/
> 
> 
> Testing
> -------
> 
> Manual testing of the query in DRILL-3298.  Ran unit tests.  Functional tests are in progress.
> 
> 
> Thanks,
> 
> Aman Sinha
> 
>