You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/08 20:04:32 UTC

[GitHub] [spark] bersprockets opened a new pull request, #37443: [SPARK-40002][SQL] Don't push down limit through window using ntile

bersprockets opened a new pull request, #37443:
URL: https://github.com/apache/spark/pull/37443

   ### What changes were proposed in this pull request?
   
   Change `LimitPushDownThroughWindow` so that it no longer supports pushing down a limit through a window using ntile.
   
   ### Why are the changes needed?
   
   In an unpartitioned window, the ntile function is currently applied to the result of the limit. This behavior produces results that conflict with Spark 3.1.3, Hive 2.3.9 and Prestodb 0.268
   
   #### Example
   
   Assume this data:
   ```
   create table t1 stored as parquet as
   select *
   from range(101);
   ```
   Also assume this query:
   ```
   select id, ntile(10) over (order by id) as nt
   from t1
   limit 10;
   ```
   With Spark 3.2.2, Spark 3.3.0, and master, the limit is applied before the ntile function:
   ```
   +---+---+
   |id |nt |
   +---+---+
   |0  |1  |
   |1  |2  |
   |2  |3  |
   |3  |4  |
   |4  |5  |
   |5  |6  |
   |6  |7  |
   |7  |8  |
   |8  |9  |
   |9  |10 |
   +---+---+
   ```
   With Spark 3.1.3, and Hive 2.3.9, and Prestodb 0.268, the limit is applied _after_ ntile.
   
   Spark 3.1.3:
   ```
   +---+---+
   |id |nt |
   +---+---+
   |0  |1  |
   |1  |1  |
   |2  |1  |
   |3  |1  |
   |4  |1  |
   |5  |1  |
   |6  |1  |
   |7  |1  |
   |8  |1  |
   |9  |1  |
   +---+---+
   ```
   Hive 2.3.9:
   ```
   +-----+-----+
   | id  | nt  |
   +-----+-----+
   | 0   | 1   |
   | 1   | 1   |
   | 2   | 1   |
   | 3   | 1   |
   | 4   | 1   |
   | 5   | 1   |
   | 6   | 1   |
   | 7   | 1   |
   | 8   | 1   |
   | 9   | 1   |
   +-----+-----+
   10 rows selected (1.72 seconds)
   ```
   Prestodb 0.268:
   ```
    id | nt 
   ----+----
     0 |  1 
     1 |  1 
     2 |  1 
     3 |  1 
     4 |  1 
     5 |  1 
     6 |  1 
     7 |  1 
     8 |  1 
     9 |  1 
   (10 rows)
   
   ```
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Two new unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon closed pull request #37443: [SPARK-40002][SQL] Don't push down limit through window using ntile

Posted by GitBox <gi...@apache.org>.
HyukjinKwon closed pull request #37443: [SPARK-40002][SQL] Don't push down limit through window using ntile
URL: https://github.com/apache/spark/pull/37443


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] bersprockets commented on pull request #37443: [SPARK-40002][SQL] Don't push down limit through window using ntile

Posted by GitBox <gi...@apache.org>.
bersprockets commented on PR #37443:
URL: https://github.com/apache/spark/pull/37443#issuecomment-1208557863

   @wangyum 
   
   I should have fixed this when I was fixing the same issue with `percent_rank` (SPARK-38614), but I missed it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #37443: [SPARK-40002][SQL] Don't push down limit through window using ntile

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on PR #37443:
URL: https://github.com/apache/spark/pull/37443#issuecomment-1208847302

   Merged to master, branch-3.3, and branch-3.2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org