You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "simonvandel (via GitHub)" <gi...@apache.org> on 2023/06/17 14:24:30 UTC

[GitHub] [arrow-datafusion] simonvandel opened a new issue, #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data

simonvandel opened a new issue, #6710:
URL: https://github.com/apache/arrow-datafusion/issues/6710

   ### Describe the bug
   
   This query
   ```sql
   CREATE VIEW test(timestamp , value) AS VALUES(1,2),(5,6);
   
   EXPLAIN SELECT LAG(timestamp) OVER (ORDER BY timestamp) as prev, value FROM (SELECT * FROM test ORDER BY timestamp);
   ```
   gives
   
   ```
   DataFusion CLI v26.0.0
   0 rows in set. Query took 0.000 seconds.
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                                                                                                                                                                             |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: LAG(test.timestamp) ORDER BY [test.timestamp ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW AS prev, test.value                                                                                                                                                   |
   |               |   WindowAggr: windowExpr=[[LAG(test.timestamp) ORDER BY [test.timestamp ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW]]                                                                                                                                                      |
   |               |     Sort: test.timestamp ASC NULLS LAST                                                                                                                                                                                                                                                          |
   |               |       SubqueryAlias: test                                                                                                                                                                                                                                                                        |
   |               |         Projection: column1 AS timestamp, column2 AS value                                                                                                                                                                                                                                       |
   |               |           Values: (Int64(1), Int64(2)), (Int64(5), Int64(6))                                                                                                                                                                                                                                     |
   | physical_plan | ProjectionExec: expr=[LAG(test.timestamp) ORDER BY [test.timestamp ASC NULLS LAST] RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW@2 as prev, value@1 as value]                                                                                                                                |
   |               |   BoundedWindowAggExec: wdw=[LAG(test.timestamp): Ok(Field { name: "LAG(test.timestamp)", data_type: Int64, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }), frame: WindowFrame { units: Range, start_bound: Preceding(Int64(NULL)), end_bound: CurrentRow }], mode=[Sorted] |
   |               |     SortExec: expr=[timestamp@0 ASC NULLS LAST]                                                                                                                                                                                                                                                  |
   |               |       ProjectionExec: expr=[column1@0 as timestamp, column2@1 as value]                                                                                                                                                                                                                          |
   |               |         ValuesExec                                                                                                                                                                                                                                                                               |
   |               |                                                                                                                                                                                                                                                                                                  |
   +---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   ```
   
   ### To Reproduce
   
   _No response_
   
   ### Expected behavior
   
   I would expect that the `ORDER BY` in the window frame could be eliminated since the `SortExec` upstream enforces the same ordering.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] simonvandel commented on issue #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel commented on issue #6710:
URL: https://github.com/apache/arrow-datafusion/issues/6710#issuecomment-1596004358

   I may be confused by the ORDER BY on the ProjectionExec.
   
   The query is a reduced case of a bigger query where the data source is marked already sorted. But in that bigger query, I could see profiling stacks related to sorting, which is unexpected. I'll see if I make a small reproduction with a flamegraph.
   
   In any case, would it be unreasonable to remove the ORDER BY in the query plan if there is in fact no sorting going on? I think that would fit with the same model that a SortExec is also removed from a plan if data is already sorted. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] simonvandel commented on issue #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel commented on issue #6710:
URL: https://github.com/apache/arrow-datafusion/issues/6710#issuecomment-1596098342

   I tried creating a reproducer program, but actually couldn't get it to reproduce. There is no sorting going on, so I must have made an error somewhere.
   Sorry about this, closing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] simonvandel closed issue #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data

Posted by "simonvandel (via GitHub)" <gi...@apache.org>.
simonvandel closed issue #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data
URL: https://github.com/apache/arrow-datafusion/issues/6710


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on issue #6710: `BoundedWindowAggExec`/`ProjectionExec` ORDER BY not removed on sorted data

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on issue #6710:
URL: https://github.com/apache/arrow-datafusion/issues/6710#issuecomment-1596001865

   What is the issue here? I see only one `SortExec`, no redundant sorting? Maybe you are confused by the ORDER BY annotation in the `ProjectionExec` printout? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org