You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/28 06:50:16 UTC

[GitHub] [arrow-datafusion] Dandandan commented on pull request #403: add `first_value`, `last_value`, and `nth_value` built-in window functions

Dandandan commented on pull request #403:
URL: https://github.com/apache/arrow-datafusion/pull/403#issuecomment-850188619


   > > Thank you @Jimexist  -- the  code structure looks clean to me . Nice work
   > > However, I am concerned about the correctness of these results. As I understand it, `first_value`, `last_value` and `nth_value` are not well defined unless there is an ordering on the window (aka without an ordering you basically can get some arbitrary value from the window).
   > > I wonder if it would make sense to implement ordering for windows first, so we can write tests will well defined output
   > > I also see some change to the `parquet-testing` module which I wonder if that was intended
   > 
   > I guess it's not arbitrary but rather just take the ordering as is. When #425 is merged I'll add one test case to compare with psql so the behavior is consistent.
   > 
   > Of course when ordering clause is implemented then the behavior can also be tested in the same way, along with some unit test.
   > 
   > I plan to implement ordering after this and the lead/lag PR are merged because it requires some structural changes to the planner.
   
   I think the implementation might take the order as is, however the order as given by the underlying plan might give different result. For example, a table scan might give the results in a different order each time it runs.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org