You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/28 01:35:41 UTC

[GitHub] [arrow-datafusion] Jimexist commented on pull request #403: add `first_value`, `last_value`, and `nth_value` built-in window functions

Jimexist commented on pull request #403:
URL: https://github.com/apache/arrow-datafusion/pull/403#issuecomment-850049189


   > Thank you @Jimexist  -- the  code structure looks clean to me . Nice work
   > 
   > 
   > 
   > However, I am concerned about the correctness of these results. As I understand it, `first_value`, `last_value` and `nth_value` are not well defined unless there is an ordering on the window (aka without an ordering you basically can get some arbitrary value from the window). 
   > 
   > 
   > 
   > I wonder if it would make sense to implement ordering for windows first, so we can write tests will well defined output
   > 
   > 
   > 
   > I also see some change to the `parquet-testing` module which I wonder if that was intended
   
   I guess it's not arbitrary but rather just take the ordering as is. When https://github.com/apache/arrow-datafusion/pull/425 is merged I'll add one test case to compare with psql so the behavior is consistent.
   
   Of course when ordering clause is implemented then the behavior can also be tested in the same way, along with some unit test.
   
   I plan to implement ordering after this and the lead/lag PR are merged because it requires some structural changes to the planner.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org