Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/03/04 00:53:32 UTC

[GitHub] [arrow] westonpace commented on issue #34201: [R] Compute lagged or leading values

westonpace commented on issue #34201:
URL: https://github.com/apache/arrow/issues/34201#issuecomment-1454307957

   Yes, #32991 is a blocker.  After that we will still need some work to support window functions.  There was an effort to do this in #32813, but that was abandoned and is probably more complex than we have the resources to implement.
   
   We will need to add a window node which (ignoring `over` for the moment) mostly boils down to defining a kernel for window functions.  A naive implementation would simply accumulate all of the data and pass it to the kernel all at once but I don't think this is practical.  I've been thinking about how we might do this in a more reasonable fashion and it shouldn't be nightmarish.
   
   Many window functions (e.g. cumulative sum, rank, percent_rank) simply require ordered input (and knowledge of the total # of rows) and don't "reach" ahead or behind.  So this would be very similar to `ExecuteScalarExpression` except the kernels would be executed in a serial fashion (easily doable once #32991 is done) and maintain a state after each call (e.g. the current cumulative sum).
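
To make the "serial execution with state" idea concrete, here is a minimal Python sketch of a cumulative sum that is fed ordered batches one at a time and carries its running total between calls. This is not Arrow code: `CumulativeSumState`, `consume`, and `run_serially` are hypothetical names invented for illustration, and plain lists stand in for Arrow batches.

```python
from typing import Iterable, Iterator, List


class CumulativeSumState:
    """Hypothetical stateful kernel: keeps the running total between batches."""

    def __init__(self) -> None:
        self.total = 0

    def consume(self, batch: List[int]) -> List[int]:
        # Produce one output per input row, continuing from the previous batch.
        out = []
        for value in batch:
            self.total += value
            out.append(self.total)
        return out


def run_serially(batches: Iterable[List[int]]) -> Iterator[List[int]]:
    # Batches must arrive in order and be processed one at a time,
    # because each call depends on the state left by the previous call.
    state = CumulativeSumState()
    for batch in batches:
        yield state.consume(batch)


result = list(run_serially([[1, 2], [3], [4, 5]]))
```

Only one batch needs to be held in memory at a time, which is the point of avoiding the naive "accumulate everything" approach. Functions such as `percent_rank` would additionally need the total row count, which this sketch does not model.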
   
   Sadly, lag & lead are rather peculiar window functions.  I believe, under the hood, it makes more sense to translate something like `lag(col) over ()` into `first_value(col) over (rows 1 preceding)`.  These behave the same except for the boundaries and we could finesse that.  That will require support for `over` (e.g. actual "windows" and not just ordered execution of scalar-ish functions).
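
As a rough illustration of the boundary difference mentioned above, the following Python sketch compares a plain `lag` with `first_value` evaluated over a frame of "1 preceding through current row". The function names are hypothetical and operate on plain lists within a single partition; this is an assumption-laden sketch, not how Arrow or any particular SQL engine implements it.

```python
from typing import List, Optional


def lag(values: List[int], n: int = 1) -> List[Optional[int]]:
    # Shift values down by n rows; rows with no predecessor get None (null).
    k = min(n, len(values))
    return [None] * k + values[: len(values) - k]


def first_value_rows_preceding(values: List[int], n: int = 1) -> List[int]:
    # Frame for row i is rows [i - n, i], clamped at the partition start,
    # so the first value of the frame is values[max(i - n, 0)].
    return [values[max(i - n, 0)] for i in range(len(values))]


data = [10, 20, 30, 40]
lagged = lag(data)
framed = first_value_rows_preceding(data)
```

Here `lagged` is `[None, 10, 20, 30]` and `framed` is `[10, 10, 20, 30]`: the two agree everywhere except the first row, where the clamped frame yields the current row instead of a null. That first row is the boundary case that would need special handling.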

