Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/09 09:38:25 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue #298: Support window functions with empty `OVER` clause

Dandandan opened a new issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Window functions are a very valuable feature for DataFusion, as they enable analytical queries and tasks such as deduplication.
   
   **Describe the solution you'd like**
   Initial support for window functions with an empty `OVER ()` clause. This allows us to gradually add more features, such as support for `partition by` and `order by`.
   
   **Describe alternatives you've considered**
   A more complete implementation in one go. Window functions, however, are a big feature, and we will probably need several iterations to get them into shape.
   
   **Additional context**
   Some material:
   http://www.vldb.org/pvldb/vol8/p1058-leis.pdf


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848809874


   - [ ] https://github.com/apache/arrow-datafusion/pull/375 to add window function support, streaming, and `row_number`
   - [ ] https://github.com/apache/arrow-datafusion/pull/403 `first_value`, `last_value`, and `nth_value`
   - [ ] https://github.com/apache/arrow-datafusion/pull/429 `lead` and `lag`
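
For readers unfamiliar with the functions named in this checklist, here is a minimal Python sketch of their usual SQL semantics over a single window that spans the whole input, as with an empty `OVER ()` clause. It is illustrative only and is not the code from the linked pull requests. The helper names simply mirror the SQL function names, `None` stands in for SQL `NULL`, and the input order is assumed to be the row order.

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def row_number(values: Sequence[T]) -> List[int]:
    """Number the rows 1..n in input order."""
    return list(range(1, len(values) + 1))


def first_value(values: Sequence[T]) -> List[T]:
    """Every row sees the first value of the window."""
    return [values[0]] * len(values) if values else []


def last_value(values: Sequence[T]) -> List[T]:
    """Every row sees the last value of the window.

    Assumption: the frame is the whole window, which is the case for an
    empty OVER (). With ORDER BY, SQL's default frame ends at the current row.
    """
    return [values[-1]] * len(values) if values else []


def nth_value(values: Sequence[T], n: int) -> List[Optional[T]]:
    """Every row sees the n-th value (1-based), or None if there is none."""
    if n < 1:
        raise ValueError("n must be at least 1")
    value = values[n - 1] if n <= len(values) else None
    return [value] * len(values)


def lag(values: Sequence[T], offset: int = 1,
        default: Optional[T] = None) -> List[Optional[T]]:
    """Each row sees the value `offset` rows before it, else `default`."""
    count = len(values)
    return [values[i - offset] if 0 <= i - offset < count else default
            for i in range(count)]


def lead(values: Sequence[T], offset: int = 1,
         default: Optional[T] = None) -> List[Optional[T]]:
    """Each row sees the value `offset` rows after it, else `default`."""
    count = len(values)
    return [values[i + offset] if 0 <= i + offset < count else default
            for i in range(count)]
```

For the input `[10, 20, 30]`, `lag` yields `[None, 10, 20]` and `lead` yields `[20, 30, None]`: every function returns exactly one output value per input row.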





[GitHub] [arrow-datafusion] alamb commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-840466714


   > I think a new operator might be the best thing to do here. They have very different semantics and needs.
   
   I agree that a new operator (along with a new kind of aggregate) is probably best here. Window aggregate functions are applied differently from "normal" aggregates: among other things, a window function typically produces one row of output for each row of input, whereas a normal aggregate produces one row of output for each distinct combination of grouping-key values.
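
The difference in output shape can be illustrated with a small Python sketch. This is not DataFusion code: the function names `group_by_sum` and `window_sum_over_all`, the `(key, amount)` row layout, and the sample rows are all hypothetical, chosen only to contrast a grouped aggregate with the same aggregate applied as a window function with an empty `OVER ()` clause.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical layout: (key, amount)


def group_by_sum(rows: List[Row]) -> Dict[str, int]:
    """Like SELECT key, SUM(amount) ... GROUP BY key.

    Produces one output row per distinct grouping key.
    """
    totals: Dict[str, int] = defaultdict(int)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


def window_sum_over_all(rows: List[Row]) -> List[Tuple[str, int, int]]:
    """Like SELECT key, amount, SUM(amount) OVER () ...

    Produces one output row per input row; the input columns are kept
    and the window result is appended as an extra column.
    """
    total = sum(amount for _, amount in rows)
    return [(key, amount, total) for key, amount in rows]


rows = [("a", 1), ("a", 2), ("b", 3)]
print(group_by_sum(rows))         # {'a': 3, 'b': 3}
print(window_sum_over_all(rows))  # [('a', 1, 6), ('a', 2, 6), ('b', 3, 6)]
```

Three input rows collapse to two rows under grouping but stay three rows under the window function, and the output schema gains a column instead of replacing the input columns.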
   





[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-844230310


   We'd like to support window functions in three or more steps:
   1. #359 basic structure 
   2. #298 empty over clause (this one)
   3. #299 with partition clause
   4. #360 with order by
   5. #361 with window frame
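
To make the later steps concrete, here is a minimal Python sketch of how partitioning, ordering, and a window frame each narrow the set of rows a window function sees. It is not DataFusion's design. The function `window_sum`, its parameters, and the dict-based rows are hypothetical, and the frame handling is a simplification: the frame is the whole partition unless `preceding` is given, which differs from SQL's default frame when `ORDER BY` is present.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def window_sum(rows: List[Row], value: str,
               partition_by: Optional[str] = None,
               order_by: Optional[str] = None,
               preceding: Optional[int] = None) -> List[int]:
    """Sum `value` over each row's window; results are in input row order."""
    # Empty OVER (): every row falls into the same single partition.
    # PARTITION BY: rows are grouped by the partition column instead.
    partitions: Dict[Any, List[int]] = {}
    for index, row in enumerate(rows):
        key = row[partition_by] if partition_by is not None else None
        partitions.setdefault(key, []).append(index)

    result: List[int] = [0] * len(rows)
    for indices in partitions.values():
        # ORDER BY: fix the row order inside each partition.
        if order_by is not None:
            indices = sorted(indices, key=lambda i: rows[i][order_by])
        for position, index in enumerate(indices):
            # Window frame: here only ROWS BETWEEN n PRECEDING AND CURRENT ROW.
            if preceding is None:
                frame = indices
            else:
                frame = indices[max(0, position - preceding):position + 1]
            result[index] = sum(rows[j][value] for j in frame)
    return result
```

With rows `{"k": "a", "t": 2, "v": 10}`, `{"k": "b", "t": 1, "v": 5}`, and `{"k": "a", "t": 1, "v": 1}`, calling `window_sum(rows, "v")` gives `[16, 16, 16]`, while adding `partition_by="k"` gives `[11, 5, 11]`.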





[GitHub] [arrow-datafusion] Jimexist commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-840480051


   See https://github.com/apache/arrow-datafusion/pull/334 for a general idea of how much code would need to change if we added window functions side by side with aggregates.





[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848809874


   - [x] https://github.com/apache/arrow-datafusion/pull/375 to add window function support, streaming, and `row_number`
   - [x] https://github.com/apache/arrow-datafusion/pull/403 `first_value`, `last_value`, and `nth_value`
   - [ ] https://github.com/apache/arrow-datafusion/pull/429 `lead` and `lag`





[GitHub] [arrow-datafusion] Jimexist commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-840379271


   @Dandandan I wonder how window functions should fit into the logical planner. Should they be folded into aggregate functions? The actual output schema might be different, though.





[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848809874


   - [x] https://github.com/apache/arrow-datafusion/pull/375 to add window function support, streaming, and `row_number`
   - [ ] https://github.com/apache/arrow-datafusion/pull/403 `first_value`, `last_value`, and `nth_value`
   - [ ] https://github.com/apache/arrow-datafusion/pull/429 `lead` and `lag`





[GitHub] [arrow-datafusion] Dandandan commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-840442063


   @Jimexist I think a new operator might be the best thing to do here. They have very different semantics and needs.
   
   There is probably still a lot of opportunity for reuse between the operators, and in the planner we can use different operators (repartition, merge, sort, etc.).
   
   What do you think?
   
   @jorgecarleitao @alamb 
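
One way to picture the reuse suggested above is a toy planner that composes existing operators in front of a dedicated window operator. The Python sketch below is purely hypothetical: `WindowSpec`, `plan_window`, and the operator labels are invented for illustration and are not DataFusion's planner API or its actual plan shape.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WindowSpec:
    """Hypothetical description of an OVER clause."""
    partition_by: List[str] = field(default_factory=list)
    order_by: List[str] = field(default_factory=list)


def plan_window(spec: WindowSpec) -> List[str]:
    """Return a linear plan, as operator labels, for one window spec."""
    plan = ["Scan"]
    if spec.partition_by:
        # Rows with equal partition keys must end up in the same partition.
        plan.append(f"Repartition({', '.join(spec.partition_by)})")
    else:
        # Empty OVER (): all rows form a single window, so merge everything.
        plan.append("Merge")
    sort_keys = spec.partition_by + spec.order_by
    if sort_keys:
        plan.append(f"Sort({', '.join(sort_keys)})")
    # The new operator only has to evaluate window functions on its input.
    plan.append("Window")
    return plan


print(plan_window(WindowSpec()))
# ['Scan', 'Merge', 'Window']
print(plan_window(WindowSpec(partition_by=["k"], order_by=["t"])))
# ['Scan', 'Repartition(k)', 'Sort(k, t)', 'Window']
```

Under this assumption, the window operator itself stays small, and the partitioning and ordering work is delegated to operators that already exist.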





[GitHub] [arrow-datafusion] Jimexist commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-844230310


   We'd like to support window functions in three or more steps:
   1. #359 basic structure 
   2. #298 empty over clause
   3. #299 with partition clause
   4. #360 with order by
   5. #361 with window frame





[GitHub] [arrow-datafusion] Jimexist commented on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848810095


   > * [ ]  #375 to add window function support, streaming, and `row_number`
   > * [ ]  #403 `first_value`, `last_value`, and `nth_value`
   > * [ ]  #429 `lead` and `lag`
   
   @alamb and @Dandandan, feel free to review in order.





[GitHub] [arrow-datafusion] alamb closed issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
alamb closed issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298


   





[GitHub] [arrow-datafusion] Jimexist edited a comment on issue #298: Support window functions with empty `OVER` clause

Posted by GitBox <gi...@apache.org>.
Jimexist edited a comment on issue #298:
URL: https://github.com/apache/arrow-datafusion/issues/298#issuecomment-848809874


   - [x] https://github.com/apache/arrow-datafusion/pull/375 to add window function support, streaming, and `row_number`
   - [x] https://github.com/apache/arrow-datafusion/pull/403 `first_value`, `last_value`, and `nth_value`
   - [x] https://github.com/apache/arrow-datafusion/pull/429 `lead` and `lag`

