You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 21:50:02 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

alamb opened a new issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   It is very non obvious that either one or the other of `Accumulator::update` or `Accumulator::update_batch` are called. This results in unfortunate things like https://github.com/apache/arrow-datafusion/pull/1525 (implementation in terms of `ScalarValue`) only then to be followed by https://github.com/apache/arrow-datafusion/issues/1546 (implementation in terms of `Array`) 
   
   In addition, `update_batch` and `merge_batch` will always be the most performant, so they should always be what is implemented. 
   
   The same issue applies to `Accumulator::merge` and `Accumulator::merge_batch` 
   
   I tried making this clearer in  the docs via https://github.com/apache/arrow-datafusion/pull/1542
   
   **Describe the solution you'd like**
    I think a clearer thing to do would be to remove `update` and `merge` entirely from the traits and ensure the examples are clear to keep the barrier to entry as low as possible. 
   
   
   **Additional context**
   See additional context on https://github.com/apache/arrow-datafusion/pull/1547#pullrequestreview-849686382
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb edited a comment on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1012197926


   @liukun4515 
   
   > Can we change the default implementation of the accumulator trait to this?
   
   We could change the default implementation, however, I am not sure what purpose that would serve -- the code in datafusion that calls `Accumulator` only (should) use `Accumulator::update_batch` 
   
   I was thinking of moving the current default implementation of `Accumulator::update_batch` which calls `update()` with `ScalarValues` to some sort of public  example or adapter function, so anyone who had existing Accumulator implementations can continue to use them until they move over to `update_batch` and `merge_batch`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 edited a comment on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
liukun4515 edited a comment on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1010582587


   Can we change the default implementation of the accumulator trait to this?
   @alamb 
   
   ```
   pub trait Accumulator{
   
   fn update(&mut self, values: &[ScalarValue]) -> Result<()> {
      // convert the values to &[Arrayref]
      // call the update_batch method
   }
   fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()>;
   }
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1012225389


   > 
   It makes sense to me.
   Removing the `update` and `merge` is more clear for developers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1012197926


   @liukun4515 
   
   > Can we change the default implementation of the accumulator trait to this?
   
   We could change the default implementation, however, I am not sure what purpose that would serve -- the code in datafusion that calls `Accumulator` only (should) use `Accumulator::update_batch` 
   
   I was thinking of moving the current default implementation of `Accumulator::update_batch` which calls `update()` with `ScalarValues` to some sort of public  example or adapter function, so anyone who had existing Accumulator implementations


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Jimexist closed issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
Jimexist closed issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] liukun4515 commented on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
liukun4515 commented on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1010582587


   Can we change the default implementation of the accumulator trait to this?
   @alamb 
   
   ```
   pub trait Accumulator{
   
   fn update(&mut self, values: &[ScalarValue]) -> Result<()> {
      // convert the values to &[Arrayref]
      // call the update_batch method
   }
   fn update_batch(&mut self, values: &[ArrayRef]) -> Result<()>;
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] realno commented on issue #1549: Proposal: Remove `Accumulator::update` and `Accumulator::merge`

Posted by GitBox <gi...@apache.org>.
realno commented on issue #1549:
URL: https://github.com/apache/arrow-datafusion/issues/1549#issuecomment-1010686703


   +1 on this proposal. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org