You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/05 00:28:49 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

alamb opened a new issue, #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   There is casting logic in aggregates that handles coercing inputs to aggregates
   
   https://github.com/apache/arrow-datafusion/blob/6b4bbd0/datafusion/physical-expr/src/aggregate/sum.rs#L224-L316
   
   On the surface doing these casts in `sum.rs` appears to duplicates some non trivial amount of the logic in plan timecoercion  -- maybe it would be possible to make this code cleaner / consolidate more of the coercion logic.
   
   
   **Describe the solution you'd like**
   Ensure types are known prior to executing the aggregate so that the input and aggregate types are known aprior
   
   **Describe alternatives you've considered**
   Not sure (maybe the code is needed, it just "feels" a bit wrong)
   
   **Additional context**
   
   https://github.com/apache/arrow-datafusion/pull/2405#discussion_r864400327
   cc @WinkerDu 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1120400563

   the same pattern used in `min_max.rs`.
   The common approach to get the underlying value from ScalarValue is through patmatch, and that causes a lot of similar code branches to operate with underlying values supporting different datatypes.
   
   I'm trying to figure out if `sum_batch` function can be reused instead of boilerplate `sum`. In fact `sum` is specific case of `sum_batch`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1586295234

   I think this I think this ticket is no longer tracking anything actionable -- I expect more performance improvements from https://github.com/apache/arrow-datafusion/issues/4973 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1120411751

   I wonder if we can take inspiration from @yjshen 's approach in https://github.com/apache/arrow-datafusion/pull/2375 (using row formats, etc). I don't have any specific suggestions but thought it might be interesting to look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1119391942

   @alamb I can take this if you dont mind


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] andygrove commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1120322644

   I added this to https://github.com/apache/arrow-datafusion/issues/2355


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] comphead commented on issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by GitBox <gi...@apache.org>.
comphead commented on issue #2447:
URL: https://github.com/apache/arrow-datafusion/issues/2447#issuecomment-1123910091

   @alamb sorry for delay, I stuck investigating all that datatypes. 
   Please check the draft #2516
   So idea is to reuse `sum_batch` function instead of boilerplate `sum`.
   I have to do type casting to ensure all types are the same. it might be looking a concern for the performance. Please let me know your thoughts


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #2447: Investigate and reduce runtime type coercion in aggregates like `sum`
URL: https://github.com/apache/arrow-datafusion/issues/2447


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org