You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/28 18:12:25 UTC

[GitHub] [arrow-datafusion] Dandandan commented on issue #790: Rework GroupByHash to support grouping by nulls

Dandandan commented on issue #790:
URL: https://github.com/apache/arrow-datafusion/issues/790#issuecomment-888516528


   Yes, this (great) explanation / overview matches my suggestion. Somehow the ASCII art doesn't render well here?
   
   A couple of points:
   * Great idea to use a `Vec<ScalarValue>` first. It seems it should be quite easy to swap that to a typed alternative later, and probably involves less changes now
   * To convert the group by values it can re-use the `create_hashes` function from the hash join. This way it will also benefit from improvements there and is calculated on (multiple) Arrays directly.
   
   * Before implementing this proposal it is I think be possible with minimal changes to also to include a null part in the hashmap key, so the key would become something like `(bool, Vec<u8>)` as hashmap key. A null key would have `(False, vec![])` as value, an actual value would contain a True and the bytes of the contents.
   That will be a bit slower and consume more memory, but at least allows to group on nulls too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org