Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/18 11:05:24 UTC

[GitHub] [arrow] alamb commented on pull request #9233: ARROW-11289: [Rust][DataFusion] Implement GROUP BY support for Dictionary Encoded columns

alamb commented on pull request #9233:
URL: https://github.com/apache/arrow/pull/9233#issuecomment-762174671


   > If we are able to describe in the partitioning information that the partition is hashed by some column that is a dictionary, doesn't that allow us to perform very fast hashing (based on the dictionary indexes)?
   
   @jorgecarleitao yes, I think that would be a great optimization. We could possibly even skip hashing entirely and build the aggregate table directly on the dictionary indexes. I suspect this would work well in the common case, but we would have to handle the case where the dictionary is not the same across all record batches (and thus the same index in one record batch may not correspond to the same value in another).
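
A rough sketch of one way the mismatched-dictionary case could be handled, not what this PR implements: keep a per-batch table from local dictionary index to a global group id, so each distinct value is hashed at most once per batch and every other row is a plain index lookup. `Batch` and `sum_by_dictionary_key` are hypothetical names using plain `Vec`s rather than Arrow arrays, and nulls are ignored.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a record batch with one dictionary-encoded
/// group column and one numeric column to sum. Not an Arrow type.
struct Batch {
    dictionary: Vec<String>, // distinct values, local to this batch
    keys: Vec<usize>,        // per-row index into `dictionary`
    values: Vec<i64>,        // per-row value to aggregate
}

/// Computes SUM(values) GROUP BY the dictionary column across batches
/// whose dictionaries may differ.
fn sum_by_dictionary_key(batches: &[Batch]) -> HashMap<String, i64> {
    // Global state: value -> group id, plus per-group name and running sum.
    let mut group_ids: HashMap<String, usize> = HashMap::new();
    let mut group_names: Vec<String> = Vec::new();
    let mut sums: Vec<i64> = Vec::new();

    for batch in batches {
        // Local index -> global group id, filled in lazily so that
        // dictionary entries with no rows do not create empty groups.
        let mut local_to_global: Vec<Option<usize>> = vec![None; batch.dictionary.len()];

        for (&key, &value) in batch.keys.iter().zip(&batch.values) {
            let id = match local_to_global[key] {
                // Fast path: this local index was already resolved.
                Some(id) => id,
                // Slow path: hash the value once for this batch.
                None => {
                    let name = &batch.dictionary[key];
                    let id = match group_ids.get(name) {
                        Some(&id) => id,
                        None => {
                            let id = sums.len();
                            group_ids.insert(name.clone(), id);
                            group_names.push(name.clone());
                            sums.push(0);
                            id
                        }
                    };
                    local_to_global[key] = Some(id);
                    id
                }
            };
            sums[id] += value;
        }
    }

    group_names.into_iter().zip(sums).collect()
}
```

If all batches were known to share one dictionary, the remapping table could be built once and reused, which is the "common case" shortcut; the per-batch table is what keeps the result correct when they do not.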

