Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/28 17:42:02 UTC

[GitHub] [arrow-rs] jhorstmann commented on issue #504: Do not copy dictionary values when they are the same in `concat`

jhorstmann commented on issue #504:
URL: https://github.com/apache/arrow-rs/issues/504#issuecomment-869882802


   FYI, in our product we use our own concat kernel for dictionary arrays because our TableProvider ensures that dictionaries across chunks and batches are "consistent". Dictionaries across chunks are not necessarily equal, but our loader makes sure that every value from an earlier dictionary occurs with the same key in later batches. We will likely strengthen that guarantee soon so that it is actually the same dictionary; the proposed change would then allow us to use the standard kernel. I think that when we switched to our own kernel, concat did not support dictionaries at all.
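
To illustrate the "consistent" guarantee described above, here is a minimal sketch in plain Rust. It is not the arrow-rs API and not the kernel from the comment: the function names are hypothetical, dictionaries are modelled as slices of strings, keys as `u32`, and nulls are ignored. It assumes keys are dense, so "same value, same key" means the earlier dictionary is a prefix of the later one.

```rust
/// Returns true if every value of `prev` appears at the same index (key)
/// in `next`, i.e. `prev` is a prefix of `next`.
fn is_consistent_extension(prev: &[&str], next: &[&str]) -> bool {
    next.len() >= prev.len() && prev.iter().zip(next.iter()).all(|(a, b)| a == b)
}

/// Concatenates dictionary-encoded chunks given as (keys, dictionary values).
/// If each dictionary extends the previous one, the keys can be appended
/// as-is and the last dictionary serves all of them, so no values are
/// compared or copied per row. Returns None if the guarantee does not hold.
fn concat_consistent<'a>(
    chunks: &[(Vec<u32>, Vec<&'a str>)],
) -> Option<(Vec<u32>, Vec<&'a str>)> {
    // Verify the guarantee for each adjacent pair of dictionaries.
    for pair in chunks.windows(2) {
        if !is_consistent_extension(&pair[0].1, &pair[1].1) {
            return None;
        }
    }
    // Keys need no remapping: append them in chunk order.
    let keys = chunks.iter().flat_map(|(k, _)| k.iter().copied()).collect();
    // The last dictionary contains every earlier one as a prefix.
    let values = chunks.last().map(|(_, v)| v.clone()).unwrap_or_default();
    Some((keys, values))
}
```

When all dictionaries are identical, the prefix check trivially succeeds, which corresponds to the stronger guarantee mentioned in the comment.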
   
   I never got around to proposing this, but my initial idea for a solution was to express such a guarantee at the metadata level, as it would also allow more optimizations, for example in the group by operation.
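
As one possible reading of the group by remark, the sketch below counts rows per group directly on dictionary keys. It is an assumption-laden illustration, not an arrow-rs or DataFusion API: `count_by_key` is a hypothetical name, and it presumes that all batches share consistent dictionaries, so a key identifies the same value in every batch and `dict_len` is the length of the largest dictionary.

```rust
/// Counts rows per dictionary key across batches.
/// Assumes consistent dictionaries, so equal keys mean equal values and
/// no hashing or comparison of the underlying values is needed.
/// Panics if a key is not smaller than `dict_len`.
fn count_by_key(key_batches: &[Vec<u32>], dict_len: usize) -> Vec<usize> {
    // One counter per dictionary entry, indexed by key.
    let mut counts = vec![0usize; dict_len];
    for keys in key_batches {
        for &k in keys {
            counts[k as usize] += 1;
        }
    }
    counts
}
```

Without such a guarantee, the group by would have to look up the value behind each key (or remap keys per batch) before aggregating.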


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org