Posted to github@arrow.apache.org by "viirya (via GitHub)" <gi...@apache.org> on 2023/06/28 15:48:58 UTC

[GitHub] [arrow-datafusion] viirya commented on pull request #6785: Coerce dictionaries for arithmetic

viirya commented on PR #6785:
URL: https://github.com/apache/arrow-datafusion/pull/6785#issuecomment-1611682160

   >> As an aside it is unclear why you would ever use a primitive dictionary, they will almost always be slower to process and especially once you factor in dictionary sparsity, may be significantly larger
   > I think @viirya added this feature so perhaps he has more context
   
   Hmm, I don't follow the question. The dictionary behavior here follows the original behavior, which in turn basically follows the kernel behavior. I previously cleaned up the decimal type part, but I believe the dictionary behavior is unchanged.
   
   I think dictionaries still have a storage advantage in memory and on disk (e.g. shuffle); otherwise there would be no reason to have them. It may be true that dictionaries are slower to process in compute kernels, but in terms of overall cost I'm not sure that getting rid of them is always a win, unless we have something that can automatically detect dictionary sparsity and convert to dictionary before writing to storage.
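
As a rough illustration of the storage trade-off discussed above, here is a back-of-envelope sketch. It is not Arrow's actual layout or API: the function names `plain_bytes` and `dict_bytes`, the 4-byte keys, and the 8-byte values are assumptions chosen for illustration, and validity bitmaps and buffer padding are ignored. It only models "one key per row plus one stored value per dictionary entry".

```python
# Hypothetical size model, not Arrow's real memory layout.
# Assumes 8-byte primitive values and 4-byte dictionary keys.

def plain_bytes(n_rows: int, value_width: int = 8) -> int:
    """Bytes needed to store every value directly."""
    return n_rows * value_width


def dict_bytes(n_rows: int, n_dict_values: int,
               key_width: int = 4, value_width: int = 8) -> int:
    """Bytes for one key per row plus the dictionary's value buffer.

    n_dict_values counts every entry held in the dictionary, including
    entries that no key references any more (a "sparse" dictionary).
    """
    return n_rows * key_width + n_dict_values * value_width


n_rows = 1_000_000
plain = plain_bytes(n_rows)              # every value stored inline
dense = dict_bytes(n_rows, 1_000)        # few distinct values, all referenced
sparse = dict_bytes(n_rows, 5_000_000)   # dictionary much larger than needed

print(f"plain:  {plain:>12,} bytes")
print(f"dense:  {dense:>12,} bytes")
print(f"sparse: {sparse:>12,} bytes")
```

Under these assumptions the dictionary form is smaller only while `n_dict_values * value_width < n_rows * (value_width - key_width)`, which is why detecting sparsity before writing to storage would matter.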

