Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/26 13:17:52 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #87: [Rust] Add better and faster support for dictionary types

alamb opened a new issue #87:
URL: https://github.com/apache/arrow-datafusion/issues/87


   *Note*: migrated from original JIRA: https://issues.apache.org/jira/browse/ARROW-8464
   
   Use case: efficiently process large columns of low-cardinality strings
    
    * BatchIterator should accept both DictionaryBatch and RecordBatch
    * The type coercion optimizer rule should inject an expression that converts dictionary value types to index types (for equality expressions and IN(values, ...))
    * The physical expression would look up the index of each dictionary value referenced in the query, so that at runtime only indices are compared per batch
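
   The last two points can be illustrated with a minimal sketch in plain Rust. It does not use the Arrow or DataFusion APIs: `DictColumn`, `index_of`, `eq_literal` and `in_list` are hypothetical names made up for this example, and the sketch assumes that dictionary values are unique within a batch. The query literal is resolved to a dictionary index once per batch, after which only integer keys are compared.

```rust
use std::collections::HashSet;

/// Hypothetical dictionary-encoded string column (not an Arrow type):
/// `keys[i]` indexes into `values`, and `None` marks a null row.
pub struct DictColumn {
    pub values: Vec<String>,
    pub keys: Vec<Option<u32>>,
}

impl DictColumn {
    /// Look up the dictionary index of `literal` once per batch.
    /// Assumes dictionary values are unique (an assumption of this sketch).
    fn index_of(&self, literal: &str) -> Option<u32> {
        self.values.iter().position(|v| v == literal).map(|i| i as u32)
    }

    /// `column = literal`: compares integer keys only; null rows stay null.
    pub fn eq_literal(&self, literal: &str) -> Vec<Option<bool>> {
        let target = self.index_of(literal);
        self.keys
            .iter()
            .map(|k| k.map(|k| Some(k) == target))
            .collect()
    }

    /// `column IN (literals...)`: resolve the literals to a set of keys,
    /// then test key membership per row.
    pub fn in_list(&self, literals: &[&str]) -> Vec<Option<bool>> {
        let targets: HashSet<u32> = literals
            .iter()
            .filter_map(|l| self.index_of(l))
            .collect();
        self.keys
            .iter()
            .map(|k| k.map(|k| targets.contains(&k)))
            .collect()
    }
}
```

   A literal that is absent from the dictionary resolves to no index, so every non-null row compares as false without any string comparison at all.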


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #87: [Rust] Add better and faster support for dictionary types

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #87:
URL: https://github.com/apache/arrow-datafusion/issues/87#issuecomment-826828067


   Comment from Andrew Lamb(alamb) @ 2020-10-06T12:25:16.639+0000:
   FYI [~andygrove] -- I am doing part of this in ARROW-10159. However, the initial implementation effectively converts DictionaryArray --> PrimitiveArray / StringArray and then uses the existing processing.
   
   To support the actual efficient-processing use case, I think significant work will be needed to add appropriate dictionary support to the arrow compute kernels.
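
   For contrast, the conversion approach described in this comment might look roughly like the sketch below. It is plain Rust rather than the actual ARROW-10159 code; `DictColumn`, `unpack` and `eq_flat` are hypothetical names used only for illustration. Every row is materialized as its own string before an existing flat-array comparison runs, which is simple but gives up the benefit of the dictionary encoding.

```rust
/// Hypothetical dictionary-encoded string column (not an Arrow type):
/// `keys[i]` indexes into `values`, and `None` marks a null row.
pub struct DictColumn {
    pub values: Vec<String>,
    pub keys: Vec<Option<u32>>,
}

/// Materialize every row as its own string, the way a flat string
/// array would hold it. Costs one string clone per non-null row.
pub fn unpack(col: &DictColumn) -> Vec<Option<String>> {
    col.keys
        .iter()
        .map(|k| k.map(|k| col.values[k as usize].clone()))
        .collect()
}

/// Stand-in for an existing kernel that only understands flat string
/// data: one string comparison per non-null row.
pub fn eq_flat(rows: &[Option<String>], literal: &str) -> Vec<Option<bool>> {
    rows.iter()
        .map(|r| r.as_deref().map(|s| s == literal))
        .collect()
}
```

   The result matches what a key-based comparison would produce, but the work scales with the number of rows times the string length instead of one dictionary lookup plus integer comparisons.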

