Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/02/02 11:51:00 UTC
[jira] [Commented] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader
[ https://issues.apache.org/jira/browse/ARROW-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277063#comment-17277063 ]
Andrew Lamb commented on ARROW-11410:
-------------------------------------
[~yordan-pavlov] I think this would be amazing -- and we would definitely use it in IOx. This is the kind of thing that is on our longer-term roadmap, and I would love to help (e.g. code review, testing, or documentation).
Let me know!
> [Rust][Parquet] Implement returning dictionary arrays from parquet reader
> -------------------------------------------------------------------------
>
> Key: ARROW-11410
> URL: https://issues.apache.org/jira/browse/ARROW-11410
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Yordan Pavlov
> Priority: Major
>
> Currently the Rust parquet reader returns a regular array (e.g. string array) even when the column is dictionary encoded in the parquet file.
> If the parquet reader could return dictionary arrays for dictionary encoded columns, this would bring several benefits:
> * faster reading of dictionary encoded columns from parquet (as no conversion/expansion into a regular array would be necessary)
> * lower memory use, as the dictionary array takes less space in memory than the expanded regular array
> * faster filtering, as SIMD can be used to filter over the numeric keys of a dictionary string array instead of comparing string values in a string array
> [~nevime], [~alamb] let me know what you think
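
To make the filtering point in the quoted description concrete, here is a minimal sketch in plain Rust. It does not use the arrow or parquet crates; `DictColumn`, `encode`, and `eq_mask` are hypothetical names invented for illustration, and the per-row loop is an ordinary integer comparison rather than explicit SIMD (integer comparisons of this kind are what a vectorized kernel would operate on).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column.
/// This is NOT the arrow crate's DictionaryArray; it only mirrors the idea.
struct DictColumn {
    keys: Vec<u32>,      // one small integer per row
    values: Vec<String>, // each distinct string stored once
}

impl DictColumn {
    /// Build a dictionary column from plain strings.
    fn encode(rows: &[&str]) -> Self {
        let mut values: Vec<String> = Vec::new();
        let mut lookup: HashMap<String, u32> = HashMap::new();
        let mut keys = Vec::with_capacity(rows.len());
        for row in rows {
            let key = match lookup.get(*row) {
                Some(k) => *k,
                None => {
                    // First time this string is seen: assign the next key.
                    let k = values.len() as u32;
                    values.push((*row).to_string());
                    lookup.insert((*row).to_string(), k);
                    k
                }
            };
            keys.push(key);
        }
        DictColumn { keys, values }
    }

    /// Equality filter: one string search in the dictionary,
    /// then only integer comparisons over the rows.
    fn eq_mask(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == needle) {
            Some(pos) => {
                let k = pos as u32;
                self.keys.iter().map(|key| *key == k).collect()
            }
            // Value absent from the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }
}

fn main() {
    let col = DictColumn::encode(&["a", "b", "a", "c", "a"]);
    // Five rows, but only three strings are stored.
    println!("{} rows, {} distinct values", col.keys.len(), col.values.len());
    println!("{:?}", col.eq_mask("a")); // prints [true, false, true, false, true]
}
```

The string comparison happens once per distinct value instead of once per row, which is where the filtering benefit described above would come from under these assumptions.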
--
This message was sent by Atlassian Jira
(v8.3.4#803005)