Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:47:02 UTC
[jira] [Commented] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader
[ https://issues.apache.org/jira/browse/ARROW-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332201#comment-17332201 ]
Andrew Lamb commented on ARROW-11410:
-------------------------------------
Migrated to github: https://github.com/apache/arrow-rs/issues/171
> [Rust][Parquet] Implement returning dictionary arrays from parquet reader
> -------------------------------------------------------------------------
>
> Key: ARROW-11410
> URL: https://issues.apache.org/jira/browse/ARROW-11410
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Yordan Pavlov
> Priority: Major
>
> Currently the Rust parquet reader returns a regular array (e.g. a string array) even when the column is dictionary-encoded in the parquet file.
> If the parquet reader could return dictionary arrays for dictionary-encoded columns, this would bring several benefits:
> * faster reading of dictionary-encoded columns from parquet, since no conversion/expansion into a regular array would be necessary
> * lower memory use, since a dictionary array stores each distinct value once and refers to it from each row by an integer key
> * faster filtering, since SIMD can be used to compare the numeric keys of a dictionary string array instead of comparing string values in a string array
> [~nevime], [~alamb], let me know what you think.
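To make the memory and filtering points concrete, here is a minimal std-only Rust sketch of a dictionary-encoded string column. It is an illustration under stated assumptions, not the arrow-rs `DictionaryArray` or the parquet reader API: the `DictColumn` type and its `encode`, `filter_eq` and `expand` methods are hypothetical names. Each distinct string is stored once, rows hold `u32` keys, and an equality filter does one dictionary lookup followed by a plain integer comparison per row (a loop the compiler may auto-vectorize). `expand` shows the conversion back to a regular string column, which is the step the reader currently always performs.

```rust
use std::collections::HashMap;

/// Hypothetical dictionary-encoded string column (illustration only,
/// not the arrow-rs DictionaryArray): distinct values are stored once
/// in `values`, and each row holds an integer key into `values`.
struct DictColumn {
    keys: Vec<u32>,
    values: Vec<String>,
}

impl DictColumn {
    /// Dictionary-encode a slice of strings, assigning keys in first-seen order.
    fn encode(rows: &[&str]) -> Self {
        let mut index: HashMap<&str, u32> = HashMap::new();
        let mut values = Vec::new();
        let mut keys = Vec::with_capacity(rows.len());
        for &row in rows {
            // Reuse the existing key, or append a new distinct value.
            let key = *index.entry(row).or_insert_with(|| {
                values.push(row.to_string());
                (values.len() - 1) as u32
            });
            keys.push(key);
        }
        DictColumn { keys, values }
    }

    /// Rows equal to `needle`: a single string lookup in the dictionary,
    /// then only integer key comparisons over the rows.
    fn filter_eq(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v == needle) {
            Some(k) => {
                let k = k as u32;
                self.keys.iter().map(|&key| key == k).collect()
            }
            // Value not in the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }

    /// Expand back into a regular string column (one string per row).
    fn expand(&self) -> Vec<&str> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].as_str())
            .collect()
    }
}

fn main() {
    let col = DictColumn::encode(&["red", "green", "red", "blue", "red"]);
    println!("keys = {:?}, values = {:?}", col.keys, col.values);
    println!("red rows = {:?}", col.filter_eq("red"));
    println!("expanded = {:?}", col.expand());
}
```

The filter never touches string bytes per row, which is why keeping the dictionary form from parquet through to the compute kernels is attractive.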