Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:47:02 UTC

[jira] [Commented] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader

    [ https://issues.apache.org/jira/browse/ARROW-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332201#comment-17332201 ] 

Andrew Lamb commented on ARROW-11410:
-------------------------------------

Migrated to github: https://github.com/apache/arrow-rs/issues/171

> [Rust][Parquet] Implement returning dictionary arrays from parquet reader
> -------------------------------------------------------------------------
>
>                 Key: ARROW-11410
>                 URL: https://issues.apache.org/jira/browse/ARROW-11410
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Yordan Pavlov
>            Priority: Major
>
> Currently, the Rust parquet reader returns a regular array (e.g. a string array) even when the column is dictionary-encoded in the parquet file.
> If the parquet reader could return dictionary arrays for dictionary-encoded columns, this would bring several benefits:
>  * faster reading of dictionary-encoded columns from parquet (no conversion/expansion into a regular array would be necessary)
>  * more efficient memory use, as a dictionary array stores each distinct value only once
>  * faster filtering, as SIMD can be used to compare the numeric keys of a dictionary string array instead of comparing string values in a string array
> [~nevime], [~alamb] let me know what you think
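
To illustrate the trade-off described in the issue, here is a minimal std-only Rust sketch of a dictionary-encoded string column. It is not the arrow or parquet crate API: `DictColumn`, `encode`, `decode` and `filter_eq` are hypothetical names invented for this example. It shows that each distinct value is stored once, that `decode` is the expansion step a dictionary-returning reader could skip, and that an equality filter can compare integer keys instead of strings.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a dictionary-encoded string column
/// (an assumption for illustration, not a type from arrow or parquet).
#[derive(Debug)]
pub struct DictColumn {
    /// The dictionary: each distinct value stored once.
    pub values: Vec<String>,
    /// One key per row, indexing into `values`.
    pub keys: Vec<u32>,
}

impl DictColumn {
    /// Build a dictionary column from plain string rows.
    pub fn encode(rows: &[&str]) -> Self {
        let mut index: HashMap<&str, u32> = HashMap::new();
        let mut values: Vec<String> = Vec::new();
        let mut keys: Vec<u32> = Vec::with_capacity(rows.len());
        for &row in rows {
            // Key that a new value would receive if `row` is not yet known.
            let next = values.len() as u32;
            let key = *index.entry(row).or_insert_with(|| {
                values.push(row.to_string());
                next
            });
            keys.push(key);
        }
        DictColumn { values, keys }
    }

    /// Expand back into a regular string column. This is the
    /// conversion that returning dictionary arrays would avoid.
    pub fn decode(&self) -> Vec<String> {
        self.keys
            .iter()
            .map(|&k| self.values[k as usize].clone())
            .collect()
    }

    /// Equality filter: find the needle in the dictionary once,
    /// then compare integer keys row by row (no string comparisons
    /// per row, which is the part that vectorizes well).
    pub fn filter_eq(&self, needle: &str) -> Vec<bool> {
        match self.values.iter().position(|v| v.as_str() == needle) {
            Some(pos) => {
                let wanted = pos as u32;
                self.keys.iter().map(|&k| k == wanted).collect()
            }
            // Value absent from the dictionary: no row can match.
            None => vec![false; self.keys.len()],
        }
    }
}
```

For example, encoding the rows "a", "b", "a", "a" stores only two strings plus four small integer keys, and `filter_eq("a")` performs a single dictionary lookup followed by four integer comparisons.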


