Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/02/02 11:51:00 UTC

[jira] [Commented] (ARROW-11410) [Rust][Parquet] Implement returning dictionary arrays from parquet reader

    [ https://issues.apache.org/jira/browse/ARROW-11410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277063#comment-17277063 ] 

Andrew Lamb commented on ARROW-11410:
-------------------------------------

[~yordan-pavlov] I think this would be amazing, and we would definitely use it in IOx. This kind of thing is on our longer-term roadmap, and I would love to help (e.g. with code review, testing, or documentation).

Let me know! 

> [Rust][Parquet] Implement returning dictionary arrays from parquet reader
> -------------------------------------------------------------------------
>
>                 Key: ARROW-11410
>                 URL: https://issues.apache.org/jira/browse/ARROW-11410
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Yordan Pavlov
>            Priority: Major
>
> Currently the Rust parquet reader returns a regular array (e.g. a string array) even when the column is dictionary encoded in the parquet file.
> If the parquet reader could return dictionary arrays for dictionary-encoded columns, this would bring several benefits:
>  * faster reading of dictionary-encoded columns from parquet (as no conversion/expansion into a regular array would be necessary)
>  * lower memory use, as a dictionary array stores each distinct value only once and keeps a small integer key per row
>  * faster filtering, as SIMD can be used to filter over the numeric keys of a dictionary string array instead of comparing string values in a string array
> [~nevime], [~alamb] let me know what you think
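
To make the benefits listed in the issue concrete, here is a minimal std-only sketch of a dictionary-encoded string column. It is not the arrow crate's `DictionaryArray` and not the parquet reader's API; `DictColumn`, `encode`, `decode`, and `filter_eq` are hypothetical names used only for illustration. It shows the two properties the description relies on: each distinct string is stored once, and an equality filter resolves the string to a key once and then compares integers per row.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for a dictionary-encoded string column.
/// This is an illustration only, not the arrow crate's `DictionaryArray`.
#[derive(Debug)]
pub struct DictColumn {
    /// Distinct values, each stored exactly once.
    pub values: Vec<String>,
    /// One integer per row, indexing into `values`.
    pub keys: Vec<u32>,
}

/// Build a dictionary column from plain string rows.
pub fn encode(rows: &[&str]) -> DictColumn {
    let mut index: HashMap<&str, u32> = HashMap::new();
    let mut values: Vec<String> = Vec::new();
    let mut keys: Vec<u32> = Vec::with_capacity(rows.len());
    for &row in rows {
        // Assign the next free key the first time a value is seen.
        let key = *index.entry(row).or_insert_with(|| {
            values.push(row.to_string());
            (values.len() - 1) as u32
        });
        keys.push(key);
    }
    DictColumn { values, keys }
}

/// Expand back into a regular string array. This is the conversion a reader
/// performs when it returns a plain array for a dictionary-encoded column.
pub fn decode(col: &DictColumn) -> Vec<String> {
    col.keys
        .iter()
        .map(|&k| col.values[k as usize].clone())
        .collect()
}

/// Equality filter: look the needle up in the dictionary once, then compare
/// integer keys per row instead of comparing strings per row.
pub fn filter_eq(col: &DictColumn, needle: &str) -> Vec<bool> {
    match col.values.iter().position(|v| v.as_str() == needle) {
        Some(k) => {
            let k = k as u32;
            col.keys.iter().map(|&key| key == k).collect()
        }
        // Value absent from the dictionary: no row can match.
        None => vec![false; col.keys.len()],
    }
}
```

The per-row work in `filter_eq` is a fixed-width integer comparison, which is the kind of loop the issue has in mind for SIMD; whether it is actually vectorized depends on the compiler and target.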



--
This message was sent by Atlassian Jira
(v8.3.4#803005)