Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:44:03 UTC

[jira] [Closed] (ARROW-11002) [Rust] Improve speed of JSON nested list reader

     [ https://issues.apache.org/jira/browse/ARROW-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Lamb closed ARROW-11002.
-------------------------------
    Resolution: Invalid

> [Rust] Improve speed of JSON nested list reader
> -----------------------------------------------
>
>                 Key: ARROW-11002
>                 URL: https://issues.apache.org/jira/browse/ARROW-11002
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Rust
>            Reporter: Andrew Lamb
>            Priority: Minor
>
> The code that reads nested lists in rust/arrow/src/json/reader.rs makes an extra copy (via `Vec::clone`), which caused a 20% slowdown in a benchmark compared to not cloning.
> The goal of this ticket is to improve the performance of reading JSON in this case, likely by avoiding the clone.
> More details can be found here: 
> https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641
> As [~nevi_me] says:
> {quote}
>  I suspect the main perf loss is from having to peek into JSON values in order to make the nesting work.
> By this, I mean that if we have {"a": [_, _, _]}, we extract the values of a into a Vec<Value>, i.e. [_, _, _].
> By extracting values, we are able to then use the reader to read &[Value] without caring about its key (a).
> The downside of this approach is that we have to clone values to get Vec<Value>, as I couldn't find an alternative.
> {quote}
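
The trade-off described in the quote can be sketched roughly as follows. This is only an illustration, not the code in rust/arrow/src/json/reader.rs: `Value` below is a hypothetical, std-only stand-in for the JSON value type (presumably serde_json's `Value`), and `extract_cloned` and `extract_borrowed` are made-up names. The first function copies every nested element to build an owned Vec<Value>; the second collects references instead, which avoids the deep copy but yields Vec<&Value>, so downstream code expecting &[Value] would have to change.

```rust
// Hypothetical minimal JSON value type (stand-in for serde_json::Value),
// defined here so the sketch compiles without external crates.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Null,
    Number(f64),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

impl Value {
    // Look up a key if this value is an object; returns None otherwise.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(fields) => fields
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
            _ => None,
        }
    }
}

// Cloning approach: for each row {"a": [_, _, _]}, deep-copy the inner list
// so the result is an owned Vec<Value> that can be passed on as &[Value].
fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    rows.iter()
        .flat_map(|row| match row.get(key) {
            Some(Value::Array(items)) => items.clone(), // the extra copy
            _ => Vec::new(),
        })
        .collect()
}

// Borrowing approach: collect references into the original rows instead.
// No element is copied, but the result is Vec<&Value>, not Vec<Value>.
fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .flat_map(|row| match row.get(key) {
            Some(Value::Array(items)) => items.iter().collect::<Vec<_>>(),
            _ => Vec::new(),
        })
        .collect()
}

fn main() {
    // Two rows: {"a": [1, 2]} and {"a": [null]}
    let rows = vec![
        Value::Object(vec![(
            "a".to_string(),
            Value::Array(vec![Value::Number(1.0), Value::Number(2.0)]),
        )]),
        Value::Object(vec![("a".to_string(), Value::Array(vec![Value::Null]))]),
    ];

    let owned = extract_cloned(&rows, "a");
    let borrowed = extract_borrowed(&rows, "a");

    // Both approaches see the same three flattened elements.
    assert_eq!(owned.len(), 3);
    assert_eq!(borrowed.len(), 3);
    assert!(owned.iter().zip(borrowed.iter()).all(|(o, b)| o == *b));
}
```

Whether the borrowed form is usable in the actual reader depends on how the nested values are consumed afterwards, which this sketch does not attempt to model.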



--
This message was sent by Atlassian Jira
(v8.3.4#803005)