Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2020/12/21 18:25:00 UTC

[jira] [Created] (ARROW-11002) [Rust] Improve speed of JSON nested list reader

Andrew Lamb created ARROW-11002:
-----------------------------------

             Summary: [Rust] Improve speed of JSON nested list reader
                 Key: ARROW-11002
                 URL: https://issues.apache.org/jira/browse/ARROW-11002
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust
            Reporter: Andrew Lamb


The code that reads nested lists in rust/arrow/src/json/reader.rs makes an extra copy (via `Vec::clone`), which caused a 20% slowdown in a benchmark compared to not cloning.

The goal of this ticket is to improve the performance of reading JSON in this case, likely by avoiding the clone.

More details can be found here: 

https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641

As [~nevi_me] says:
> I suspect the main perf loss is from having to peek into JSON values in order to make the nesting work.
> By this, I mean that if we have `{"a": [_, _, _]}`, we extract `a` values into a `Vec<Value>`, i.e. `[_, _, _]`.
> By extracting values, we are able to then use the reader to read `&[Value]` without caring about its key (`a`).
> The downside of this approach is that we have to clone values to get `Vec<Value>`, as I couldn't find an alternative.
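
To illustrate the trade-off described in the quote, here is a minimal sketch, not the actual code in reader.rs. The `Value` enum is a hypothetical, reduced stand-in for `serde_json::Value`, and `extract_cloned` and `extract_borrowed` are made-up names for illustration. The sketch assumes that the downstream reader could be changed to accept borrowed values (`&[&Value]`) instead of `&[Value]`; whether that is practical in the real reader is exactly what this ticket would need to determine.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for serde_json::Value, reduced to what the sketch needs.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
    Object(HashMap<String, Value>),
}

// Cloning approach: every list element under `key` is deep-copied into a new,
// owned Vec<Value>, so a reader taking &[Value] can consume it directly.
fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    let mut out = Vec::new();
    for row in rows {
        if let Value::Object(map) = row {
            if let Some(Value::Array(items)) = map.get(key) {
                out.extend(items.iter().cloned());
            }
        }
    }
    out
}

// Borrowing approach (an assumption, not the confirmed fix): collect references
// to the list elements instead of copies. Only pointers are stored, but the
// consumer must then accept &[&Value] rather than &[Value].
fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .filter_map(|row| match row {
            Value::Object(map) => map.get(key),
            _ => None,
        })
        .filter_map(|value| match value {
            Value::Array(items) => Some(items.iter()),
            _ => None,
        })
        .flatten()
        .collect()
}
```

Both functions yield the same elements in the same order; the borrowed version avoids copying the nested values, at the cost of tying the result's lifetime to the input rows.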



--
This message was sent by Atlassian Jira
(v8.3.4#803005)