Posted to jira@arrow.apache.org by "Andrew Lamb (Jira)" <ji...@apache.org> on 2021/04/26 12:44:03 UTC
[jira] [Closed] (ARROW-11002) [Rust] Improve speed of JSON nested list reader
[ https://issues.apache.org/jira/browse/ARROW-11002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Andrew Lamb closed ARROW-11002.
-------------------------------
Resolution: Invalid
> [Rust] Improve speed of JSON nested list reader
> -----------------------------------------------
>
> Key: ARROW-11002
> URL: https://issues.apache.org/jira/browse/ARROW-11002
> Project: Apache Arrow
> Issue Type: Improvement
> Components: Rust
> Reporter: Andrew Lamb
> Priority: Minor
>
> The code that reads nested lists in rust/arrow/src/json/reader.rs makes an extra copy (via `Vec::clone`) that caused a 20% slowdown in a benchmark compared to not cloning.
> The goal of this ticket is to improve the performance of reading JSON in this case, likely by avoiding the clone.
> More details can be found here:
> https://github.com/apache/arrow/pull/8938#pullrequestreview-556273641
> As [~nevi_me] says:
> {quote}
> I suspect the main perf loss is from having to peek into JSON values in order to make the nesting work.
> By this, I mean that if we have {"a": [_, _, _]}, we extract the values of a into a Vec<Value>, i.e. [_, _, _].
> By extracting values, we are able to then use the reader to read &[Value] without caring about its key (a).
> The downside of this approach is that we have to clone values to get Vec<Value>, as I couldn't find an alternative.
> {quote}
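To make the trade-off in the quoted comment concrete, here is a minimal sketch of the two ways to pull the nested values out of rows shaped like {"a": [_, _, _]}. It is not the code in reader.rs: the `Value` enum is a hypothetical stand-in for the JSON value type, and `extract_cloned` / `extract_borrowed` are illustrative names. Whether the real reader could consume a Vec<&Value> instead of a Vec<Value> is an assumption, not something the ticket confirms.

```rust
// Hypothetical, simplified JSON value type used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(f64),
    Array(Vec<Value>),
    Object(Vec<(String, Value)>),
}

impl Value {
    // Look up a key in an object; returns None for non-objects or missing keys.
    fn get(&self, key: &str) -> Option<&Value> {
        match self {
            Value::Object(fields) => fields
                .iter()
                .find(|(k, _)| k.as_str() == key)
                .map(|(_, v)| v),
            _ => None,
        }
    }
}

// Cloning approach: every nested value is deep-copied into a new Vec<Value>,
// which is the kind of extra copy the ticket describes.
fn extract_cloned(rows: &[Value], key: &str) -> Vec<Value> {
    let mut out = Vec::new();
    for row in rows {
        if let Some(Value::Array(items)) = row.get(key) {
            out.extend(items.iter().cloned());
        }
    }
    out
}

// Borrowing approach: collect references to the nested values instead.
// Only pointers are copied; the JSON values themselves stay where they are.
fn extract_borrowed<'a>(rows: &'a [Value], key: &str) -> Vec<&'a Value> {
    rows.iter()
        .filter_map(|row| match row.get(key) {
            Some(Value::Array(items)) => Some(items.iter()),
            _ => None,
        })
        .flatten()
        .collect()
}
```

Both functions skip rows where the key is missing or is not a list. The borrowed variant only helps if the code that reads the extracted values can work with references rather than an owned slice, which is the part the quoted comment says was hard to arrange.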
--
This message was sent by Atlassian Jira
(v8.3.4#803005)