Posted to issues@spark.apache.org by "Wenchen Fan (Jira)" <ji...@apache.org> on 2020/01/07 09:19:00 UTC

[jira] [Resolved] (SPARK-30338) Avoid unnecessary InternalRow copies in ParquetRowConverter

     [ https://issues.apache.org/jira/browse/SPARK-30338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-30338.
---------------------------------
    Fix Version/s: 3.0.0
       Resolution: Fixed

Issue resolved by pull request 26993
[https://github.com/apache/spark/pull/26993]

> Avoid unnecessary InternalRow copies in ParquetRowConverter
> -----------------------------------------------------------
>
>                 Key: SPARK-30338
>                 URL: https://issues.apache.org/jira/browse/SPARK-30338
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>             Fix For: 3.0.0
>
>
> ParquetRowConverter calls {{InternalRow.copy()}} in cases where the copy is unnecessary; this can severely harm performance when reading deeply nested Parquet data.
> It looks like this copying was originally added to handle arrays and maps of structs (in which case we need to keep the copying), but we can omit it for the more common case of structs nested directly in structs.
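
To illustrate the distinction drawn in the description, here is a toy Python model of a converter that reuses one mutable row buffer. It is only a sketch: MutableRow, read_array_of_structs and read_struct_in_struct are hypothetical names invented for this example and are not Spark's classes or the code changed in the pull request. It shows why elements collected into an array need a copy (otherwise every element aliases the same buffer), while a struct nested directly in a struct is written once per parent row and can be read without copying.

```python
import copy


class MutableRow:
    """Toy stand-in for a reusable, mutable row buffer (hypothetical;
    this is not Spark's InternalRow)."""

    def __init__(self, num_fields):
        self.values = [None] * num_fields

    def copy(self):
        # Deep copy so the result no longer shares state with the buffer.
        return copy.deepcopy(self)


def read_array_of_structs(records, copy_elements):
    """Write every record into one shared buffer and collect the rows
    into a list, the way an array-of-structs reader might."""
    buf = MutableRow(1)
    collected = []
    for record in records:
        buf.values[0] = record
        # Without a copy, every list entry is the same object.
        collected.append(buf.copy() if copy_elements else buf)
    return [row.values[0] for row in collected]


def read_struct_in_struct(records):
    """Outer row holds a reference to a reused inner row. The inner row
    is written exactly once per outer row, and the consumer reads the
    outer row before the next record is converted, so no copy is made."""
    inner = MutableRow(1)
    outer = MutableRow(1)
    outer.values[0] = inner  # plain reference, no copy
    seen = []
    for record in records:
        inner.values[0] = record
        # Consumer reads the nested value immediately.
        seen.append(outer.values[0].values[0])
    return seen


if __name__ == "__main__":
    print(read_array_of_structs([1, 2, 3], copy_elements=False))
    print(read_array_of_structs([1, 2, 3], copy_elements=True))
    print(read_struct_in_struct([1, 2, 3]))
```

In this model, skipping the copy in the array case makes all collected elements reflect the last record written, whereas the directly nested struct stays correct without any copy because its buffer is not overwritten until the parent row has been consumed.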



--
This message was sent by Atlassian Jira
(v8.3.4#803005)