
Posted to issues@spark.apache.org by "Zamil Majdy (Jira)" <ji...@apache.org> on 2023/04/24 14:41:00 UTC

[jira] [Created] (SPARK-43264) Avoid allocation of unwritten ColumnVector in VectorizedReader

Zamil Majdy created SPARK-43264:
-----------------------------------

             Summary: Avoid allocation of unwritten ColumnVector in VectorizedReader
                 Key: SPARK-43264
                 URL: https://issues.apache.org/jira/browse/SPARK-43264
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core, SQL
    Affects Versions: 3.4.1, 3.5.0
            Reporter: Zamil Majdy


Spark's vectorized reader allocates a backing array for every field, sized to the value count, even when that array ends up empty. This causes high memory consumption when reading tables with large nested struct and array columns, or with many sparsely populated columns. One way to fix this is to allocate the column vector lazily, so that its array is allocated only when it is actually needed (that is, when a value is written to it).
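The lazy-allocation idea can be illustrated with a minimal sketch. The class `LazyIntColumnVector` and its methods are hypothetical and are not Spark's actual `WritableColumnVector` implementation. The sketch assumes a single int column of fixed capacity, with no resizing or off-heap storage. The backing arrays stay `null` until the first write, so a column that is never written costs no array allocation at all.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Spark's WritableColumnVector API): an int column
 * whose backing arrays are allocated only when the first value is written.
 */
public class LazyIntColumnVector {
    private final int capacity;
    private int[] values;      // stays null until the first putInt call
    private boolean[] nulls;   // stays null until the first putNull call

    public LazyIntColumnVector(int capacity) {
        this.capacity = capacity;
    }

    /** True once any backing storage has been allocated. */
    public boolean isAllocated() {
        return values != null || nulls != null;
    }

    public void putInt(int rowId, int value) {
        checkRow(rowId);
        if (values == null) {
            values = new int[capacity]; // allocate lazily, on first write
        }
        values[rowId] = value;
    }

    public void putNull(int rowId) {
        checkRow(rowId);
        if (nulls == null) {
            nulls = new boolean[capacity]; // allocate lazily, on first null
        }
        nulls[rowId] = true;
    }

    public boolean isNullAt(int rowId) {
        checkRow(rowId);
        return nulls != null && nulls[rowId];
    }

    /** Unwritten rows read as 0, matching a freshly zeroed Java array. */
    public int getInt(int rowId) {
        checkRow(rowId);
        return values == null ? 0 : values[rowId];
    }

    private void checkRow(int rowId) {
        if (rowId < 0 || rowId >= capacity) {
            throw new IndexOutOfBoundsException("rowId " + rowId);
        }
    }

    public static void main(String[] args) {
        // 100 sparse columns, but only one of them ever receives a value.
        LazyIntColumnVector[] columns = new LazyIntColumnVector[100];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = new LazyIntColumnVector(4096);
        }
        columns[7].putInt(0, 42);
        long allocated = Arrays.stream(columns)
                .filter(LazyIntColumnVector::isAllocated)
                .count();
        System.out.println(allocated + " of " + columns.length + " columns allocated");
    }
}
```

Running `main` prints `1 of 100 columns allocated`. An eager design would have allocated all 100 arrays of 4096 ints up front. The trade-off is a `null` check on every write and the requirement that reads of never-written rows return a well-defined default.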



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
