Posted to issues@spark.apache.org by "Zamil Majdy (Jira)" <ji...@apache.org> on 2023/04/24 14:41:00 UTC
[jira] [Created] (SPARK-43264) Avoid allocation of unwritten ColumnVector in VectorizedReader
Zamil Majdy created SPARK-43264:
-----------------------------------
Summary: Avoid allocation of unwritten ColumnVector in VectorizedReader
Key: SPARK-43264
URL: https://issues.apache.org/jira/browse/SPARK-43264
Project: Spark
Issue Type: Improvement
Components: Spark Core, SQL
Affects Versions: 3.4.1, 3.5.0
Reporter: Zamil Majdy
The Spark vectorized reader allocates an array for every field, for each value count, even if the array ends up empty. This causes high memory consumption when reading a table with large struct+array columns or with many columns that hold sparse values. One way to fix this is to allocate the column vector lazily, so that the array is allocated only when it is needed (that is, when it is first written).
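
The lazy-allocation idea could be sketched roughly as follows. This is only an illustration under assumptions, not the actual patch: the class LazyIntVector and its methods are hypothetical names, not part of Spark's column vector API. The vector records its capacity up front but creates the backing array only on the first write, and reads from a never-written vector return a default value.

```java
// Hypothetical sketch of lazy allocation; not Spark's real ColumnVector classes.
final class LazyIntVector {
    private final int capacity; // number of value slots this vector may hold
    private int[] data;         // stays null until the first write

    LazyIntVector(int capacity) {
        if (capacity < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacity = capacity;
    }

    // True once a write has forced the backing array to be allocated.
    boolean isAllocated() {
        return data != null;
    }

    // Allocates the backing array on first use, then stores the value.
    void putInt(int rowId, int value) {
        checkIndex(rowId);
        if (data == null) {
            data = new int[capacity];
        }
        data[rowId] = value;
    }

    // Reading a vector that was never written returns 0 without allocating.
    int getInt(int rowId) {
        checkIndex(rowId);
        return data == null ? 0 : data[rowId];
    }

    private void checkIndex(int rowId) {
        if (rowId < 0 || rowId >= capacity) {
            throw new IndexOutOfBoundsException("rowId " + rowId + ", capacity " + capacity);
        }
    }

    public static void main(String[] args) {
        // Simulate a wide, sparse table: 1000 columns, only one is ever written.
        LazyIntVector[] columns = new LazyIntVector[1000];
        for (int i = 0; i < columns.length; i++) {
            columns[i] = new LazyIntVector(4096);
        }
        columns[7].putInt(0, 42);

        int allocated = 0;
        for (LazyIntVector column : columns) {
            if (column.isAllocated()) {
                allocated++;
            }
        }
        System.out.println("allocated " + allocated + " of " + columns.length);
    }
}
```

In this sketch, only the one column that receives a write pays for its 4096-slot array; the other 999 hold just a null reference. A real implementation would also need to handle null tracking, nested child vectors, and vector reuse across batches, none of which is shown here.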
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org