Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2016/07/28 20:04:20 UTC

[jira] [Resolved] (SPARK-16764) Recommend disabling vectorized parquet reader on OutOfMemoryError

     [ https://issues.apache.org/jira/browse/SPARK-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-16764.
---------------------------------
       Resolution: Fixed
         Assignee: Sameer Agarwal
    Fix Version/s: 2.1.0
                   2.0.1

> Recommend disabling vectorized parquet reader on OutOfMemoryError
> -----------------------------------------------------------------
>
>                 Key: SPARK-16764
>                 URL: https://issues.apache.org/jira/browse/SPARK-16764
>             Project: Spark
>          Issue Type: Improvement
>            Reporter: Sameer Agarwal
>            Assignee: Sameer Agarwal
>             Fix For: 2.0.1, 2.1.0
>
>
> We currently don't bound or manage the size of the data arrays used by column vectors in the vectorized reader (they're bounded only by Integer.MAX_VALUE), which may lead to OOMs while reading data. In the short term, we can probably intercept this exception and suggest that the user disable the vectorized parquet reader.
> Longer term, we should probably do explicit memory management for this.
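For reference, the setting the issue refers to is the spark.sql.parquet.enableVectorizedReader configuration flag. A minimal Scala sketch of the workaround, assuming a Spark 2.x session (the app name and input path below are illustrative, not from the issue):

    import org.apache.spark.sql.SparkSession

    // Disable the vectorized Parquet reader at session creation, so Parquet
    // scans fall back to the row-based reader (slower, but avoids the large
    // unbounded column-vector allocations described above).
    val spark = SparkSession.builder()
      .appName("parquet-oom-workaround")  // illustrative name
      .config("spark.sql.parquet.enableVectorizedReader", "false")
      .getOrCreate()

    // The flag can also be toggled at runtime on an existing session:
    spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")

    // Subsequent Parquet reads use the non-vectorized code path.
    val df = spark.read.parquet("/path/to/data.parquet")  // hypothetical path
    df.show()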


