Posted to issues@spark.apache.org by "Marcelo Vanzin (JIRA)" <ji...@apache.org> on 2019/03/13 17:00:00 UTC

[jira] [Assigned] (SPARK-26103) OutOfMemory error with large query plans

     [ https://issues.apache.org/jira/browse/SPARK-26103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin reassigned SPARK-26103:
--------------------------------------

    Assignee: Dave DeCaprio

> OutOfMemory error with large query plans
> ----------------------------------------
>
>                 Key: SPARK-26103
>                 URL: https://issues.apache.org/jira/browse/SPARK-26103
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2
>         Environment: Amazon EMR 5.19
> 1 c5.4xlarge master instance
> 1 c5.4xlarge core instance
> 2 c5.4xlarge task instances
>            Reporter: Dave DeCaprio
>            Assignee: Dave DeCaprio
>            Priority: Major
>
> Large query plans can cause OutOfMemory errors in the Spark driver.
> We are creating data frames that are not extremely large but contain lots of nested joins.  These plans execute efficiently because of caching and partitioning, but the text rendering of the generated query plans can reach hundreds of megabytes.  Running many of these queries in parallel causes our driver process to fail with the stack trace below.  (A minimal sketch of the join pattern appears after the issue text.)
> {code}
> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>     at java.util.Arrays.copyOfRange(Arrays.java:2694)
>     at java.lang.String.<init>(String.java:203)
>     at java.lang.StringBuilder.toString(StringBuilder.java:405)
>     at scala.StringContext.standardInterpolator(StringContext.scala:125)
>     at scala.StringContext.s(StringContext.scala:90)
>     at org.apache.spark.sql.execution.QueryExecution.toString(QueryExecution.scala:70)
>     at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:52)
>     at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelation.run(InsertIntoHadoopFsRelation.scala:108)
> {code}
>  
> A similar error is reported in [https://stackoverflow.com/questions/38307258/out-of-memory-error-when-writing-out-spark-dataframes-to-parquet-format]
>  
> Code already exists to truncate the rendered string when the number of output columns exceeds 25, but there is no equivalent limit when the rest of the query plan is huge.  (A sketch of a length-bounded builder appears after the issue text.)
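
For context only, a minimal sketch of the join pattern described above, not taken from the ticket: the object name, loop count, and data sizes are illustrative assumptions.  Each self-join roughly doubles the logical plan, so the string produced by QueryExecution.toString (the frame visible in the stack trace) grows exponentially even though the data itself stays small.

{code}
// Illustrative sketch only (assumed names, not from the ticket): each
// iteration self-joins the DataFrame, roughly doubling the logical plan,
// so the rendered plan text grows exponentially while the data stays small.
import org.apache.spark.sql.SparkSession

object LargePlanRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("LargePlanRepro")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    var df = (1 to 1000).toDF("id")

    // Increase the loop count to grow the plan further; the data stays tiny.
    for (_ <- 1 to 10) {
      df = df.join(df.withColumnRenamed("id", "id2"), $"id" === $"id2")
             .select($"id")
    }

    // QueryExecution.toString is the same call that appears in the stack
    // trace above; the size of this string is what pressures driver memory.
    println(s"Plan string length: ${df.queryExecution.toString.length}")

    spark.stop()
  }
}
{code}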
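Also for illustration, and not Spark's actual implementation: one way to generalize the existing 25-field truncation is a builder with a character budget over the whole rendered plan.  The class and parameter names below are assumptions made for the sketch; a caller would route plan fragments through append instead of building one unbounded string.

{code}
// Hypothetical sketch of a length-bounded builder (assumed names, not the
// actual Spark code): stop appending once a configurable character budget
// is exceeded, analogous to the existing 25-column truncation but applied
// to the whole rendered plan.
class BoundedStringBuilder(maxLength: Int) {
  private val sb = new StringBuilder
  private var truncated = false

  def append(s: String): Unit = {
    if (!truncated) {
      val remaining = maxLength - sb.length
      if (s.length <= remaining) {
        sb.append(s)
      } else {
        sb.append(s.substring(0, remaining))
        truncated = true
      }
    }
  }

  override def toString: String =
    if (truncated) sb.toString + "... [truncated]" else sb.toString
}
{code}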



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org