Posted to issues@spark.apache.org by "eugen yushin (JIRA)" <ji...@apache.org> on 2019/08/16 13:39:00 UTC
[jira] [Commented] (SPARK-28742) StackOverflowError when using otherwise(col()) in a loop
[ https://issues.apache.org/jira/browse/SPARK-28742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16909052#comment-16909052 ]
eugen yushin commented on SPARK-28742:
--------------------------------------
It looks like the issue comes from a difference in logic between LocalRelation (used for DataFrames built from a local collection) and LogicalRDD (used for DataFrames created from an RDD). Compare the plans of the two:
```
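// df is the RDD-backed DataFrame from the issue description (LogicalRDD)
import sparkSession.implicits._  // needed for toDF
val df2 = Seq("1").toDF("c1")    // DataFrame built from a local Seq (LocalRelation)
df.explain(true)
df2.explain(true)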
```
> StackOverflowError when using otherwise(col()) in a loop
> --------------------------------------------------------
>
> Key: SPARK-28742
> URL: https://issues.apache.org/jira/browse/SPARK-28742
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.4.0, 2.4.3
> Reporter: Ivan Tsukanov
> Priority: Major
>
> The following code
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> (1 to 9).foldLeft(df) { case (acc, _) =>
>   val res = acc.withColumn("c1", column)
>   res.take(1)
>   res
> }
> {code}
> fails with
> {code:java}
> java.lang.StackOverflowError
> at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:395)
> ...{code}
> Probably the problem is that Spark generates an inexplicably large physical plan:
> {code:java}
> val rdd = sparkContext.makeRDD(Seq(Row("1")))
> val schema = StructType(Seq(
>   StructField("c1", StringType)
> ))
> val df = sparkSession.createDataFrame(rdd, schema)
> val column = when(col("c1").isin("1"), "1").otherwise(col("c1"))
> val result = (1 to 9).foldLeft(df) { case (acc, _) =>
>   acc.withColumn("c1", column)
> }
> result.explain()
> {code}
> which shows a plan 18936 characters long:
> {code:java}
> == Physical Plan ==
> *(1) Project [CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE WHEN (CASE .... 18936 symbols
> +- Scan ExistingRDD[c1#1] {code}
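
The nesting visible in the plan above can be reproduced with a small model in plain Scala. This is only a sketch under an assumption: that the nine withColumn projections get merged into a single Project, with each reference to c1 replaced by the previous expression. The Expr classes and the PlanGrowth object below are hypothetical names made up for illustration, not Spark's Catalyst API. Because the column expression refers to c1 twice (once inside isin, once in otherwise), every iteration roughly doubles the expression.

```scala
// Toy expression tree (hypothetical; NOT Spark's Catalyst classes).
sealed trait Expr
case class Ref(name: String) extends Expr
case class Lit(value: String) extends Expr
case class In(child: Expr, values: Seq[Expr]) extends Expr
case class CaseWhen(cond: Expr, thenValue: Expr, elseValue: Expr) extends Expr

object PlanGrowth {
  // Replace every reference to `name` inside `e` with `replacement`.
  def substitute(e: Expr, name: String, replacement: Expr): Expr = e match {
    case Ref(n) if n == name => replacement
    case r: Ref              => r
    case l: Lit              => l
    case In(c, vs) =>
      In(substitute(c, name, replacement), vs.map(substitute(_, name, replacement)))
    case CaseWhen(c, t, el) =>
      CaseWhen(
        substitute(c, name, replacement),
        substitute(t, name, replacement),
        substitute(el, name, replacement))
  }

  // Count the CASE WHEN nodes in an expression tree.
  def countCaseWhen(e: Expr): Long = e match {
    case CaseWhen(c, t, el) => 1L + countCaseWhen(c) + countCaseWhen(t) + countCaseWhen(el)
    case In(c, vs)          => countCaseWhen(c) + vs.map(countCaseWhen).sum
    case _                  => 0L
  }

  // Models when(col("c1").isin("1"), "1").otherwise(col("c1")).
  val column: Expr = CaseWhen(In(Ref("c1"), Seq(Lit("1"))), Lit("1"), Ref("c1"))

  // Expression for c1 after n withColumn("c1", column) calls, assuming the
  // projections are collapsed into one by substituting the previous result.
  def collapsed(n: Int): Expr =
    (1 to n).foldLeft[Expr](Ref("c1")) { (acc, _) => substitute(column, "c1", acc) }

  def main(args: Array[String]): Unit =
    (1 to 9).foreach { n =>
      println(s"iterations=$n caseWhenNodes=${countCaseWhen(collapsed(n))}")
    }
}
```

In this model the number of CASE WHEN nodes after n iterations is 2^n - 1, so nine iterations already give 511 nested expressions. Whether Spark's optimizer performs exactly this substitution for LogicalRDD is the assumption here; df.explain(true) on the real DataFrame is the way to check.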