You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/04/02 05:14:00 UTC
[jira] [Resolved] (SPARK-26224) Results in stackOverFlowError when
trying to add 3000 new columns using withColumn function of dataframe.
[ https://issues.apache.org/jira/browse/SPARK-26224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-26224.
----------------------------------
Resolution: Fixed
Fix Version/s: 3.0.0
Issue resolved by pull request 23285
[https://github.com/apache/spark/pull/23285]
> Results in stackOverFlowError when trying to add 3000 new columns using withColumn function of dataframe.
> ---------------------------------------------------------------------------------------------------------
>
> Key: SPARK-26224
> URL: https://issues.apache.org/jira/browse/SPARK-26224
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0
> Environment: On macbook, used Intellij editor. Ran the above sample code as unit test.
> Reporter: Dorjee Tsering
> Assignee: Marco Gaido
> Priority: Minor
> Fix For: 3.0.0
>
>
> Reproduction step:
> Run this sample code on your laptop. I am trying to add 3000 new columns to a base dataframe with 1 column.
>
>
> {code:java}
> import spark.implicits._
> val newColumnsToBeAdded : Seq[StructField] = for (i <- 1 to 3000) yield new StructField("field_" + i, DataTypes.LongType)
> val baseDataFrame: DataFrame = Seq((1)).toDF("employee_id")
> val result = newColumnsToBeAdded.foldLeft(baseDataFrame)((df, newColumn) => df.withColumn(newColumn.name, lit(0)))
> result.show(false)
>
> {code}
> Ends up with following stacktrace:
> java.lang.StackOverflowError
> at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:57)
> at scala.collection.generic.GenTraversableFactory$GenericCanBuildFrom.apply(GenTraversableFactory.scala:52)
> at scala.collection.TraversableLike$class.builder$1(TraversableLike.scala:229)
> at scala.collection.TraversableLike$class.map(TraversableLike.scala:233)
> at scala.collection.immutable.List.map(List.scala:296)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:333)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:272)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
> at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:272)
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org