Posted to issues@spark.apache.org by "Ivan Tsukanov (JIRA)" <ji...@apache.org> on 2018/11/09 05:31:00 UTC
[jira] [Created] (SPARK-25987) StackOverflowError when executing many operations on a table with many columns
Ivan Tsukanov created SPARK-25987:
-------------------------------------
Summary: StackOverflowError when executing many operations on a table with many columns
Key: SPARK-25987
URL: https://issues.apache.org/jira/browse/SPARK-25987
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 2.3.2, 2.3.0, 2.2.2, 2.2.1
Environment: Ubuntu 18.04.1 LTS, openjdk "1.8.0_181"
Reporter: Ivan Tsukanov
When I execute
{code:java}
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.functions.udf
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val columnsCount = 100
val columns = (1 to columnsCount).map(i => s"col$i")
val initialData = (1 to columnsCount).map(i => s"val$i")

val df = sparkSession.createDataFrame(
  rowRDD = sparkSession.sparkContext.makeRDD(Seq(Row.fromSeq(initialData))),
  schema = StructType(columns.map(StructField(_, StringType, true)))
)

val addSuffixUDF = udf(
  (str: String) => str + "_added"
)

implicit class DFOps(df: DataFrame) {
  def addSuffix() = {
    df.select(columns.map(col =>
      addSuffixUDF(df(col)).as(col)
    ): _*)
  }
}

df
  .addSuffix()
  .addSuffix()
  .addSuffix()
  .show()
{code}
I get
{code:java}
An exception or error caused a run to abort.
java.lang.StackOverflowError
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:385)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:553)
...
{code}
If I reduce the number of columns (to 10, for example) or call `addSuffix` only once, it works fine.
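As a sketch of possible mitigations (these are assumptions on my part, not a confirmed fix for this ticket): since the overflow happens inside Janino during code generation, disabling whole-stage codegen may let the query run, and folding the three nested `select` calls into a single projection that composes the UDF keeps the plan shallow in the first place.

{code:java}
// Possible workarounds (assumptions, not a confirmed fix for this ticket).

// 1. Disable whole-stage code generation so Janino compiles smaller methods.
sparkSession.conf.set("spark.sql.codegen.wholeStage", false)

// 2. Compose the UDF three times inside a single projection instead of
//    chaining three selects, which keeps the logical plan shallow.
val suffixedThrice = df.select(columns.map(c =>
  addSuffixUDF(addSuffixUDF(addSuffixUDF(df(c)))).as(c)
): _*)
suffixedThrice.show()
{code}

Note that the composed-UDF version produces the same `_added_added_added` suffixes as three `addSuffix()` calls, but with only one projection node in the plan.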
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org