Posted to issues@spark.apache.org by "Iaroslav Zeigerman (JIRA)" <ji...@apache.org> on 2016/08/18 16:57:20 UTC

[jira] [Commented] (SPARK-17131) Code generation fails when running SQL expressions against a wide dataset (thousands of columns)

    [ https://issues.apache.org/jira/browse/SPARK-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15426801#comment-15426801 ] 

Iaroslav Zeigerman commented on SPARK-17131:
--------------------------------------------

I get a different exception when applying the mean function to all columns:
{code}
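import org.apache.spark.sql.functions.mean

// Build one mean aggregate per column of the wide DataFrame
val allCols = df.columns.map(c => mean(c))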
val newDf = df.select(allCols: _*)
newDf.show()
{code}

{noformat}
java.io.EOFException
	at java.io.DataInputStream.readFully(DataInputStream.java:197)
	at java.io.DataInputStream.readFully(DataInputStream.java:169)
	at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
	at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
	at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
	at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:185)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
	at scala.collection.Iterator$class.foreach(Iterator.scala:742)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
	at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
	at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
...
{noformat}
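
A possible workaround, which I have not verified against this dataset, is to aggregate the columns in smaller batches so that each generated class covers only a subset of them. The sketch below assumes that smaller per-query column counts keep the generated code under the limit; the helper names (columnBatches, meansInBatches) and the batch size of 200 are hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.mean

// Hypothetical helper (not part of Spark): split column names into
// fixed-size batches; the last batch may be smaller.
def columnBatches(columns: Seq[String], batchSize: Int): List[Seq[String]] =
  columns.grouped(batchSize).toList

// Hypothetical helper: run one small aggregation per batch instead of a
// single aggregation over every column, then merge the one-row results.
// The batch size of 200 is an arbitrary assumption, not a tuned value.
def meansInBatches(df: DataFrame, batchSize: Int = 200): Map[String, Any] =
  columnBatches(df.columns.toSeq, batchSize).flatMap { batch =>
    // Each select produces a single row holding one mean per column in the batch
    val row = df.select(batch.map(c => mean(c)): _*).head()
    batch.zip(row.toSeq)
  }.toMap
```

Calling meansInBatches(df) would launch one job per batch, so this trades a single large query for several smaller ones. Whether it actually avoids the exception depends on how much code each batch generates.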

> Code generation fails when running SQL expressions against a wide dataset (thousands of columns)
> ------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17131
>                 URL: https://issues.apache.org/jira/browse/SPARK-17131
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Iaroslav Zeigerman
>
> When reading a CSV file that contains 1776 columns, Spark and Janino fail to generate code, with the message:
> {noformat}
> Constant pool has grown past JVM limit of 0xFFFF
> {noformat}
> A plain select over all columns works fine:
> {code}
>       val allCols = df.columns.map(c => col(c).as(c + "_alias"))
>       val newDf = df.select(allCols: _*)
>       newDf.show()
> {code}
> But when I invoke the describe method:
> {code}
> newDf.describe(newDf.columns: _*)
> {code}
> it fails with the following stack trace:
> {noformat}
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
> 	at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
> 	at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> 	at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> 	... 30 more
> Caused by: org.codehaus.janino.JaninoRuntimeException: Constant pool has grown past JVM limit of 0xFFFF
> 	at org.codehaus.janino.util.ClassFile.addToConstantPool(ClassFile.java:402)
> 	at org.codehaus.janino.util.ClassFile.addConstantIntegerInfo(ClassFile.java:300)
> 	at org.codehaus.janino.UnitCompiler.addConstantIntegerInfo(UnitCompiler.java:10307)
> 	at org.codehaus.janino.UnitCompiler.pushConstant(UnitCompiler.java:8868)
> 	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:4346)
> 	at org.codehaus.janino.UnitCompiler.access$7100(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$10.visitIntegerLiteral(UnitCompiler.java:3265)
> 	at org.codehaus.janino.Java$IntegerLiteral.accept(Java.java:4321)
> 	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> 	at org.codehaus.janino.UnitCompiler.fakeCompile(UnitCompiler.java:2605)
> 	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4362)
> 	at org.codehaus.janino.UnitCompiler.compileGet2(UnitCompiler.java:3975)
> 	at org.codehaus.janino.UnitCompiler.access$6900(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$10.visitMethodInvocation(UnitCompiler.java:3263)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> 	at org.codehaus.janino.UnitCompiler.compileGet(UnitCompiler.java:3290)
> 	at org.codehaus.janino.UnitCompiler.compileGetValue(UnitCompiler.java:4368)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:2662)
> 	at org.codehaus.janino.UnitCompiler.access$4400(UnitCompiler.java:185)
> 	at org.codehaus.janino.UnitCompiler$7.visitMethodInvocation(UnitCompiler.java:2627)
> 	at org.codehaus.janino.Java$MethodInvocation.accept(Java.java:3974)
> 	at org.codehaus.janino.UnitCompiler.compile(UnitCompiler.java:2654)
> 	at org.codehaus.janino.UnitCompiler.compile2(UnitCompiler.java:1643)
> ....
> {noformat}


