Posted to issues@spark.apache.org by "Xiao Li (JIRA)" <ji...@apache.org> on 2018/02/07 01:43:00 UTC
[jira] [Commented] (SPARK-17217) Codegeneration fails for describe() on many columns
[ https://issues.apache.org/jira/browse/SPARK-17217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16354848#comment-16354848 ]
Xiao Li commented on SPARK-17217:
---------------------------------
This should be resolved by https://issues.apache.org/jira/browse/SPARK-22510. If it is not, please reopen this issue.
> Codegeneration fails for describe() on many columns
> ---------------------------------------------------
>
> Key: SPARK-17217
> URL: https://issues.apache.org/jira/browse/SPARK-17217
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.0.0
> Reporter: Kalle Jepsen
> Priority: Major
>
> Consider the following minimal Python script:
> {code:python}
> import pyspark
> from pyspark.sql import functions as F
> conf = pyspark.SparkConf()
> sc = pyspark.SparkContext(conf=conf)
> spark = pyspark.sql.SQLContext(sc)
> ncols = 510
> nrows = 10
> df = spark.range(0, nrows)
> s = df.select(
>     [
>         F.randn(seed=i).alias('C%i' % i) for i in range(ncols)
>     ]
> ).describe()
> {code}
> For {{ncols >= 510}}, this fails with a traceback of about 3.6M (!) lines that looks like this:
> {noformat}
> 16/08/24 16:50:57 ERROR CodeGenerator: failed to compile: java.io.EOFException
> /* 001 */ public java.lang.Object generate(Object[] references) {
> /* 002 */ return new SpecificMutableProjection(references);
> /* 003 */ }
> /* 004 */
> /* 005 */ class SpecificMutableProjection extends org.apache.spark.sql.catalyst.expressions.codegen.BaseMutableProjection {
> ...
> /* 7372 */ private boolean isNull_1969;
> /* 7373 */ private double value_1969;
> /* 7374 */ private boolean isNull_1970;
> ...
> /* 11035 */ double value14944 = -1.0;
> /* 11036 */
> /* 11037 */
> /* 11038 */ if (!evalExpr1052IsNull) {
> /* 11039 */
> /* 11040 */ isNull14944 = false; // resultCode could change nullability.
> /* 11041 */ value14944 = evalExpr1326Value - evalExpr1052Value;
> /* 11042 */
> ...
> /* 157621 */ apply1_6(i);
> /* 157622 */ return mutableRow;
> /* 157623 */ }
> /* 157624 */ }
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:889)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:941)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anon$1.load(CodeGenerator.scala:938)
> at org.spark_project.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
> at org.spark_project.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
> at org.spark_project.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
> ... 30 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readFully(DataInputStream.java:197)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at org.codehaus.janino.util.ClassFile.loadAttribute(ClassFile.java:1383)
> at org.codehaus.janino.util.ClassFile.loadAttributes(ClassFile.java:555)
> at org.codehaus.janino.util.ClassFile.loadFields(ClassFile.java:518)
> at org.codehaus.janino.util.ClassFile.<init>(ClassFile.java:185)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:914)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$$anonfun$recordCompilationStats$1.apply(CodeGenerator.scala:912)
> at scala.collection.Iterator$class.foreach(Iterator.scala:893)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.recordCompilationStats(CodeGenerator.scala:912)
> at org.apache.spark.sql.catalyst.expressions.codegen.CodeGenerator$.org$apache$spark$sql$catalyst$expressions$codegen$CodeGenerator$$doCompile(CodeGenerator.scala:884)
> ... 35 more
> {noformat}
> I've seen something similar in an earlier Spark version ([reported in this issue|https://issues.apache.org/jira/browse/SPARK-14138]).
> My conclusion is that {{describe}} was never meant to be used non-interactively on very wide DataFrames. Am I right?
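On a Spark version without the fix, one possible workaround is to call describe() on smaller groups of columns, so that each generated projection stays small. The sketch below is not taken from the issue or from Spark itself. The helper names column_batches and describe_in_batches are hypothetical, the default batch size of 100 is an arbitrary value below the 510 columns reported above, and it assumes the failure depends on how many columns a single describe() call covers, which the report suggests but does not prove.

```python
def column_batches(columns, batch_size):
    """Split a list of column names into consecutive batches.

    Hypothetical helper, not part of PySpark.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [columns[i:i + batch_size]
            for i in range(0, len(columns), batch_size)]


def describe_in_batches(df, batch_size=100):
    """Call describe() once per batch of columns of a DataFrame.

    Returns a list with one summary DataFrame per batch. Assumption:
    describing fewer columns per call keeps the generated code small
    enough to compile; batch_size=100 is an untuned guess.
    """
    return [df.select(*batch).describe()
            for batch in column_batches(df.columns, batch_size)]
```

With the script above, this could be used as describe_in_batches(df.select([F.randn(seed=i).alias('C%i' % i) for i in range(ncols)])). Each element of the result carries its own summary column, so the pieces can be joined on that column or inspected one at a time.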