Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/08/24 19:34:20 UTC

[jira] [Resolved] (SPARK-17223) "grows beyond 64 KB" with data frame with many columns

     [ https://issues.apache.org/jira/browse/SPARK-17223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-17223.
-------------------------------
    Resolution: Duplicate

> "grows beyond 64 KB" with data frame with many columns
> ------------------------------------------------------
>
>                 Key: SPARK-17223
>                 URL: https://issues.apache.org/jira/browse/SPARK-17223
>             Project: Spark
>          Issue Type: Bug
>          Components: ML, PySpark
>    Affects Versions: 2.0.0, 2.1.0
>            Reporter: K
>
> Hi everyone, 
> We have a dataset with ~500 columns. When I call a LabelIndexer on it and try to print out the first line, it fails with the "grows beyond 64 KB" error below. My original dataset had >20K rows; I stripped it down to 100 rows, but that didn't help. Eventually, we want to feed LabelIndexer, VectorAssembler and Random Forest into a Pipeline, but we are not having much luck here :( We tried with 2.0.0 and 2.1.0 (snapshot as of 8/23). The problem is reproducible with the data file here:
> https://drive.google.com/file/d/0B2zl8xCBUVh6TFZDd3ZSUTNsam8/view?usp=sharing
> Thanks a lot!!
> Environment: Cluster with 2 nodes (CentOS, 64GB RAM and 8 cores each)
> The code is here (JIRA corrupted it, so I moved it to a Google doc):
> https://docs.google.com/document/d/19unfhSMMCjoXqhmFOA1omm4V2wHaraY0RxZesbQluZU/edit?usp=sharing
> ERROR:
> Py4JJavaError: An error occurred while calling z:org.apache.spark.sql.execution.python.EvaluatePython.takeAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 250.0 failed 4 times, most recent failure: Lost task 0.3 in stage 250.0 (TID 4666, ip): java.util.concurrent.ExecutionException: java.lang.Exception: failed to compile: org.codehaus.janino.JaninoRuntimeException: Code of method "compare(Lorg/apache/spark/sql/catalyst/InternalRow;Lorg/apache/spark/sql/catalyst/InternalRow;)I" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
