Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2014/09/16 22:45:34 UTC

[jira] [Created] (SPARK-3554) handle large dataset in closure of PySpark

Davies Liu created SPARK-3554:
---------------------------------

             Summary: handle large dataset in closure of PySpark
                 Key: SPARK-3554
                 URL: https://issues.apache.org/jira/browse/SPARK-3554
             Project: Spark
          Issue Type: Improvement
          Components: PySpark
            Reporter: Davies Liu


Sometimes a large dataset is used in a closure and the user forgets to broadcast it, so the serialized command becomes huge.

py4j cannot handle large objects efficiently, so we should compress the serialized command, and use a broadcast variable for it if it is still huge.
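The approach above can be sketched in plain Python: pickle the task command, compress it, and flag it for a broadcast-style fallback when the compressed form still exceeds a size threshold. This is a minimal illustration, not Spark's actual implementation; the names `MAX_COMMAND_SIZE` and `prepare_command` and the 1 MiB threshold are hypothetical.

```python
import pickle
import zlib

# Hypothetical threshold, not Spark's real constant: commands larger
# than this after compression would be shipped via broadcast instead
# of being pushed through py4j directly.
MAX_COMMAND_SIZE = 1 << 20  # 1 MiB

def prepare_command(obj):
    """Serialize a task payload, compress it, and report whether it
    is still large enough to warrant a broadcast variable."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    compressed = zlib.compress(raw, 1)  # fast compression level
    return compressed, len(compressed) > MAX_COMMAND_SIZE

# A closure payload that accidentally captures a large in-memory dict.
big_lookup = {i: str(i) * 10 for i in range(200_000)}
command, needs_broadcast = prepare_command(big_lookup)

# The worker side would decompress and unpickle the command.
restored = pickle.loads(zlib.decompress(command))
```

In real PySpark code the user-side fix is simply `bc = sc.broadcast(big_lookup)` and referencing `bc.value` inside the closure; the sketch above shows what the framework could do automatically when the user forgets.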



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org