Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2014/09/16 22:45:34 UTC
[jira] [Created] (SPARK-3554) handle large dataset in closure of PySpark
Davies Liu created SPARK-3554:
---------------------------------
Summary: handle large dataset in closure of PySpark
Key: SPARK-3554
URL: https://issues.apache.org/jira/browse/SPARK-3554
Project: Spark
Issue Type: Improvement
Components: PySpark
Reporter: Davies Liu
Sometimes a large dataset is used inside a closure and the user forgets to wrap it in a broadcast variable, so the serialized command becomes huge.
py4j cannot handle large objects efficiently, so we should compress the serialized command, and automatically use a broadcast variable for it if it is still huge.
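The proposed improvement can be sketched as follows. This is a minimal illustration, not Spark's actual implementation: the 1 MiB threshold, the one-byte tag, and the helper names are assumptions made for the example; in PySpark itself the large payload would additionally be handed to `sc.broadcast(...)` rather than pushed through py4j.

```python
import pickle
import zlib

# Illustrative cutoff (assumption, not Spark's real value): payloads
# larger than this get compressed before being shipped to workers.
LARGE_THRESHOLD = 1 << 20  # 1 MiB


def serialize_command(command):
    # Pickle the closure payload; compress only when it is large so
    # small commands avoid the compression overhead.
    data = pickle.dumps(command)
    if len(data) > LARGE_THRESHOLD:
        return b"Z" + zlib.compress(data)  # tag compressed payloads
    return b"R" + data                     # raw payload


def deserialize_command(blob):
    # Worker side: strip the tag, decompress if needed, then unpickle.
    tag, payload = blob[:1], blob[1:]
    if tag == b"Z":
        payload = zlib.decompress(payload)
    return pickle.loads(payload)
```

The user-facing fix remains the same as today: instead of capturing a large object in the lambda (which ships it with every task), call `sc.broadcast(obj)` once and reference `b.value` inside the closure.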
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org