Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2014/09/24 20:09:34 UTC

[jira] [Commented] (SPARK-3466) Limit size of results that a driver collects for each action

    [ https://issues.apache.org/jira/browse/SPARK-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14146640#comment-14146640 ] 

Andrew Ash commented on SPARK-3466:
-----------------------------------

How would you design this feature?

I can imagine measuring the size of the partitions / RDD elements while they are held in memory across the cluster, sending those sizes back to the driver, and having the driver throw an exception if the total size of the requested results exceeds the threshold; otherwise the job proceeds as normal.

Is that how you were envisioning the implementation?
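
To make that concrete, here is a rough driver-side sketch of the bookkeeping I have in mind (purely illustrative Scala; ResultSizeTracker and ResultTooLargeException are made-up names, not existing Spark classes):

{code:scala}
// Hypothetical sketch: executors report the serialized size of each task
// result, and the driver accumulates those sizes, failing the job once a
// configured threshold is crossed.
class ResultTooLargeException(msg: String) extends RuntimeException(msg)

class ResultSizeTracker(maxResultSizeBytes: Long) {
  private var totalBytes = 0L

  // Called on the driver as each task result (or its reported size) arrives.
  def addTaskResult(sizeBytes: Long): Unit = synchronized {
    totalBytes += sizeBytes
    if (maxResultSizeBytes > 0 && totalBytes > maxResultSizeBytes) {
      throw new ResultTooLargeException(
        s"Total result size ($totalBytes bytes) exceeds the configured " +
        s"limit ($maxResultSizeBytes bytes); aborting the job")
    }
  }
}
{code}

One open question with that approach: should the check happen on the executors before results are shipped, or only on the driver as results arrive? Checking early saves the network transfer but requires propagating the limit to the executors.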

> Limit size of results that a driver collects for each action
> ------------------------------------------------------------
>
>                 Key: SPARK-3466
>                 URL: https://issues.apache.org/jira/browse/SPARK-3466
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>            Reporter: Matei Zaharia
>
> Right now, operations like collect() and take() can crash the driver if they bring back too much data. We should add a spark.driver.maxResultSize setting (or something like that) that will make the driver abort a job if its result is too big. We can set it to some fraction of the driver's memory by default, or to something like 100 MB.
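
If a setting along these lines were adopted, user-facing usage might look roughly like this (a sketch only; the property name and the 100 MB figure are the ones proposed above, not a released feature):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Sketch: cap the total serialized result size a single action may send back
// to the driver, using the proposed spark.driver.maxResultSize property.
val conf = new SparkConf()
  .setAppName("collect-with-result-cap")
  .set("spark.driver.maxResultSize", "100m") // roughly the 100 MB suggested above

val sc = new SparkContext(conf)

// A collect() whose results exceeded the cap would then fail the job with a
// clear error instead of silently exhausting driver memory.
val smallSample = sc.parallelize(1 to 1000000).take(10)
{code}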


