Posted to issues@spark.apache.org by "Idan Zalzberg (JIRA)" <ji...@apache.org> on 2015/01/19 11:04:34 UTC

[jira] [Created] (SPARK-5319) Choosing partition size instead of count

Idan Zalzberg created SPARK-5319:
------------------------------------

             Summary: Choosing partition size instead of count
                 Key: SPARK-5319
                 URL: https://issues.apache.org/jira/browse/SPARK-5319
             Project: Spark
          Issue Type: Brainstorming
            Reporter: Idan Zalzberg


With the current API, there are multiple places where you can set the partition count when reading from sources.

However, in my experience it is sometimes more useful to set the partition size (in MB) and infer the count from that.
In my experience, Spark is sensitive to partition size: if partitions are too big, the amount of memory needed per core goes up; if they are too small, stage times increase significantly. So I'd like to stay in the "sweet spot" of partition size, without tweaking the partition count by trial and error until I find it.
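
As a rough illustration, here is a minimal sketch of how the count could be inferred from a size today, assuming the input lives on a Hadoop-compatible filesystem; the helper name textFileWithPartitionSize and the targetSizeMB parameter are hypothetical, not an existing Spark API:

    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: pick a partition count so that each partition
    // holds roughly targetSizeMB of input, then pass it as minPartitions.
    def textFileWithPartitionSize(sc: SparkContext,
                                  path: String,
                                  targetSizeMB: Int): RDD[String] = {
      val fs = FileSystem.get(sc.hadoopConfiguration)
      // Total size in bytes of all files under the input path.
      val totalBytes = fs.getContentSummary(new Path(path)).getLength
      val targetBytes = targetSizeMB.toLong * 1024 * 1024
      // Round up so partitions stay at or below the target size;
      // always use at least one partition.
      val numPartitions =
        math.max(1, math.ceil(totalBytes.toDouble / targetBytes).toInt)
      sc.textFile(path, numPartitions)
    }

Note that minPartitions is only a hint to the underlying Hadoop input format, so the resulting splits may not match the target exactly; a first-class API could make the size the primary knob instead.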


