Posted to issues@spark.apache.org by "Michael Park (JIRA)" <ji...@apache.org> on 2017/10/09 17:56:00 UTC

[jira] [Commented] (SPARK-20589) Allow limiting task concurrency per stage

    [ https://issues.apache.org/jira/browse/SPARK-20589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16197402#comment-16197402 ] 

Michael Park commented on SPARK-20589:
--------------------------------------

Pardon my ignorance of the inner workings of task scheduling, but is it not possible to provide a way to mark the max concurrency of a specific RDD? The max concurrency of a stage would then be the minimum of the max concurrencies of all RDDs within that stage.
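
To make the idea concrete, a purely hypothetical sketch in Scala of what a per-RDD cap could look like. Neither withMaxConcurrency nor ConcurrencyCap exists in Spark; they are invented here only to illustrate the proposal, and a real implementation would have to plumb the cap into the scheduler rather than hold it as driver-side metadata:

    import org.apache.spark.rdd.RDD
    import scala.collection.concurrent.TrieMap

    // Hypothetical illustration only: nothing like this exists in Spark today.
    object ConcurrencyCap {
      private val caps = TrieMap.empty[Int, Int]   // rdd.id -> declared cap

      implicit class CappedRDD[T](rdd: RDD[T]) {
        // Record a max-concurrency hint for this RDD and return it unchanged.
        def withMaxConcurrency(max: Int): RDD[T] = { caps.put(rdd.id, max); rdd }
      }

      // The rule suggested above: a stage's limit is the minimum cap declared
      // on any RDD contained in that stage; None means "no limit".
      def stageLimit(rddIds: Seq[Int]): Option[Int] = {
        val declared = rddIds.flatMap(caps.get)
        if (declared.isEmpty) None else Some(declared.min)
      }
    }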

Also, +1 for this not being an obscure use case. We see a need for it any time we include an external service as part of a generic pipeline. Ideally the bottleneck can be limited to a single stage, rather than the entire job.
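
For reference, the closest workaround available today is throttling inside the tasks themselves. A rough sketch, assuming a hypothetical callExternalService client and inputRdd; note that this caps concurrent calls per executor JVM rather than per stage, so the effective limit is roughly (number of executors) x permits:

    import java.util.concurrent.Semaphore

    // Executor-side throttle: one semaphore per executor JVM, shared by all
    // task threads running there. This only approximates a per-stage limit.
    object ServiceThrottle {
      val permits = new Semaphore(4)  // assumed per-executor limit; tune for the service
    }

    val results = inputRdd.mapPartitions { iter =>
      iter.map { record =>
        ServiceThrottle.permits.acquire()
        try {
          callExternalService(record)   // hypothetical HTTP/HBase client call
        } finally {
          ServiceThrottle.permits.release()
        }
      }
    }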

> Allow limiting task concurrency per stage
> -----------------------------------------
>
>                 Key: SPARK-20589
>                 URL: https://issues.apache.org/jira/browse/SPARK-20589
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 2.1.0
>            Reporter: Thomas Graves
>
> It would be nice to have the ability to limit the number of concurrent tasks per stage.  This is useful when your Spark job accesses another service and you don't want to DOS that service, for instance Spark writing to HBase or Spark doing HTTP PUTs against a service.  Many times you want to do this without limiting the number of partitions. 
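
The only built-in lever today is the partition count itself, which the description explicitly wants to avoid changing. A brief sketch of why, assuming a hypothetical inputRdd and writeToHbase function:

    inputRdd
      .coalesce(8)                      // caps concurrency at 8 tasks...
      .foreachPartition(writeToHbase)   // ...but also forces 8 large partitions,
                                        // coupling throttling to data layout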



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org