Posted to issues@spark.apache.org by "Gerard Alexander (Jira)" <ji...@apache.org> on 2020/07/19 20:33:00 UTC

[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation

    [ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17160816#comment-17160816 ] 

Gerard Alexander commented on SPARK-24815:
------------------------------------------

When I read this, I am not sure what is being said. This article [https://dzone.com/articles/spark-dynamic-allocation] states, and demonstrates, that Spark Structured Streaming can already relinquish resources and acquire new ones up to the maximum configured for the job. Can you clarify, please? The issue seems to be that the batch algorithm is being applied to a non-batch workload.
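For reference, the dynamic-allocation behavior being discussed is driven by Spark's standard configuration properties. A minimal sketch of a submission that turns it on (the jar name is a placeholder; the property names are Spark's documented `spark.dynamicAllocation.*` settings):

```shell
# Sketch: enabling Core's (batch-oriented) dynamic allocation for a job.
# my-streaming-app.jar is a placeholder; the --conf keys are standard Spark properties.
spark-submit \
  --conf spark.dynamicAllocation.enabled=true \
  --conf spark.dynamicAllocation.minExecutors=1 \
  --conf spark.dynamicAllocation.maxExecutors=10 \
  --conf spark.dynamicAllocation.executorIdleTimeout=60s \
  --conf spark.shuffle.service.enabled=true \
  my-streaming-app.jar
```

As the issue description below notes, these settings currently invoke the batch heuristic (scale up on task backlog, scale down on executor idleness) whether the job is a batch job or a structured streaming query.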

> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job is taking no more resources than necessary. In cloud environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors if the task backlog is a certain size, and removes executors if they idle for a certain period of time.
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA).
> 2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing might need its own algorithm.
> 3) Spark should print a warning if you run a structured streaming job when Core's dynamic allocation is enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org