Posted to issues@spark.apache.org by "Santokh Singh (Jira)" <ji...@apache.org> on 2022/07/09 19:46:00 UTC

[jira] [Comment Edited] (SPARK-24815) Structured Streaming should support dynamic allocation

    [ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564614#comment-17564614 ] 

Santokh Singh edited comment on SPARK-24815 at 7/9/22 7:45 PM:
---------------------------------------------------------------

I am very interested in this feature. With the {{mapGroupsWithState}} API in Structured Streaming, or with generic state management and sharing of state across executors, would externalizing the state help? I am aware that RocksDB is one way to do this.
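For context, here is a minimal sketch of the kind of {{mapGroupsWithState}} pipeline the comment refers to. It is not from the original thread; the Event/EventCount types and the rate source are illustrative assumptions. The point is that the per-key state held by the operator is exactly what would need to move, or be externalized (for example to RocksDB), when executors come and go under dynamic allocation.

{code:scala}
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical event and result types, just for illustration.
case class Event(userId: String, action: String)
case class EventCount(userId: String, count: Long)

object MapGroupsWithStateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("map-groups-with-state-sketch").getOrCreate()
    import spark.implicits._

    // Any streaming Dataset[Event] would do; the built-in rate source stands in here.
    val events: Dataset[Event] = spark.readStream
      .format("rate")
      .load()
      .selectExpr("CAST(value % 10 AS STRING) AS userId", "'click' AS action")
      .as[Event]

    // Per-key state kept by mapGroupsWithState. This is the state that would
    // have to be redistributed (or externalized, e.g. to RocksDB) when
    // executors are added or removed under dynamic allocation.
    val counts = events
      .groupByKey(_.userId)
      .mapGroupsWithState(GroupStateTimeout.NoTimeout) {
        (userId: String, batch: Iterator[Event], state: GroupState[Long]) =>
          val total = state.getOption.getOrElse(0L) + batch.size
          state.update(total)
          EventCount(userId, total)
      }

    counts.writeStream
      .outputMode(OutputMode.Update)
      .format("console")
      .start()
      .awaitTermination()
  }
}
{code}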



> Structured Streaming should support dynamic allocation
> ------------------------------------------------------
>
>                 Key: SPARK-24815
>                 URL: https://issues.apache.org/jira/browse/SPARK-24815
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler, Spark Core, Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Karthik Palaniappan
>            Priority: Minor
>
> For batch jobs, dynamic allocation is very useful for adding and removing containers to match the actual workload. On multi-tenant clusters, it ensures that a Spark job is taking no more resources than necessary. In cloud environments, it enables autoscaling.
> However, if you set spark.dynamicAllocation.enabled=true and run a structured streaming job, the batch dynamic allocation algorithm kicks in. It requests more executors once the task backlog reaches a certain size, and removes executors that have been idle for a certain period of time (a configuration sketch follows after this quoted description).
> Quick thoughts:
> 1) Dynamic allocation should be pluggable, rather than hardcoded to a particular implementation in SparkContext.scala (this should be a separate JIRA).
> 2) We should make a structured streaming algorithm that's separate from the batch algorithm. Eventually, continuous processing might need its own algorithm.
> 3) Spark should print a warning if you run a structured streaming job while Core's dynamic allocation is enabled.
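
For reference (this is not part of the original issue text), the quoted behaviour is driven by the standard dynamic allocation settings; a rough sketch follows, with the executor counts and timeouts being illustrative values only.

{code:scala}
import org.apache.spark.sql.SparkSession

// Illustrative values only. With these settings, the batch-oriented dynamic
// allocation algorithm described above also governs a structured streaming
// query, which is the behaviour this issue proposes to change.
val spark = SparkSession.builder
  .appName("streaming-with-dynamic-allocation")
  .config("spark.dynamicAllocation.enabled", "true")
  // The external shuffle service is required for dynamic allocation on YARN.
  .config("spark.shuffle.service.enabled", "true")
  .config("spark.dynamicAllocation.minExecutors", "1")
  .config("spark.dynamicAllocation.maxExecutors", "20")
  // Ask for more executors once tasks have been backlogged this long.
  .config("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")
  // Release executors that have sat idle this long.
  .config("spark.dynamicAllocation.executorIdleTimeout", "60s")
  .getOrCreate()
{code}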



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
