Posted to user@spark.apache.org by Zhenyu Hu <cs...@gmail.com> on 2021/08/12 02:33:29 UTC

Spark DStream Dynamic Allocation

1. First of all, I would like to ask whether dynamic scaling (dynamic executor
allocation) for Spark DStream applications is actually available now? It is not
mentioned in the Spark documentation. (A configuration sketch of what I am
trying to enable follows after these questions.)
2. Spark DStream dynamic scaling randomly kills a non-receiver executor when
the average processing delay divided by the batch interval is less than 0.5.
But this may cause that executor to lose cached blocks or shuffle data. How
should this situation be handled? (The only mitigations I can think of are
sketched below.)
3. If a WindowedDStream exists, its jobs are triggered according to the
slideDuration of the WindowedDStream, but DStream dynamic scaling is still
based on each job batch's processing delay divided by batchDuration. Is this
reasonable? I think the ratio should instead be calculated by dividing the job
processing delay by slideDuration (a worked example is at the end of this mail).
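
For question 1, this is a rough sketch of how I would try to turn the feature
on. The spark.streaming.dynamicAllocation.* keys come from the streaming
ExecutorAllocationManager in the Spark source rather than from the
documentation, so both the key names and the assumption that core dynamic
allocation must stay disabled should be verified against your Spark version:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Assumption: these undocumented keys are read by the streaming
    // ExecutorAllocationManager; please correct me if the names differ.
    val conf = new SparkConf()
      .setAppName("dstream-dynamic-allocation-test")
      // assumption: core dynamic allocation is expected to be off when the
      // streaming variant is enabled
      .set("spark.dynamicAllocation.enabled", "false")
      .set("spark.streaming.dynamicAllocation.enabled", "true")
      .set("spark.streaming.dynamicAllocation.minExecutors", "2")
      .set("spark.streaming.dynamicAllocation.maxExecutors", "10")
      .set("spark.streaming.dynamicAllocation.scalingInterval", "60s")
      .set("spark.streaming.dynamicAllocation.scalingUpRatio", "0.9")
      .set("spark.streaming.dynamicAllocation.scalingDownRatio", "0.3")

    val ssc = new StreamingContext(conf, Seconds(10))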
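For question 2, the only mitigations I can think of are the usual ones for
losing any executor: keeping shuffle files outside the executor process via the
external shuffle service, and replicating cached blocks. Whether these actually
cover the scale-down kill is part of what I am asking; the sketch below only
uses standard Spark settings and a placeholder socket source:

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf()
      .setAppName("dstream-executor-loss-mitigation")
      // shuffle files are served by the external shuffle service, so they
      // can survive the executor process being killed
      .set("spark.shuffle.service.enabled", "true")

    val ssc = new StreamingContext(conf, Seconds(10))

    // placeholder input; replicate cached blocks so killing one executor
    // does not lose the only copy
    val lines = ssc.socketTextStream("localhost", 9999)
    lines.persist(StorageLevel.MEMORY_AND_DISK_SER_2)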
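For question 3, here is the arithmetic behind my concern, with made-up numbers.
A window job that slides every 30 seconds and takes 15 seconds to process looks
overloaded when measured against a 10-second batchDuration, but looks fine when
measured against its own slideDuration:

    // hypothetical numbers, only to show how the two ratios diverge
    val batchDurationMs = 10000L   // StreamingContext batch interval: 10s
    val slideDurationMs = 30000L   // WindowedDStream slide interval: 30s
    val avgJobProcMs    = 15000L   // average processing time of the window job

    val ratioVsBatch = avgJobProcMs.toDouble / batchDurationMs   // 1.5 -> would look overloaded
    val ratioVsSlide = avgJobProcMs.toDouble / slideDurationMs   // 0.5 -> actually keeping up
    println(f"vs batchDuration: $ratioVsBatch%.2f, vs slideDuration: $ratioVsSlide%.2f")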