Posted to user@spark.apache.org by Hrishikesh Mishra <sd...@gmail.com> on 2020/03/04 06:46:45 UTC

Past batch time in Spark Streaming

Hi

When my Spark Streaming job starts, it isn't able to process all the data within
the batch duration, so back-pressure kicks in, which reduces the batch size.

[image: image.png]

This is not a problem. Even after a couple of hours, when it has processed all
the data from the Kafka stream (and suppose no further data is being produced
to Kafka), the batch time still shows a past time.
[image: image.png]
[image: image.png]


But if any data does come in, it is processed then and there. For example, I
published one event to Kafka around 3 PM, but it was processed in a batch with
*Batch Time: 2020/02/20 13:37:30*.

My question is: what is the "Batch Time" in the Spark UI? Why is it showing a
past time when the event was produced just now? And how is it different from
*Submitted Time*?

Spark config

"spark.shuffle.service.enabled", "true"
"spark.streaming.receiver.maxRate", "10000"
"spark.streaming.kafka.maxRatePerPartition", "600"
"spark.streaming.backpressure.enabled", "true"
"spark.streaming.backpressure.initialRate", "10000"
"spark.streaming.blockInterval", "100ms"
"spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC"



Regards

Hrishi