Posted to user@spark.apache.org by Hrishikesh Mishra <sd...@gmail.com> on 2020/03/04 06:46:45 UTC
Past batch time in Spark Streaming
Hi
When my Spark Streaming job starts, it isn't able to process all the data
within the batch duration, so back-pressure kicks in and reduces the batch size.
This is not a problem. But even after a couple of hours, when the job has
processed all the data from the Kafka stream (and suppose no further data is
being produced to Kafka), the batch time still shows a time in the past.
But if any new data arrives, it is processed then and there. For example,
I published one event to Kafka around 3 PM, but it was processed in the batch
with *Batch Time: 2020/02/20 13:37:30*.
My questions are: what exactly is the "Batch Time" in the Spark UI? Why does
it show a time in the past for an event that was just produced? And how is it
different from the *Submitted Time*?
Spark config
"spark.shuffle.service.enabled", "true"
"spark.streaming.receiver.maxRate", "10000"
"spark.streaming.kafka.maxRatePerPartition", "600"
"spark.streaming.backpressure.enabled", "true"
"spark.streaming.backpressure.initialRate", "10000"
"spark.streaming.blockInterval", "100ms"
"spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC"
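For reference, here is a minimal sketch of how these key/value pairs would typically be applied when constructing the StreamingContext. The app name and the 30-second batch duration are assumptions for illustration; the original post does not state them.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setAppName("KafkaBatchJob") // illustrative name, not from the original job
  .set("spark.shuffle.service.enabled", "true")
  .set("spark.streaming.receiver.maxRate", "10000")
  .set("spark.streaming.kafka.maxRatePerPartition", "600")
  .set("spark.streaming.backpressure.enabled", "true")
  .set("spark.streaming.backpressure.initialRate", "10000")
  .set("spark.streaming.blockInterval", "100ms")
  .set("spark.executor.extraJavaOptions", "-XX:+UseConcMarkSweepGC")

// The batch duration below is assumed; with backpressure enabled, Spark
// throttles the per-partition ingest rate rather than the batch interval.
val ssc = new StreamingContext(conf, Seconds(30))
```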
Regards
Hrishi