Posted to user@spark.apache.org by "Hussain, Saghir" <sa...@akamai.com.INVALID> on 2022/04/13 07:05:46 UTC

[Spark Streaming]: Why planInputPartitions is called multiple times for each micro-batch in Spark 3?

Hi All

While upgrading our custom streaming data source from Spark 2.4.5 to Spark 3.2.1, we observed that the planInputPartitions() method of MicroBatchStream is called multiple times (4 in our case) for each micro-batch in Spark 3.
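
To make the observation concrete, here is a minimal sketch in plain Java of how the calls can be counted per offset range, and how the repeated calls could be made cheap by caching the last result. It does not use the real Spark API: CountingStream, computePartitions and the String partitions are hypothetical stand-ins, and the caching assumes that planning the same start/end offsets always yields the same partitions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a custom micro-batch source; NOT the real Spark API.
public class PlanCallSketch {

    static class CountingStream {
        // Number of planInputPartitions calls seen for each "start:end" range.
        private final Map<String, Integer> callsPerRange = new HashMap<>();
        // Number of times the (assumed expensive) planning really ran.
        private int expensivePlans = 0;
        private String cachedKey = null;
        private List<String> cachedPartitions = null;

        // Mirrors the shape of planInputPartitions(start, end), with long offsets
        // and String partitions in place of Spark's Offset and InputPartition.
        List<String> planInputPartitions(long start, long end) {
            String key = start + ":" + end;
            callsPerRange.merge(key, 1, Integer::sum);
            // Assumption: the same offset range always maps to the same partitions,
            // so the previous result can be reused for repeated calls.
            if (!key.equals(cachedKey)) {
                cachedPartitions = computePartitions(start, end);
                cachedKey = key;
            }
            return cachedPartitions;
        }

        // Placeholder for the real planning work: splits the range into two halves.
        private List<String> computePartitions(long start, long end) {
            expensivePlans++;
            long mid = start + (end - start) / 2;
            return List.of("[" + start + "," + mid + ")", "[" + mid + "," + end + ")");
        }

        int callsFor(long start, long end) {
            return callsPerRange.getOrDefault(start + ":" + end, 0);
        }

        int expensivePlans() {
            return expensivePlans;
        }
    }

    public static void main(String[] args) {
        CountingStream stream = new CountingStream();
        // Simulate the 4 calls we see for a single micro-batch.
        for (int i = 0; i < 4; i++) {
            stream.planInputPartitions(0L, 100L);
        }
        System.out.println("calls for batch 0:100 = " + stream.callsFor(0L, 100L)); // 4
        System.out.println("expensive plans = " + stream.expensivePlans());        // 1
    }
}
```

With this kind of counter in the real source, the number of calls per micro-batch can be logged next to the offsets, which is how we arrived at the figure of 4.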

The Apache Spark documentation also says:
"The method planInputPartitions will be called multiple times, to launch one Spark job for each micro-batch in this data stream."
<https://spark.apache.org/docs/3.0.0-preview2/api/java/org/apache/spark/sql/connector/read/streaming/MicroBatchStream.html#:~:text=recent%20offset%20available.-,planInputPartitions,Spark%20job%20for%20each%20micro%2Dbatch%20in%20this%20data%20stream.,-createReaderFactory>

What is the reason for this?

Thanks & Regards,
Saghir Hussain