Posted to dev@spark.apache.org by Peter Liu <pe...@gmail.com> on 2018/08/03 20:10:12 UTC

re: should we dump a warning if we drop batches due to window move?

Hello there,

I have a quick question for the following case:

Situation:
A Spark consumer is able to process 5 batches in 10 sec (where the batch
interval is zero by default - correct me if this is wrong). The window size
is 10 sec (zero-overlap sliding, i.e. tumbling).
There are some fluctuations in the incoming message arrival rate,
resulting in a slightly higher incoming rate than the consumer is able
to handle - say, sometimes 6 batches' worth of data come in 10 sec, for 5
minutes...
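Just to make the backlog concrete: with the rates above, and assuming (my assumption, purely for arithmetic) that nothing is dropped and the burst lasts exactly 5 minutes, the consumer falls behind by one batch per window:

```python
# Back-of-the-envelope backlog estimate for the scenario above.
# Assumption (mine, for illustration only): the burst lasts exactly
# 5 minutes and no batch is ever dropped, so unprocessed batches
# simply accumulate. This is arithmetic, not observed Spark behavior.

PROCESS_RATE = 5    # batches the consumer finishes per 10 s window
ARRIVE_RATE = 6     # batches arriving per 10 s window during the burst
BURST_SECONDS = 5 * 60
WINDOW_SECONDS = 10

windows = BURST_SECONDS // WINDOW_SECONDS          # 30 windows in 5 minutes
backlog = (ARRIVE_RATE - PROCESS_RATE) * windows   # +1 batch per window

print(windows, backlog)  # 30 30 -> a 30-batch backlog after the burst
```

So even a small sustained overrate adds up, which is why I'd like to know whether Spark drops or accumulates.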

Question:
Would we (Spark 2.2) drop the 6th batch when the 10 sec window moves on,
or do unprocessed batches keep accumulating?
If we drop, would we dump a warning in the log?
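For clarity on what I mean by "zero-overlap sliding": with window size equal to the slide (a tumbling window), each event falls into exactly one 10 sec bucket. A minimal plain-Python sketch of that bucketing (illustration only, not Spark's implementation - Spark does this via its SQL window() function):

```python
# Tumbling-window bucketing: window size == slide, so every timestamp
# lands in exactly one window. Timestamps below are made up for the
# example; plain Python, not Spark code.

WINDOW_SEC = 10

def window_start(ts_sec):
    """Start of the tumbling 10 s window that contains ts_sec."""
    return (ts_sec // WINDOW_SEC) * WINDOW_SEC

events = [3, 9, 10, 17, 25]          # event-time seconds (hypothetical)
buckets = {}
for ts in events:
    buckets.setdefault(window_start(ts), []).append(ts)

print(buckets)  # {0: [3, 9], 10: [10, 17], 20: [25]}
```

My question is what happens to a batch whose window has already "moved on" by the time the consumer gets to it.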

I can see a warning (attached below) when the batch processing takes more
time than an explicitly set batch interval (which is not the case here). I
would expect a similar warning in the log when we have to drop batches in
the above case, but can't find that type of warning.

Maybe I was just looking for the wrong text in the log? In general, is the
expectation reasonable? (I can't find anything at
https://spark.apache.org/docs/2.2.1/streaming-programming-guide.html or
from general googling.)

any comments/suggestions would be very much appreciated!

Thanks,

Peter

18/08/03 00:33:43 WARN streaming.ProcessingTimeExecutor: Current batch is
falling behind. The trigger interval is 10000 milliseconds, but spent 11965
milliseconds
18/08/03 00:33:55 WARN streaming.ProcessingTimeExecutor: Current batch is
falling behind. The trigger interval is 10000 milliseconds, but spent 11266
milliseconds