Posted to user@spark.apache.org by "Shlomi.b" <sh...@gigya-inc.com> on 2016/11/09 08:48:55 UTC

Spark streaming delays spikes

Hi,
We are using Spark Streaming version 1.6.2 and have come across some strange
behavior. Our system pulls log event data from Flume servers, enriches the
events, and saves them to Elasticsearch (ES).
We use a window interval of 15 seconds, and the rate at peak hours is
around 70K events.

Processing the data and indexing it to ES takes about 12 seconds on average
per window interval, but every 4-5 window intervals we see a spike to
18-22 seconds.
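
To make the effect of such a spike concrete, here is a minimal sketch in plain Python (not Spark code) of how a batch that takes longer than the 15-second interval delays the batches behind it. It assumes batches are processed strictly one at a time and in order, and the function name and the list of processing times are made up for illustration, using 12 seconds for a normal batch and 20 seconds for a spike.

```python
def scheduling_delays(batch_interval, processing_times):
    """Return how long each batch waits before it can start.

    Assumption: batches run one at a time, in order, and batch i
    becomes ready at the end of its own interval.
    """
    delays = []
    free_at = 0  # time at which the previous batch finishes
    for i, proc in enumerate(processing_times):
        ready_at = (i + 1) * batch_interval
        start = max(ready_at, free_at)  # wait if the previous batch is still running
        delays.append(start - ready_at)
        free_at = start + proc
    return delays


# Illustrative values only: four normal batches, then one spike, repeated.
times = [12, 12, 12, 12, 20, 12, 12, 12, 12, 20]
delays = scheduling_delays(15, times)
print(delays)
```

Under these assumptions a 20-second batch pushes back the next two batches before the 3 seconds of slack per normal batch absorbs the backlog.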

Looking at the Spark UI, we see something strange.
Most of the time it shows that every executor has indexed a few thousand
records to ES, with a size of around 5M. When a spike occurs, however, we
see that 2 jobs were created to index data to ES, and the second job took
6-9 seconds to index 1 record of ~1800M.

Two points I would like to clarify:
1. All of our original events are 3KB-5KB in size.
2. When we changed the application to save the RDD as a text file instead
(which, of course, took less time than ES), we saw the same strange
behavior, with a spike every 4-5 window intervals.
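
For scale, a rough back-of-the-envelope calculation in Python of how much raw event data one window should hold. Two assumptions here are not stated above: that the 70K figure is per 15-second window, and that "1800M" means megabytes. The helper name is hypothetical.

```python
KB = 1024
MB = 1024 * KB


def window_size_mb(events_per_window, event_size_kb):
    """Raw size of one window's events in MB, ignoring any overhead."""
    return events_per_window * event_size_kb * KB / MB


# Assumption: 70K events per window, each 3KB-5KB as stated above.
low = window_size_mb(70_000, 3)
high = window_size_mb(70_000, 5)
print(f"expected raw data per window: {low:.0f}-{high:.0f} MB")

# How many full windows of raw events a single 1800 MB record would equal.
print(f"1800 MB is about {1800 / high:.1f}-{1800 / low:.1f} windows of raw events")
```

If those assumptions hold, a single record of that size is several times larger than all the raw events in one window, which is why the number stands out.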



