Posted to user@spark.apache.org by Bryan Jeffrey <br...@gmail.com> on 2016/05/09 14:32:10 UTC

Streaming application slows over time

All,

I am seeing an odd issue with my streaming application.  I am running Spark
1.4.1, Scala 2.10.  Our streaming application has a batch time of two
minutes.  The application runs well for a reasonable period of time (4-8
hours).  It processes the same data in approximately the same amount of
time.  Because we're consuming large amounts of data, I tweaked some GC
settings to collect more frequently and enabled G1GC.  This effectively
corrected any memory pressure, so memory is looking good.

After some number of batches, the time to process a batch increases by a
large margin (from 40 seconds per batch to 130 seconds per batch).  Because
our batch time is 2 minutes, the application eventually falls behind.  This
in turn causes memory issues and eventually leads to OOM.  However, the OOM
seems to be a symptom of the slowdown, not a cause.  I have looked in the
driver and executor logs, but I do not see any reason for the slowness.  The
data volume is the same, none of the executors seem to be crashing, nothing
is going to disk, memory is well within bounds, and nothing else is running
on these boxes.
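
To make the "running behind" arithmetic concrete, here is a minimal Python
sketch.  It is not taken from the application; the function name and the
sequence of batch times are illustrative assumptions, and it assumes batches
are processed strictly one at a time.  It shows how a 130 second processing
time against a 120 second batch time adds roughly 10 seconds of delay per
batch.

```python
def scheduling_delays(batch_interval_s, processing_times_s):
    """Return how long (in seconds) each batch waits before it can start.

    Batch i is ready at i * batch_interval_s, but it can only start once the
    previous batch has finished (assumption: one batch runs at a time).
    """
    delays = []
    previous_finish = 0
    for i, processing in enumerate(processing_times_s):
        ready_at = i * batch_interval_s
        start = max(ready_at, previous_finish)
        delays.append(start - ready_at)
        previous_finish = start + processing
    return delays


# Hypothetical run: 120 s batch time, three healthy 40 s batches,
# then five slow 130 s batches.
delays = scheduling_delays(120, [40, 40, 40] + [130] * 5)
print(delays)
```

Once the delay starts growing like this, the unprocessed input queues up,
which would be consistent with the memory problems showing up only after the
slowdown.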

Are there any suggested debugging techniques, things to look for or known
bugs in similar instances?

Regards,

Bryan Jeffrey

Re: Streaming application slows over time

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi Bryan,

When this happens, do you check the OS to see the amount of free memory and
the CPU usage?  It sounds like another application may be creeping in.

OS tools like free and top may provide some clue.
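
If it helps to record those numbers over time rather than eyeball them, one
possible approach on Linux is to parse /proc/meminfo, which reports the
kernel's memory counters.  The sketch below is only an illustration: the
function names are made up, and the sample text contains invented values.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_fraction(fields):
    """Fraction of total memory that the kernel reports as available."""
    return fields["MemAvailable"] / fields["MemTotal"]


# Invented sample values for illustration.  On a real Linux box:
#   with open("/proc/meminfo") as f:
#       fields = parse_meminfo(f.read())
sample = (
    "MemTotal:       16000000 kB\n"
    "MemFree:         2000000 kB\n"
    "MemAvailable:    4000000 kB\n"
)
fields = parse_meminfo(sample)
print(available_fraction(fields))
```

Logging this value once per batch on each worker would show whether free
memory drops at the same time the batches slow down.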

Also, the Spark GUI may show a number of skips, i.e. work that could not be
processed.

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com
