Posted to users@kafka.apache.org by Mayur Mohite <ma...@applift.com> on 2016/04/01 09:42:39 UTC

Multiple streaming jobs on the same topic

Hi,

We have a Kafka cluster running in production, and there are two Spark
Streaming jobs (J1 and J2) that fetch data from the same topic.

We noticed that if one of the two jobs (say J1) starts reading from an
old offset (the job was down for 2 hours, and when we restarted it after
fixing the failure its offset was stale), that data is read from disk
instead of from the OS page cache.

When this happens, the other job's (J2) throughput is reduced even though
its offset is recent. We believe the recent data is most likely still in
memory, so we are not sure why J2's throughput drops.
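
A rough way to check this on the broker (assuming a Linux host with the
sysstat package installed; device names will differ) is to watch disk
reads while J1 is catching up:

    # Extended per-device stats in kB, refreshed every 5 seconds.
    # A sustained jump in rkB/s on the log disk while J1 replays old
    # offsets suggests those fetches are served from disk, not cache.
    iostat -dxk 5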

Has anyone come across such an issue in production? If so, how did you
fix it?

-Mayur

-- 


Learn more about our inaugural *FirstScreen Conference 
<http://www.firstscreenconf.com/>*!
*Where the worlds of mobile advertising and technology meet!*

June 15, 2016 @ Urania Berlin

Re: Multiple streaming jobs on the same topic

Posted by R Krishna <kr...@gmail.com>.
Then, can you specify a size/percentage of cache per consumer group?

Re: Multiple streaming jobs on the same topic

Posted by Cees de Groot <ce...@pagerduty.com>.
One of Kafka's design ideas is to keep data held in the JVM to a minimum,
offloading caching to the OS. So at the Kafka level there's not much you
can do: the old data gets buffered by the system (it has to pass through
the OS page cache to be served), and that reduces the amount of cache
available to the other job.
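
You can see the effect with any large file (a quick sketch for Linux; the
segment path below is made up, and dropping caches needs root):

    # Drop the page cache, then time a cold read vs. a cached re-read.
    sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
    time cat /var/kafka-logs/mytopic-0/00000000000000000000.log > /dev/null  # cold: disk
    time cat /var/kafka-logs/mytopic-0/00000000000000000000.log > /dev/null  # warm: cache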

Buy more memory ;-)

(Also, I think it's smart to tune _down_ the amount of memory you give to
the Kafka JVM, to maximize the OS's buffering. You don't want large amounts
of JVM memory filled with garbage contending with OS buffer cache filled
with useful data.)
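
For example (a sketch only; pick heap sizes for your own workload), you
can cap the broker heap via KAFKA_HEAP_OPTS before starting the broker:

    # Keep the broker heap small so the rest of RAM is left to the
    # OS page cache. 4g here is purely illustrative.
    export KAFKA_HEAP_OPTS="-Xmx4g -Xms4g"
    bin/kafka-server-start.sh config/server.properties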

-- 
Cees de Groot
Principal Software Engineer
PagerDuty, Inc.