Posted to users@kafka.apache.org by Vineet Mishra <cl...@gmail.com> on 2015/02/08 09:06:07 UTC

High Latency in Kafka

Hi All,

I have log files of around 30GB, and I am trying to process these logs as
events by pushing them to Kafka. I can clearly see that the throughput
achieved while publishing these events to Kafka is quite slow.

For a single 30GB log file, Logstash has been continuously emitting to
Kafka for more than 2 days, yet it has still processed only 60% of the log
data. I am looking for a way to make publishing events to Kafka more
efficient, because at this rate of ingestion I don't think it will be a
workable option going forward.

Looking for performance improvements here.

Expert advice required!

Thanks!

Re: High Latency in Kafka

Posted by Andrey Yegorov <an...@gmail.com>.
I am not familiar with Logstash, but with a custom log replay tool (used to
replay messages logged locally when, e.g., Kafka was not available, and
useful in some other scenarios) I've seen throughput reach 30,000
messages/sec with an average message size of 4.5 kilobytes, all under
regular production load on Kafka (6 brokers). At that rate, sending 30GB of
logs should take about 4 minutes.

The tool has:
- one thread that reads messages and puts them on a queue;
- 5 (configurable) threads that read messages from the queue and send them
to Kafka, with one producer per thread.

I am using the new producer from Kafka 0.8.2-beta with async send. I
remember having to tune some producer parameters, such as increasing the
buffer sizes.
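
A minimal sketch of that design (not the actual tool; the broker address,
topic name, and tuning values are illustrative assumptions), written
against the 0.8.2.0 Java producer API:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class LogReplay {
    // sentinel telling sender threads the file is exhausted
    private static final String POISON = "\u0000EOF";

    public static void main(String[] args) throws Exception {
        final int senders = 5; // configurable
        final BlockingQueue<String> queue = new ArrayBlockingQueue<String>(10000);

        final Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // assumption
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("batch.size", "262144");      // larger batches
        props.put("buffer.memory", "67108864"); // bigger send buffer
        props.put("linger.ms", "50");           // wait briefly to fill batches

        Thread[] threads = new Thread[senders];
        for (int i = 0; i < senders; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    // one producer per sender thread, async send (no get())
                    KafkaProducer<String, String> producer =
                        new KafkaProducer<String, String>(props);
                    try {
                        while (true) {
                            String line = queue.take();
                            if (POISON.equals(line)) return;
                            producer.send(
                                new ProducerRecord<String, String>("logs", line));
                        }
                    } catch (InterruptedException e) {
                        // exit on interrupt
                    } finally {
                        producer.close(); // blocks until buffered records are sent
                    }
                }
            });
            threads[i].start();
        }

        // a single reader thread (here, main) fills the queue
        BufferedReader reader =
            Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8);
        String line;
        while ((line = reader.readLine()) != null) queue.put(line);
        reader.close();
        for (int i = 0; i < senders; i++) queue.put(POISON);
        for (Thread t : threads) t.join();
    }
}

The parts that matter are the bounded queue (backpressure on the reader)
and one producer per sender thread, so sends proceed in parallel.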

HTH.



----------
Andrey Yegorov


Re: High Latency in Kafka

Posted by Vineet Mishra <cl...@gmail.com>.
Hi Gwen,

Well, I went through this link while trying to set up my Logstash Kafka
handler:

https://github.com/joekiller/logstash-kafka

I could achieve what I was looking for, but the performance suffers badly
when writing a big file of several GBs. I guess there should be some way to
parallelise the existing process.
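
For reference, the setup looks something like this (the path, topic, and
broker address are placeholders; check the plugin README for the exact
output option names):

input {
  file {
    # placeholder path to the 30GB log
    path => "/data/logs/big.log"
    start_position => "beginning"
  }
}
output {
  kafka {
    # placeholder broker; option names per the logstash-kafka plugin
    broker_list => "broker1:9092"
    topic_id => "logs"
  }
}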

Thanks!


Re: High Latency in Kafka

Posted by Gwen Shapira <gs...@cloudera.com>.
I'm wondering how much of the time is spent by Logstash reading and
processing the log vs. time spent sending data to Kafka. Also, I'm not
familiar with Logstash internals; perhaps it can be tuned to send the data
to Kafka in larger batches?
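
For example, the batching knobs on the 0.8 (old, Scala) producer itself
look like this; values are illustrative, and whether the Logstash plugin
exposes them is a separate question:

# buffer and send in the background instead of per-message sync sends
producer.type=async
# messages per batch (default is 200)
batch.num.messages=1000
# how long to buffer before sending, in ms (default is 5000)
queue.buffering.max.ms=500
# depth of the async buffer (default is 10000)
queue.buffering.max.messages=20000
# compress batches on the wire
compression.codec=snappy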

At the moment it's difficult to tell where the slowdown is. More
information about the breakdown of time would help.

You can also try Flume's SpoolingDirectory source with a Kafka channel or
sink, and see whether you get better performance out of a different tool.
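
Something along these lines, for example (a sketch assuming a Flume build
that bundles the Kafka sink, e.g. Flume 1.6; the directory, topic, and
broker address are placeholders):

a1.sources = r1
a1.channels = c1
a1.sinks = k1

# watch a directory for completed log files
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/spool
a1.sources.r1.channels = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000

# ship events to Kafka in batches
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = logs
a1.sinks.k1.brokerList = broker1:9092
a1.sinks.k1.batchSize = 500
a1.sinks.k1.channel = c1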


Gwen
