Posted to users@kafka.apache.org by Hemanth Yamijala <yh...@gmail.com> on 2014/09/10 13:59:34 UTC

Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Hi folks,

In order to meet latency requirements for a system we are building, we
tested with different values of the above two parameters and found that
settings as low as 100 ms work best for us, balancing the required
throughput and latencies.

I just wanted to check whether 100 is a sane value. Although we are getting
good results in our tests, is there anything we need to be aware of when
setting values this low (apart from the throughput, which we see is OK for
us)?

Any experience reports will help.

Thanks
Hemanth

Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Hemanth Yamijala <yh...@gmail.com>.

Neha,

I got some time today to test the JMX values you mentioned. My test setup
includes a Kafka broker, a PHP-based load generator that generates messages
at a steady rate, and a Storm Kafka consumer. The PHP load generator
continuously pumps in a standard 3KB message with a sleep interval of 100
microseconds.
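
As a rough sanity check on the load this implies, the sleep interval puts a
ceiling on the send rate of one generator process. The sketch below is only
an estimate under assumptions that are not stated in the thread: the sleep
is the only per-message delay, "3KB" means 3 * 1024 bytes, and a single
generator process is running. The variable names are made up for
illustration.

```python
# Upper bound on the load from one generator process.
# Assumptions (not from the thread): the sleep is the only per-message delay,
# "3KB" means 3 * 1024 bytes, and there is exactly one generator process.
SLEEP_MICROSECONDS = 100
MESSAGE_BYTES = 3 * 1024

# One message per sleep interval gives the highest rate the loop can reach.
max_messages_per_second = 1_000_000 / SLEEP_MICROSECONDS
max_bytes_per_second = max_messages_per_second * MESSAGE_BYTES

print(f"at most {max_messages_per_second:.0f} messages/s")
print(f"at most {max_bytes_per_second / 1_000_000:.2f} MB/s")
```

The real rate will be lower, since building and sending each message also
takes time, so this is a ceiling rather than a measurement.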

I tested with two values for log.default.flush.interval.ms: 1000 ms and
100 ms.

At the same time, I had a JMX client
(http://crawler.archive.org/cmdline-jmxclient/) monitoring the suggested
attributes every second.

At a high level, I observe that the JMX values don't change for a
reasonably long time, perhaps 10 minutes, and then register a change. Does
this have something to do with the update interval of the JMX stats on
Kafka's side, and is that something I can change?

Apart from that, these are the numbers I received by running the tests for
about 20-30 minutes each:

With the flush interval set to 1000ms:
======================================
9/16/2014 16:37:24 +0530 org.archive.jmx.Client MaxProduceRequestMs: 149.468028
9/16/2014 16:47:25 +0530 org.archive.jmx.Client MaxProduceRequestMs: 150.073965
9/16/2014 16:57:36 +0530 org.archive.jmx.Client MaxProduceRequestMs: 139.673677

9/16/2014 16:45:13 +0530 org.archive.jmx.Client MaxFlushMs: 151.0 (stayed fairly constant)

With the flush interval set to 100ms:
=====================================
9/16/2014 17:14:27 +0530 org.archive.jmx.Client MaxProduceRequestMs: 62.854318
9/16/2014 17:24:31 +0530 org.archive.jmx.Client MaxProduceRequestMs: 72.683045
9/16/2014 17:34:30 +0530 org.archive.jmx.Client MaxProduceRequestMs: 75.267653

9/16/2014 17:16:35 +0530 org.archive.jmx.Client MaxFlushMs: 65.0
9/16/2014 17:29:57 +0530 org.archive.jmx.Client MaxFlushMs: 75.0

I can see that the overall values are lower with the lower flush interval,
but they seem to be going up steadily. Is this something to monitor more
closely over a longer period of time?

Can you please help me interpret these results?
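
One way to put a number on "going up steadily" is to parse the client's
output and compare the first and last sample of each attribute. The
following is a minimal sketch, not part of Kafka or of the JMX client: it
assumes each sample sits on a single line in the format shown above, and
the function names (parse_samples, drift_per_minute) are hypothetical. The
embedded sample lines are the 100ms MaxProduceRequestMs readings from this
message.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches lines such as:
# 9/16/2014 17:14:27 +0530 org.archive.jmx.Client MaxProduceRequestMs: 62.854318
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{2}:\d{2}:\d{2}) [+-]\d{4} "
    r"org\.archive\.jmx\.Client (\w+): ([\d.]+)"
)


def parse_samples(text):
    """Group (timestamp, value) samples by attribute name."""
    samples = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that are not samples
        stamp, name, value = match.groups()
        when = datetime.strptime(stamp, "%m/%d/%Y %H:%M:%S")
        samples[name].append((when, float(value)))
    return samples


def drift_per_minute(points):
    """Change between the first and last sample, per elapsed minute."""
    points = sorted(points)
    (t0, v0), (t1, v1) = points[0], points[-1]
    minutes = (t1 - t0).total_seconds() / 60.0
    return (v1 - v0) / minutes if minutes > 0 else 0.0


LOG = """\
9/16/2014 17:14:27 +0530 org.archive.jmx.Client MaxProduceRequestMs: 62.854318
9/16/2014 17:24:31 +0530 org.archive.jmx.Client MaxProduceRequestMs: 72.683045
9/16/2014 17:34:30 +0530 org.archive.jmx.Client MaxProduceRequestMs: 75.267653
"""

samples = parse_samples(LOG)
for name, points in samples.items():
    print(f"{name}: {drift_per_minute(points):+.3f} ms per minute")
```

With only three samples over 20 minutes this says little on its own; the
same script run over a log covering several hours would show whether the
rise continues or levels off.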

Thanks
hemanth



Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Hemanth Yamijala <yh...@gmail.com>.

Thanks Neha and Jun for the pointers. We will try and evaluate this as well.

Hemanth


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Neha Narkhede <ne...@gmail.com>.

Hemanth,

Specifically, you'd want to monitor
kafka:type=kafka.SocketServerStats:getMaxProduceRequestMs and
kafka:type=kafka.LogFlushStats:getMaxFlushMs. If the broker is under load
due to frequent flushes, it will almost certainly show up as spikes in the
flush latency and consequently the produce request latency. A side effect
of that is that your producer queue will back up and your producer will
eventually lose data.
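
A simple way to watch for such spikes is to compare each new sample against
a baseline taken from the samples just before it. The following is a
minimal sketch under assumptions: the window size and the factor of 2 are
arbitrary illustrative thresholds, find_spikes is a hypothetical helper
rather than anything Kafka provides, and the sample values are invented for
the example, not measurements.

```python
from statistics import median


def find_spikes(values, window=5, factor=2.0):
    """Return indexes whose value exceeds factor times the median of the
    preceding `window` samples."""
    spikes = []
    for i in range(window, len(values)):
        baseline = median(values[i - window:i])
        if baseline > 0 and values[i] > factor * baseline:
            spikes.append(i)
    return spikes


# Invented latency samples in ms, with one obvious outlier at index 5.
flush_ms = [60.0, 62.0, 61.0, 63.0, 60.0, 150.0, 61.0, 62.0]
print(find_spikes(flush_ms))
```

Using the median rather than the mean keeps a single spike from inflating
the baseline used for the samples that follow it.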

Thanks,
Neha


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Jun Rao <ju...@gmail.com>.

One of the differences between 0.7.x and 0.8.x is that the latter does the
I/O flushing in the background. So, in 0.7.x, more frequent I/O flushing
will increase the producer latency.

Thanks,

Jun


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Hemanth Yamijala <yh...@gmail.com>.

Neha,

Thanks. We are on 0.7.2. I have written on another thread on this list
about one of the reasons we are stuck: the absence of a PHP client for our
front end producer systems. (On a side note, I would appreciate any input
on that thread as well.)

When you say performance, do you mean throughput? We did measure
throughput with our default configuration of 1000 ms for the flush interval
value, and with the much lower 100 ms value I proposed on this thread. Our
numbers were identical: for a single broker we were clocking around
20,000 messages read per second on the consumer side. Using a small number
'n' of brokers we can easily exceed our target numbers. (The load was
synthetically generated, using a likely message size and a rate that seems
reasonable for our producing side.)

Given this observation, do you suggest any further tests or measurements
for us to be sure? I would appreciate any input.

Thanks
Hemanth


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Neha Narkhede <ne...@gmail.com>.

I should mention that the performance hit from doing so is much higher on
versions < 0.8.1. As long as you're on 0.8.1 or later, it should mostly be
fine. You might want to keep close tabs on your iostat numbers, to be sure.


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Hemanth Yamijala <yh...@gmail.com>.

Thanks Jun.


Re: Setting log.default.flush.interval.ms and log.default.flush.scheduler.interval.ms

Posted by Jun Rao <ju...@gmail.com>.

As long as the I/O load is reasonable, this is probably ok.

Thanks,

Jun
