Posted to users@kafka.apache.org by Monika Garg <ga...@gmail.com> on 2013/10/11 08:20:39 UTC

Fwd: Messages from producer are immediately going to /tmp/logs in kafka

Hi,

In kafka-0.8 there are three important properties for log flushing:

log.flush.interval.messages=10000

log.flush.interval.ms=900000

log.flush.scheduler.interval.ms=900000

I have set these properties to the values shown above. Then I started the
Kafka Console Producer that ships with the kafka-0.8 bundle and sent some
messages. The messages go to the log partitions of the given topic immediately.

I am confused about why the messages are flushed to /tmp/logs immediately. They
should wait as per log.flush.interval.messages=10000 or
log.flush.interval.ms=900000.

Please check.

-- 
*Moniii*




Re: Messages from producer are immediately going to /tmp/logs in kafka

Posted by Monika Garg <ga...@gmail.com>.
Thanks for clearing up my doubt, Jay.

I was confusing Kafka's log.flush policy with that of the OS, as explained
below.

I had read this description of the Kafka property:

 log.flush.interval.messages -

The number of messages written to a log partition before we force an fsync
on the log.

So I thought that only Kafka performs an fsync on the data in the partitions in
order to flush it to disk. I hadn't gone through the docs you mentioned (in
the operations section of the Kafka docs). Now I have gone through them, and
they cleared up my doubt about flushing data to disk. :-)

Thanks a lot. :-)




On Tue, Oct 15, 2013 at 3:58 AM, Joris VanRemoortere <
jvanremoortere@tagged.com> wrote:

> I also filed this bug + patch (
> https://issues.apache.org/jira/browse/KAFKA-1042) where the config
> variables you mention are not actually enforced within an active segment.
> It is related, but the reverse of what you are looking for.
> The flush intervals are not enforced properly; however, the filesystem will
> still flush at its own pace (linux fs), regardless of the intervals
> specified in the config.
> *The crux*: The config intervals are more useful (after a bug fix) for
> flushing more often, not less often.
>
> Joris



-- 
*Moniii*

Re: Messages from producer are immediately going to /tmp/logs in kafka

Posted by Joris VanRemoortere <jv...@tagged.com>.
I also filed this bug + patch (
https://issues.apache.org/jira/browse/KAFKA-1042) where the config
variables you mention are not actually enforced within an active segment.
It is related, but the reverse of what you are looking for.
The flush intervals are not enforced properly; however, the filesystem will
still flush at its own pace (linux fs), regardless of the intervals
specified in the config.
*The crux*: The config intervals are more useful (after a bug fix) for
flushing more often, not less often.

Joris
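
To make the "more often, not less often" point concrete, here is a minimal Python sketch of a count-or-time flush rule. It is not Kafka's implementation: the `FlushPolicy` class, its method names, and the small threshold of 3 messages are hypothetical and chosen only for illustration. The rule decides when the application forces an fsync; it cannot stop the OS from writing dirty pages back earlier on its own schedule.

```python
import time
from typing import Callable


class FlushPolicy:
    """Hypothetical model: an fsync is due after N messages or T milliseconds."""

    def __init__(self, interval_messages: int, interval_ms: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.interval_messages = interval_messages
        self.interval_ms = interval_ms
        self._clock = clock
        self._unflushed = 0
        self._last_flush = clock()

    def record_append(self) -> bool:
        """Count one appended message; return True if a forced fsync is due."""
        self._unflushed += 1
        elapsed_ms = (self._clock() - self._last_flush) * 1000.0
        return (self._unflushed >= self.interval_messages
                or elapsed_ms >= self.interval_ms)

    def mark_flushed(self) -> None:
        """Reset both counters after the caller has performed the fsync."""
        self._unflushed = 0
        self._last_flush = self._clock()


# Small message threshold so the count-based trigger is easy to see.
policy = FlushPolicy(interval_messages=3, interval_ms=900000)
decisions = []
for _ in range(7):
    due = policy.record_append()
    decisions.append(due)
    if due:
        policy.mark_flushed()
print(decisions)
```

With these values every third append triggers a forced flush. Raising the thresholds only makes the forced fsync rarer; it does not delay the OS's own writeback.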


On Mon, Oct 14, 2013 at 10:28 AM, Jay Kreps <ja...@gmail.com> wrote:

> I believe this is the first complaint we have got on a lack of data loss.
> :-)
>
> The behavior of kafka is to immediately write all messages to the
> filesystem. The operating system will sync the file to disk at its own pace
> (we give some docs on how linux does it in our operations section in the
> kafka docs and this is pretty well documented on the internet). As the docs
> say, the configuration you are describing just controls the frequency with
> which kafka forces an fsync and has nothing to do with writing to the fs
> (which is always immediate). Fsync makes the OS write the data in its cache
> to physical disk.
>
> This makes forcing message loss a little hard.  Killing the process won't
> work because the data is not stored in application memory; it is in the
> filesystem cache. Shutting down the machine will not cause this as the OS
> flushes the data to disk before shutting down. If you want to force data
> loss I think you need to yank the plug on the machine immediately after a
> write but prior to both an application level fsync and the OS's own flush
> policy.
>
> -Jay

Re: Messages from producer are immediately going to /tmp/logs in kafka

Posted by Jay Kreps <ja...@gmail.com>.
I believe this is the first complaint we have gotten about a lack of data loss.
:-)

The behavior of kafka is to immediately write all messages to the
filesystem. The operating system will sync the file to disk at its own pace
(we give some docs on how linux does it in the operations section of the
kafka docs, and this is pretty well documented on the internet). As the docs
say, the configuration you are describing just controls the frequency with
which kafka forces an fsync and has nothing to do with writing to the fs
(which is always immediate). Fsync makes the OS write the data in its cache
to physical disk.
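
The difference between writing and syncing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Kafka's code: the helper `append_message`, the temporary directory, and the segment-style file name are made up for the example. It shows that bytes handed to the OS with a plain write are visible in the file right away, before any fsync has been requested.

```python
import os
import tempfile


def append_message(fd: int, payload: bytes, force_sync: bool = False) -> None:
    """Append one message to an open log file descriptor (hypothetical helper)."""
    os.write(fd, payload)      # hands the bytes to the OS; they sit in its cache
    if force_sync:
        os.fsync(fd)           # asks the OS to push this file's cached data to the device


with tempfile.TemporaryDirectory() as log_dir:
    # Hypothetical file name, used only for this example.
    path = os.path.join(log_dir, "00000000000000000000.log")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        append_message(fd, b"hello\n")                    # written, not yet fsynced
        with open(path, "rb") as reader:
            visible = reader.read()                       # a reader already sees the bytes
        append_message(fd, b"world\n", force_sync=True)   # written and then fsynced
    finally:
        os.close(fd)

print(visible)
```

Seeing the message in the log directory therefore says nothing about whether it has reached the physical disk; it only shows that the write call completed.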

This makes forcing message loss a little hard. Killing the process won't
work because the data is not stored in application memory; it is in the
filesystem cache. Shutting down the machine will not cause it either, as the OS
flushes the data to disk before shutting down. If you want to force data
loss, I think you need to yank the plug on the machine immediately after a
write but before both an application-level fsync and the OS's own flush
policy kick in.
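
On Linux, the OS's own flush policy mentioned above is governed by a handful of kernel writeback settings exposed under /proc/sys/vm. The following Python sketch simply reads them. The thread does not name these settings, so their selection here is an assumption about which ones matter most, and the helper `read_writeback_settings` is hypothetical. A value of None means the file could not be read, for example on a non-Linux system.

```python
from pathlib import Path
from typing import Dict, Optional

VM_DIR = Path("/proc/sys/vm")

# Linux kernel writeback settings (times are in hundredths of a second).
KNOBS = (
    "dirty_expire_centisecs",     # age after which dirty data is eligible for writeback
    "dirty_writeback_centisecs",  # how often the kernel flusher threads wake up
    "dirty_background_ratio",     # % of memory dirty before background writeback starts
    "dirty_ratio",                # % of memory dirty before writers must write back themselves
)


def read_writeback_settings(vm_dir: Path = VM_DIR) -> Dict[str, Optional[int]]:
    """Return each setting's current value, or None if it cannot be read."""
    settings: Dict[str, Optional[int]] = {}
    for name in KNOBS:
        try:
            settings[name] = int((vm_dir / name).read_text().strip())
        except (OSError, ValueError):
            settings[name] = None
    return settings


settings = read_writeback_settings()
for name, value in settings.items():
    print(f"{name} = {value}")
```

The window in which a power cut can lose an unsynced write is bounded by whichever comes first: these OS timers or the application's forced fsync.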

-Jay


On Mon, Oct 14, 2013 at 10:00 AM, Monika Garg <ga...@gmail.com> wrote:

> Thanks for replying, Jun.
>
> I thought the same.
> But I still found the same messages in my /kafka/logs dir even after rebooting
> my machine in less than the time given by log.flush.interval.ms=900000.
> So can you please suggest a way to check that messages are actually
> lost after a machine shutdown?

Re: Messages from producer are immediately going to /tmp/logs in kafka

Posted by Monika Garg <ga...@gmail.com>.
Thanks for replying, Jun.

I thought the same.
But I still found the same messages in my /kafka/logs dir even after rebooting
my machine in less than the time given by log.flush.interval.ms=900000.
So can you please suggest a way to check that messages are actually
lost after a machine shutdown?


On Fri, Oct 11, 2013 at 8:56 PM, Jun Rao <ju...@gmail.com> wrote:

> Those messages could still be in file system pagecache and may not be
> flushed to disks.
>
> Thanks,
>
> Jun



-- 
*Moniii*

Re: Messages from producer are immediately going to /tmp/logs in kafka

Posted by Jun Rao <ju...@gmail.com>.
Those messages could still be in file system pagecache and may not be
flushed to disks.

Thanks,

Jun


On Thu, Oct 10, 2013 at 11:20 PM, Monika Garg <ga...@gmail.com> wrote:

> Hi,
>
> In kafka-0.8 there are three important properties for log flushing:
>
> log.flush.interval.messages=10000
>
> log.flush.interval.ms=900000
>
> log.flush.scheduler.interval.ms=900000
>
> I have set these properties to the values shown above. Then I started the
> Kafka Console Producer that ships with the kafka-0.8 bundle and sent some
> messages. The messages go to the log partitions of the given topic
> immediately.
>
> I am confused about why the messages are flushed to /tmp/logs immediately. They
> should wait as per log.flush.interval.messages=10000 or
> log.flush.interval.ms=900000.
>
> Please check.
>
> --
> *Moniii*
>