Posted to users@kafka.apache.org by Jason Huang <ja...@icare.com> on 2013/02/19 12:28:20 UTC

log file flush?

Hello,

I am confused about "log file flush". In my naive understanding, once
a message is produced and sent to the kafka server, it is written to
the log file on the hard drive. Since it is on the hard drive already,
what exactly do you mean by "log file flush"?

I ask because we found that if we manually kill the zookeeper and
kafka server processes, the messages stored in the log file are lost.
Is this expected behavior? Is there any setting to allow us to keep
all the existing messages once they are written to the log file?

thanks,

Jason

Re: log file flush?

Posted by Jason Huang <ja...@icare.com>.
Very detailed and clear explanation.

Thanks a lot!

Jason


Re: log file flush?

Posted by Jay Kreps <ja...@gmail.com>.
Yes, exactly. Here is the full story:

When you restart Kafka, it checks whether a clean shutdown was
executed on the log (a clean shutdown leaves a marker file). If the
shutdown was clean, it assumes the log was fully flushed and uses it
as is. If not (as in the case of a hard kill or machine crash), it
runs recovery on the log. The recovery process validates the CRC of
each message in the unflushed portion of the log and truncates the
log to eliminate any partial writes that may have occurred while the
server was being killed. This process guarantees that only valid
messages remain. There are actually a lot of corner cases after a
hard crash; depending on the OS/FS you can also get random corrupt
blocks, so the process handles that case as well. In the case you
outline, this would mean the log contains the 100 messages flushed to
disk (assuming the last message was fully written) but not, obviously,
the 50 messages that were only in RAM.
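The recovery pass described above can be sketched roughly as follows. This is a hypothetical illustration, not Kafka's actual code: it assumes a toy record layout of a 4-byte big-endian length plus a 4-byte CRC-32 header per message, and it scans the unflushed tail of a segment, truncating at the first partial or corrupt record.

```python
import struct
import zlib

def recover_log(path, flushed_offset=0):
    """Scan records after the last known-flushed offset; truncate the
    file at the first record whose CRC does not match its payload.
    Hypothetical record layout: 4-byte length, 4-byte CRC-32, payload."""
    with open(path, "r+b") as f:
        f.seek(flushed_offset)
        valid_end = flushed_offset
        while True:
            header = f.read(8)
            if len(header) < 8:            # partial header: a torn write
                break
            size, crc = struct.unpack(">II", header)
            payload = f.read(size)
            if len(payload) < size:        # partial payload: a torn write
                break
            if zlib.crc32(payload) & 0xFFFFFFFF != crc:
                break                      # corrupt block: stop here too
            valid_end = f.tell()           # record is intact, keep it
        f.truncate(valid_end)              # drop everything past the last valid record
    return valid_end
```

Truncating at the first bad record (rather than skipping it) matches the guarantee described above: everything remaining in the log is a fully written, valid message.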

That all describes the unreplicated case in 0.7.x. In 0.8 you have the
option of setting a replication factor for each topic, so you would
only lose the 50 messages in pagecache if you lost ALL the replicas.
If you had another in-sync surviving replica, then when the server
came back up it would resync with the new leader, which would have the
full log, and no committed messages would be lost.

-Jay



Re: log file flush?

Posted by Jason Huang <ja...@icare.com>.
This is starting to make sense to me.

So a log segment file (000000000.log) may have some messages that are
on the local hard drive and some that are still in the pagecache? Say
a 000000000.log file has 150 messages, the first 100 have been flushed
to the local hard drive, and the last 50 are still in the pagecache.
What would happen if there is a machine crash? When we restart the
server, will we see the 000000000.log file with only 100 messages in it?

Thanks,

Jason


Re: log file flush?

Posted by Jay Kreps <ja...@gmail.com>.
To be clear: to lose data in the filesystem you need to hard kill the
machine. A hard kill of the process will not cause that.

-Jay


Re: log file flush?

Posted by Jun Rao <ju...@gmail.com>.
Jason,

Although messages are always written to the log segment file, they are
initially only in the file system's pagecache. As Swapnil mentioned
earlier, messages are flushed to disk periodically. If you do a clean
shutdown (kill -15), we close all log files, which should flush all
dirty data to disk. If you do a hard kill, or your machine crashes,
the unflushed data may be lost. The data that you saw in the .log file
may still be only in the pagecache.
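The distinction drawn above (file contents are visible to readers as soon as they are written, yet may exist only in the pagecache until an explicit flush) can be illustrated with a small sketch. `append_message` is a hypothetical helper for illustration, not a Kafka API:

```python
import os

def append_message(path, payload, force_flush=False):
    """Append bytes to a file. A plain write() returns as soon as the
    data is in the OS pagecache; the file *looks* complete to any
    reader, but only fsync() forces the data onto the physical disk.
    A process kill after write() loses nothing (the pagecache survives
    in the kernel), but a machine crash before fsync() can lose data
    that the file already appeared to contain."""
    with open(path, "ab") as f:
        f.write(payload)          # lands in the pagecache, not necessarily on disk
        if force_flush:
            f.flush()             # drain Python's userspace buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to persist the dirty pages
```

This is why reading the message content back from the .log file does not prove it has been flushed: the read is served from the same pagecache the unflushed write went into.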

Thanks,

Jun


Re: log file flush?

Posted by Jason Huang <ja...@icare.com>.
Thanks for the response.

My confusion is that once I see the message content in the .log file,
doesn't that mean the message has already been flushed to the hard
drive? Why would those messages still get lost if someone manually
kills the process (or if the server crashes unexpectedly)?

Jason


Re: log file flush?

Posted by Swapnil Ghike <sg...@linkedin.com>.
Correction - The flush happens based on *number of messages* and time
limits, whichever is hit first.





Re: log file flush?

Posted by Swapnil Ghike <sg...@linkedin.com>.
The messages for a topic are kept in the kafka broker's memory before they
are flushed to the disk. The flush happens based on size and time limits,
whichever is hit first. If you kill the kafka server process before any
message has been flushed to the disk, those messages will be lost. The
config (kafka.server.KafkaConfig) parameters log.flush.interval,
log.default.flush.scheduler.interval.ms and log.default.flush.interval.ms
at http://kafka.apache.org/configuration.html should help clarify this.

Thanks,
Swapnil
