Posted to users@kafka.apache.org by Chris Burroughs <ch...@gmail.com> on 2012/06/19 03:21:32 UTC

Re: Solution for blocking fsync in 0.8

Thanks Jay.  This is a very helpful investigation!

On 05/24/2012 01:40 PM, Jay Kreps wrote:
> 
> Unfortunately, *any* call to fsync will block appends, even from a background
> thread. So how can we give control over physical disk persistence without
> introducing high latency for the producer? The answer is that the Linux
> pdflush daemon already does something very similar to our flush parameters.
> pdflush is a daemon running on every Linux machine that controls the
> writing of buffered/cached data back to disk. It lets you limit the amount
> of memory filled with dirty pages by specifying a percentage of memory,
> a timeout after which any dirty page is written, or a fixed number of
> dirty bytes.


This would, however, necessarily be a global setting, right?  (Assuming
there is no /proc trickery to change per-pid pdflush behaviour.)
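For reference, the three controls Jay lists correspond to the Linux `vm.dirty_*` sysctls under `/proc/sys/vm` (on kernels from 2.6.32 onward the pdflush daemon was replaced by per-device flusher threads, but the same knobs apply). The sketch below only reads them; the helper name `read_dirty_settings` and the `root` parameter are hypothetical, added so the function can be pointed at another directory. Because there is a single file per knob for the whole machine, it also illustrates why these are global rather than per-process settings.

```python
from pathlib import Path

# Writeback knobs exposed by the Linux kernel under /proc/sys/vm.
# Ratio and bytes variants are alternatives: setting one zeroes the other.
DIRTY_KNOBS = {
    "dirty_background_ratio": "percent of memory dirty before background writeback starts",
    "dirty_ratio": "percent of memory dirty before writing processes must flush themselves",
    "dirty_background_bytes": "dirty bytes before background writeback starts (0 = use ratio)",
    "dirty_bytes": "dirty bytes before writing processes must flush themselves (0 = use ratio)",
    "dirty_expire_centisecs": "age (1/100 s) after which a dirty page is eligible for writeback",
    "dirty_writeback_centisecs": "interval (1/100 s) between flusher thread wakeups",
}


def read_dirty_settings(root: str = "/proc/sys/vm") -> dict[str, int]:
    """Return the current value of each writeback knob found under `root`.

    Hypothetical helper. There is one value per machine, not per process.
    Knobs missing from `root` (e.g. on non-Linux systems) are skipped.
    """
    settings = {}
    base = Path(root)
    for name in DIRTY_KNOBS:
        path = base / name
        if path.is_file():
            settings[name] = int(path.read_text().strip())
    return settings


if __name__ == "__main__":
    for name, value in read_dirty_settings().items():
        print(f"vm.{name} = {value}  # {DIRTY_KNOBS[name]}")
```

Changing any of these (for example with `sysctl -w`, as root) affects every process on the box, which is exactly the trade-off raised above.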

Re: Solution for blocking fsync in 0.8

Posted by Chris Burroughs <ch...@gmail.com>.
+list

Makes sense.  My concern was less about per-topic settings and more about
other things on the same box (I probably want Kafka to sync more often than
my web server, but less often than a database).

On 2012-06-19 01:06, Jay Kreps wrote:
> Yes, that's right: it is a global setting, so you lose the ability to have
> per-topic overrides. I think the idea, though, is that with replication the
> real durability guarantee comes from replication itself, and syncing just
> ensures that data makes it to disk reasonably quickly.
> 
> -Jay