Posted to dev@kafka.apache.org by "Jay Kreps (JIRA)" <ji...@apache.org> on 2011/07/30 19:42:09 UTC

[jira] [Created] (KAFKA-77) Implement "group commit" for kafka logs

Implement "group commit" for kafka logs
---------------------------------------

                 Key: KAFKA-77
                 URL: https://issues.apache.org/jira/browse/KAFKA-77
             Project: Kafka
          Issue Type: Improvement
    Affects Versions: 0.7
            Reporter: Jay Kreps
            Assignee: Jay Kreps
             Fix For: 0.8


The most expensive operation for the server is usually going to be the fsync() call that syncs log data to disk; if you don't flush, your data is at greater risk of being lost in a crash. Currently we give two knobs to tune this trade-off: log.flush.interval and log.default.flush.interval.ms (no idea why one has "default" and the other doesn't, since they are both defaults). However, if you flush frequently, say on every write, performance is not that great.

One trick that can be used to improve this worst case of continual flushes is to allow a single fsync() to cover multiple writes that occur at the same time. This is a lot like "group commit" in databases. It is unclear which cases this would improve and by how much, but it might be worth a try.
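
As an illustration of the trade-off (hypothetical code, not the attached patch; the file name and setup are made up), compare one fsync per write with one fsync per group:

{code:java}
import java.io.FileOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical illustration of the trade-off described above.
public class GroupCommitIdea {
    public static void main(String[] args) throws Exception {
        FileChannel log = new FileOutputStream("demo.log").getChannel();

        // Worst case: fsync after every write -- one disk sync per message.
        for (int i = 0; i < 3; i++) {
            log.write(ByteBuffer.wrap(("msg-" + i + "\n").getBytes()));
            log.force(false);
        }

        // Group commit: several writes, then a single fsync covers them all.
        for (int i = 3; i < 6; i++) {
            log.write(ByteBuffer.wrap(("msg-" + i + "\n").getBytes()));
        }
        log.force(false);

        log.close();
    }
}
{code}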

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481140#comment-13481140 ] 

Jun Rao commented on KAFKA-77:
------------------------------

I think what Jay meant is that in 0.8, a message is considered committed as long as it is written to memory on f brokers (f being the replication factor). This is probably as good as or better than forcing data to disk, assuming failures are rare. Therefore, flushing to disk does not need to be optimized for durability guarantees.

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Dave Revell (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481086#comment-13481086 ] 

Dave Revell commented on KAFKA-77:
----------------------------------

> This is not really a good idea post 0.8 as we no longer have much dependence on the disk flush.

Jay, would you mind explaining a bit more? Is there a new feature in Kafka >0.8 that improves durability without the need for disk flushes? Or is there perhaps a new feature that decreases the performance penalty of flushing after every message?

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073585#comment-13073585 ] 

Jun Rao commented on KAFKA-77:
------------------------------

Chris, I am not sure we gain much by limiting the number of fsyncs. In a typical Kafka scenario, most reads are served from pagecache, so the real I/O load on the underlying storage system is the fsyncs. If at a given point in time there is a pending write and no ongoing fsync, we are not fully utilizing the available resources of the storage system. I think a more effective way is to keep flushing in a separate thread. If multiple additional writes accumulate during one flush, the next flush will fsync more data to the storage media in a single call, essentially getting the benefit of group commit. If only one more write has accumulated, syncing it immediately doesn't hurt, since otherwise the storage system would likely be idle.
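
A rough sketch of that flushing loop (hypothetical names, not from any patch): a dedicated thread keeps calling force(), and every write that lands while one fsync is in progress is picked up by the next one, so the batching falls out naturally:

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: flush continuously in a separate thread. Writes that
// accumulate during one force() are all covered by the next force(), which
// is where the implicit group commit comes from.
public class ContinuousFlusher implements Runnable {
    private final FileChannel channel;
    private final AtomicLong writtenUpTo = new AtomicLong(); // log end offset appended so far
    private volatile long flushedUpTo = 0;                   // offset known to be on disk

    public ContinuousFlusher(FileChannel channel) {
        this.channel = channel;
    }

    // Writers call this after appending; keep the high-water mark monotonic.
    public void recordWrite(long newEndOffset) {
        writtenUpTo.updateAndGet(prev -> Math.max(prev, newEndOffset));
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                long target = writtenUpTo.get();
                if (target > flushedUpTo) {
                    channel.force(false); // one fsync for everything accumulated
                    flushedUpTo = target;
                } else {
                    LockSupport.parkNanos(100000); // idle: nothing new to sync
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Flush failed", e);
        }
    }
}
{code}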

[jira] [Updated] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps updated KAFKA-77:
---------------------------

    Attachment: kafka-group-commit.patch

A patch that implements group commit for Kafka. This implementation is a little complex, and the append() method is now a little scary; maybe someone sees a way to do it more simply.

A couple of notes:

1. I don't use any separate threads; the actual write is done by one of the writing threads involved in the commit (essentially it is a race, and whoever gets there first does it).

2. I only try to batch the flush; I don't try to batch the write() call. Batching writes could be done as well, but it would require working around the MessageSet.writeTo interface, since you would want to write multiple message sets at once in a single call, which breaks the current abstraction. Also, the write call gives time for more writes to accumulate in the group, so batching it might not help.

3. I limit the group size to a fixed upper bound (50), which I just hard-code. In practice I could not produce groups of more than 3, but I want to guarantee that you can't block the commit forever by queuing up writes under high load.

This whole idea is really only worth it if there are non-pathological cases where performance gets significantly better and doesn't get worse anywhere else.

I haven't really done any performance testing yet, as my laptop seems to get CPU bound by the producer perf test process, which means I am having trouble producing an I/O-bound load from one machine. I think I need to run an experiment with more than one producer machine, separate from the Kafka machine, and perhaps with more than one topic to force seeks when we flush (sequential flushes should be much cheaper, but that would only happen if you had one topic). I will update this bug when I have some real benchmarking numbers.
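
For readers following along, the scheme described in the notes above looks roughly like the sketch below (a hypothetical simplification, not the actual patch; the group-size cap from note 3 is omitted for brevity):

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical simplification: writers race for the flush lock, and
// whoever wins performs one fsync on behalf of every append made so far.
public class RacingGroupCommit {
    private final FileChannel channel;
    private final ReentrantLock flushLock = new ReentrantLock();
    private long lastSynced = 0; // position covered by the most recent fsync

    public RacingGroupCommit(FileChannel channel) {
        this.channel = channel;
    }

    public void appendAndCommit(ByteBuffer messages) throws IOException {
        long myPosition;
        synchronized (this) { // the write itself only touches pagecache
            channel.write(messages);
            myPosition = channel.position();
        }
        flushLock.lock(); // the race: the first thread in flushes for the group
        try {
            long coveredUpTo;
            synchronized (this) {
                if (myPosition <= lastSynced)
                    return; // the winner's fsync already covered this write
                coveredUpTo = channel.position(); // everything appended so far
            }
            channel.force(false); // one fsync commits the whole group
            synchronized (this) {
                lastSynced = Math.max(lastSynced, coveredUpTo);
            }
        } finally {
            flushLock.unlock();
        }
    }
}
{code}

Threads that lose the race block on the flush lock, and by the time they acquire it the winner's fsync has usually covered their writes, so they return without issuing another one.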

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jun Rao (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073429#comment-13073429 ] 

Jun Rao commented on KAFKA-77:
------------------------------

Thanks for the patch. The logic in the patch looks correct. I don't know how much benefit we can gain from this; will wait for the performance numbers.

My main concern with this patch is that it doesn't remove either of the two existing log flush configs and potentially adds a third one, MaxGroupCommitSize. This seems to complicate the configs further.

My preference would be to implement a separate flush thread that constantly obtains dirty file segments from a blocking queue and flushes each of them as fast as possible. We can replace the two existing flush configs with a new one that controls the queue size (i.e., # of outstanding flushes).
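
As a rough sketch of what that could look like (hypothetical code, not an actual Kafka API; SegmentFlusher and its method names are made up):

{code:java}
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a single flusher thread drains dirty segments from a
// bounded queue. The queue capacity (# of outstanding flushes) would be the
// one remaining flush config.
public class SegmentFlusher implements Runnable {
    private final BlockingQueue<FileChannel> dirtySegments;

    public SegmentFlusher(int maxOutstandingFlushes) {
        this.dirtySegments = new ArrayBlockingQueue<FileChannel>(maxOutstandingFlushes);
    }

    // Writers call this after appending. It blocks only when the queue is
    // full, which applies natural backpressure to producers.
    public void markDirty(FileChannel segment) throws InterruptedException {
        dirtySegments.put(segment);
    }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                FileChannel segment = dirtySegments.take(); // wait for a dirty segment
                segment.force(false); // everything accumulated goes in one fsync
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // shut down
        } catch (IOException e) {
            throw new RuntimeException("Flush failed", e);
        }
    }
}
{code}

A real implementation would also deduplicate queue entries, so a segment marked dirty several times while one flush is in progress gets all of its accumulated writes synced by a single later force() call.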

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Neha Narkhede (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13134637#comment-13134637 ] 

Neha Narkhede commented on KAFKA-77:
------------------------------------

I tried running the new producer perf tests on this patch, measuring the producer throughput with and without it. Here are the findings:

group.commit  MB/sec  messages/sec  broker.num.partitions  broker.flush.interval  broker.num.threads  num.producer.threads
yes           1.8903  9910.4099     1                      1                      8                   8
no            1.1037  5786.3673     1                      1                      8                   8
yes           6.0624  31784.5769    1                      1000                   8                   8
no            4.9943  26184.3166    1                      1000                   8                   8

Varying the number of threads on the server (this affects the number of writes that can be batched):

group.commit  MB/sec  messages/sec  broker.num.partitions  broker.flush.interval  broker.num.threads  num.producer.threads
yes           2.0313  10649.6273    1                      1                      32                  8
no            1.1499  6028.5997     1                      1                      32                  8
yes           6.2151  32584.9653    1                      1000                   32                  8
no            4.8507  25431.4445    1                      1000                   32                  8


To summarize, it shows at least a 16% improvement (with flush interval 1000 and 8 server threads) and a 38% improvement (with flush interval 1 and 8 server threads).

[jira] [Resolved] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Kreps resolved KAFKA-77.
----------------------------

    Resolution: Won't Fix

This is not really a good idea post 0.8 as we no longer have much dependence on the disk flush.

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Chris Burroughs (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073519#comment-13073519 ] 

Chris Burroughs commented on KAFKA-77:
--------------------------------------

bq. My preference would be to implement a separate flush thread that constantly obtains dirty file segments from a blocking queue and flushes each of them as fast as possible.

flush == FileChannel.force == fsync, right? Isn't the point to limit fsyncs to a reasonable rate (not too many per second), rather than to issue them as fast as possible?

[jira] [Commented] (KAFKA-77) Implement "group commit" for kafka logs

Posted by "Jay Kreps (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/KAFKA-77?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13073592#comment-13073592 ] 

Jay Kreps commented on KAFKA-77:
--------------------------------

Yeah, fwiw, I consider this a proof of concept to understand the perf impact.

Chris, yes, flush == FileChannel.force == fsync.

To clarify, the performance case I am going after is not the one where you have tuned fsync back to a reasonable level; I think we already handle that case optimally. The case I was targeting is where you need to sync on every write for durability (or to reduce consumer latency). In this case I suspect we could batch 30-50% of the flushes, which could help a lot. Since this is a performance optimization, I agree it is only worth it if the performance is quite good in the target case and not worse elsewhere. If not, then we will at least have explored the possibility.

One thing that would greatly help this kind of work would be a sort of canonical performance suite to run against, but making that is more work than this patch...

I agree that it could be cleaner to have a separate I/O thread pool that handled all writes for each Log in a single-threaded manner off its own write queue. If the performance pans out, I will consider this.