Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2012/08/14 18:36:37 UTC

[jira] [Resolved] (KAFKA-382) Write ordering guarantee violated

     [ https://issues.apache.org/jira/browse/KAFKA-382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jun Rao resolved KAFKA-382.
---------------------------

    Resolution: Not A Problem
    
> Write ordering guarantee violated
> ---------------------------------
>
>                 Key: KAFKA-382
>                 URL: https://issues.apache.org/jira/browse/KAFKA-382
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>             Fix For: 0.8
>
>
> The guarantee is that if the producer does
>   send(X)
>   send(Y)
> the client sees X first and Y second, but this may not actually happen in 0.8. The cause is the combination of parallel I/O threads and the single queue in the network server. The current model is one shared work queue, plus one response queue per selector. The single work queue is great from a parallelism point of view (if one thread is blocked, another can do the work), but it breaks the ordering guarantee: two I/O threads can pick up X and Y at the same time, and Y may finish first. Not sure how I missed this in the initial work. :-(
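
The reordering described above can be reproduced with a small deterministic simulation. This is only an illustrative sketch, not Kafka code: the function name `completion_order` and the per-request processing times are assumptions made up for the example.

```python
import heapq

def completion_order(requests, num_threads):
    """Simulate I/O threads draining one shared FIFO work queue.

    requests: list of (name, processing_time) in arrival order.
    Returns the request names in the order they finish.
    (Hypothetical helper for illustration; not part of Kafka.)
    """
    # Each entry is (time_at_which_thread_is_free, thread_id).
    free_at = [(0, t) for t in range(num_threads)]
    heapq.heapify(free_at)
    finished = []
    for seq, (name, cost) in enumerate(requests):
        # The next idle thread takes the head of the shared queue.
        start, thread = heapq.heappop(free_at)
        done = start + cost
        finished.append((done, seq, name))
        heapq.heappush(free_at, (done, thread))
    finished.sort()  # order by completion time, then arrival order
    return [name for _, _, name in finished]

# X is slow (e.g. it hits a blocking flush), Y is fast.
two_threads = completion_order([("X", 5), ("Y", 1)], num_threads=2)
one_thread = completion_order([("X", 5), ("Y", 1)], num_threads=1)
```

With two threads, Y completes before X even though X was sent first; with one thread, arrival order is preserved.
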
> The reason for the single work queue was to avoid blocking a whole selector when one thread does a flush. But I wonder how relevant that is now. If the durability guarantee comes from replication, I think there is not much reason to have a blocking flush; we can rely on pdflush to do it in the background, so doing the write synchronously may be fine.
> I think the solution is to modify RequestChannel to have one work queue per I/O thread and hash into the work queue by connection id. In this solution a blocked I/O thread only blocks clients that hash onto it. This retains the current async model but gives up the property that a blocked thread blocks no one. (At first I thought we didn't need a RequestChannel at all anymore and could just synchronously return zero or more requests from KafkaApis, but in reality, because of the possibility of request timeout from a background thread, this won't work.)
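
A minimal sketch of the per-thread work queue idea follows. It is not the actual RequestChannel; the class name `HashedRequestChannel`, its method names, and the use of an integer connection id with a modulo hash are all assumptions chosen for illustration.

```python
import queue

class HashedRequestChannel:
    """Hypothetical stand-in for a request channel with one work queue
    per I/O thread. Names and signatures are illustrative only."""

    def __init__(self, num_io_threads):
        self.work_queues = [queue.Queue() for _ in range(num_io_threads)]

    def queue_index(self, connection_id):
        # Assumes connection ids are non-negative integers.
        return connection_id % len(self.work_queues)

    def send_request(self, connection_id, request):
        # All requests from one connection land on the same FIFO queue,
        # so a single I/O thread handles them in arrival order.
        idx = self.queue_index(connection_id)
        self.work_queues[idx].put((connection_id, request))

    def receive_request(self, io_thread_id):
        # Each I/O thread reads only from its own queue (blocks if empty).
        return self.work_queues[io_thread_id].get()

channel = HashedRequestChannel(num_io_threads=3)
channel.send_request(7, "X")
channel.send_request(7, "Y")
channel.send_request(8, "Z")
first = channel.receive_request(channel.queue_index(7))
second = channel.receive_request(channel.queue_index(7))
```

Because connection 7 always maps to the same queue, X is dequeued before Y. The trade-off noted above still applies: if the thread serving that queue blocks, every connection hashed onto it waits.
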
> It would also be possible to be smarter still and attempt a non-blocking solution that only preserves the write-ordering guarantees. One solution would be as follows. Each request from a given connection would be assigned an increasing number, starting with 0, by the network layer. KafkaApis would keep a "last processed" number for each connection. Any request whose number is greater than the last processed number for that connection + 1 would be re-enqueued. I don't like this solution because it is more complex and because I don't think blocking flushes are needed now that we have replication (e.g. you can just turn on replication and rely on pdflush, which is async), so optimizing this case is not useful imo.
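
The sequence-number alternative can be sketched in a few lines. This is an illustrative single-threaded model, not Kafka code: the function name `process_in_order` and the `(connection_id, seq, payload)` tuple layout are assumptions, and the sketch assumes every sequence number eventually arrives exactly once (a gap would make it loop forever).

```python
from collections import deque

def process_in_order(pending):
    """pending: deque of (connection_id, seq, payload) in the order the
    requests happen to be dequeued. Returns (connection_id, payload)
    pairs in the order they are actually handled.
    (Hypothetical helper for illustration; not part of Kafka.)
    """
    last_processed = {}  # connection_id -> highest seq handled so far
    handled = []
    while pending:
        conn, seq, payload = pending.popleft()
        # Nothing processed yet counts as -1, so seq 0 is expected first.
        expected = last_processed.get(conn, -1) + 1
        if seq > expected:
            # Arrived too early: put it back and try again later.
            pending.append((conn, seq, payload))
            continue
        last_processed[conn] = seq
        handled.append((conn, payload))
    return handled

# Y (seq 1) is dequeued before X (seq 0) on connection 1.
result = process_in_order(deque([(1, 1, "Y"), (1, 0, "X"), (2, 0, "A")]))
```

Y is re-enqueued once and handled only after X, while connection 2 is unaffected. The extra per-connection state and the re-enqueue loop are the added complexity the description objects to.
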
