Posted to dev@kafka.apache.org by "Jun Rao (JIRA)" <ji...@apache.org> on 2012/06/29 18:34:43 UTC

[jira] [Commented] (KAFKA-382) Write ordering guarantee violated

    [ https://issues.apache.org/jira/browse/KAFKA-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404010#comment-13404010 ] 

Jun Rao commented on KAFKA-382:
-------------------------------

Currently, each produce request blocks on a response, which means that a client can't send the next request until it has received a response to the current one. This guarantees that all writes are processed in send order at the leader, so we are fine now. If we want to make the producer non-blocking, then the shared work queue approach can violate the ordering. But if the producer is non-blocking, maybe you can argue that it doesn't care about ordering.
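
For reference, here is a minimal Java sketch of that blocking pattern, with hypothetical names (BlockingChannel, sendRequest, awaitResponse are illustrative stand-ins, not the actual 0.8 producer API): because each send waits for its response, the leader necessarily sees the writes in send order.

    // Illustrative only: a client that refuses to pipeline requests.
    import java.util.List;

    interface BlockingChannel {
        void sendRequest(byte[] payload);   // write one produce request to the socket
        byte[] awaitResponse();             // block until the broker's response arrives
    }

    class OrderedProducer {
        private final BlockingChannel channel;

        OrderedProducer(BlockingChannel channel) {
            this.channel = channel;
        }

        // send(X) then send(Y): the second request cannot leave the client
        // before the first has been acknowledged, so the leader processes
        // the writes in send order.
        void send(List<byte[]> messagesInOrder) {
            for (byte[] message : messagesInOrder) {
                channel.sendRequest(message);
                channel.awaitResponse();    // blocking here is what preserves ordering
            }
        }
    }

A non-blocking producer would drop the awaitResponse() call, and that is exactly the case where the shared work queue on the broker can reorder requests from the same connection.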
                
> Write ordering guarantee violated
> ---------------------------------
>
>                 Key: KAFKA-382
>                 URL: https://issues.apache.org/jira/browse/KAFKA-382
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>             Fix For: 0.8
>
>
> The guarantee is that if the producer does
>   send(X)
>   send(Y)
> the client sees X first and Y second, but this may not actually happen in 0.8. The reason is the parallel I/O threads and the single queue in the network server. The current model is one work queue and one response queue per selector. The single queue is great from a parallelism point of view--if one thread is blocked another can do the work--but this actually breaks the ordering guarantee (a small demonstration follows below). Not sure how I missed this in the initial work. :-(
> The reason for the single work queue was to avoid blocking a whole selector when one thread does a flush. But I wonder how relevant that is now. If the durability guarantee comes from replication, I think there is not much reason to have a blocking flush; we can rely on pdflush to do it in the background, so doing the write synchronously may be fine.
> I think the solution is to modify RequestChannel to have one work queue per I/O thread and hash into the work queue by connection id (sketched below). In this solution a blocked I/O thread only blocks the clients that hash onto it, so we retain the current async model but lose the property that a blocked thread blocks no one else. (At first I thought we didn't need a RequestChannel at all anymore and could just synchronously return zero or more requests from KafkaApis, but in reality, because of the possibility of request timeouts from a background thread, this won't work.)
> It would also be possible to be smarter still and attempt a non-blocking solution that only preserves the write-ordering guarantee (also sketched below). One solution would be as follows. Each request from a given connection would be assigned an increasing number, starting with 0, by the network layer. KafkaApis would keep a "last processed" number for each connection. Any request whose number is greater than the last processed number for that connection + 1 would be re-enqueued. I don't like this solution because it is more complex and because I don't think blocking flushes are needed now that we have replication (e.g. you can just turn on replication and rely on pdflush, which is async), so optimizing this case is not useful imo.
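
To make the ordering problem concrete, here is a small, self-contained Java demonstration (not broker code; the queue contents and thread behavior are purely illustrative) of how two handler threads pulling from one shared work queue can complete requests from the same connection out of send order:

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class SharedQueueRace {
        // One shared work queue, as in the current network server model.
        static final BlockingQueue<String> workQueue = new LinkedBlockingQueue<>();

        public static void main(String[] args) throws Exception {
            workQueue.put("conn-1: send(X)");
            workQueue.put("conn-1: send(Y)");
            for (int i = 0; i < 2; i++) {
                final int handler = i;
                new Thread(() -> {
                    try {
                        String request = workQueue.take();
                        // If handler 0 happens to take send(X) and then stalls
                        // (e.g. blocked on a flush), handler 1 appends send(Y) first.
                        if (handler == 0) Thread.sleep(100);
                        System.out.println("handler-" + handler + " processed " + request);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }).start();
            }
        }
    }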
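
Next, a minimal sketch of the per-I/O-thread work queue idea, assuming a hypothetical Request type that carries a connection id (this is not the actual RequestChannel code, just the hashing scheme in outline):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    // Illustrative request type carrying the connection it came from.
    class Request {
        final String connectionId;
        final byte[] payload;
        Request(String connectionId, byte[] payload) {
            this.connectionId = connectionId;
            this.payload = payload;
        }
    }

    class PartitionedRequestChannel {
        private final List<BlockingQueue<Request>> workQueues = new ArrayList<>();

        PartitionedRequestChannel(int numIoThreads, int queueCapacity) {
            for (int i = 0; i < numIoThreads; i++) {
                workQueues.add(new ArrayBlockingQueue<>(queueCapacity));
            }
        }

        // Every request from a given connection hashes to the same queue, so a
        // single I/O thread sees that connection's requests in arrival order.
        void sendRequest(Request request) throws InterruptedException {
            int index = Math.floorMod(request.connectionId.hashCode(), workQueues.size());
            workQueues.get(index).put(request);
        }

        // Each I/O thread drains only its own queue.
        Request receiveRequest(int ioThreadIndex) throws InterruptedException {
            return workQueues.get(ioThreadIndex).take();
        }
    }

Because all requests from a connection land on one thread, per-connection processing order matches arrival order; the trade-off, as noted above, is that a blocked I/O thread now stalls the connections hashed to it.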
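
Finally, a rough sketch of the sequence-number alternative described last (OrderingGuard and tryAccept are made-up names for illustration): the network layer numbers requests per connection, and anything that is not the next expected number gets re-enqueued by the caller.

    import java.util.HashMap;
    import java.util.Map;

    public class OrderingGuard {
        // Last processed sequence number per connection; -1 means nothing processed yet.
        private final Map<String, Integer> lastProcessed = new HashMap<>();

        // Returns true if this request is the next one in line and may be handled now;
        // otherwise the caller should re-enqueue it and try again later.
        public synchronized boolean tryAccept(String connectionId, int sequence) {
            int last = lastProcessed.getOrDefault(connectionId, -1);
            if (sequence != last + 1) {
                return false; // out of order: re-enqueue
            }
            lastProcessed.put(connectionId, sequence);
            return true;
        }
    }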

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira