Posted to dev@kafka.apache.org by "Ewen Cheslack-Postava (JIRA)" <ji...@apache.org> on 2017/01/24 22:52:26 UTC

[jira] [Commented] (KAFKA-4666) Failure test for Kafka configured for consistency vs availability

    [ https://issues.apache.org/jira/browse/KAFKA-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15836790#comment-15836790 ] 

Ewen Cheslack-Postava commented on KAFKA-4666:
----------------------------------------------

[~ecesena] A ducktape test is a nice way to validate this :) By "losing" data, do you mean that the acked data never becomes visible to consumers if the first broker never comes back? If so, this is expected. Even if you specify a smaller number of acks, data does not become visible to consumers until it has been acknowledged by the full ISR (and there are enough in-sync replicas to satisfy min.insync.replicas).

I don't think there's anything unexpected in your test, but I agree that section could make it clearer that acks=all is important if you want the producer to be acknowledged only once the data has been replicated sufficiently to protect against loss.
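For illustration, a minimal producer sketch (kafka-python; the broker address and topic name are placeholders, and this is not the attached test) showing the acks=all setting that makes the producer wait for the full ISR rather than just the leader:

    from kafka import KafkaProducer

    # acks=1 returns as soon as the leader has written the record, so an
    # "acknowledged" message can still be lost if the leader dies before the
    # followers copy it. acks='all' waits for the in-sync replicas (subject
    # to min.insync.replicas) before acknowledging.
    producer = KafkaProducer(
        bootstrap_servers="broker-1:9092",  # placeholder address
        acks="all",                         # wait for the full ISR, not just the leader
        retries=5,
    )

    future = producer.send("durable-topic", b"payload")
    future.get(timeout=10)  # raises if the brokers could not satisfy acks=all
    producer.flush()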

(Of course, if you have unclean leader election enabled, there are other scenarios in which you can lose data.)
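On the broker/topic side, a rough sketch of the corresponding settings (again kafka-python, with placeholder names; the same settings can also be set in server.properties as cluster defaults) might look like:

    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(bootstrap_servers="broker-1:9092")  # placeholder address
    admin.create_topics([
        NewTopic(
            name="durable-topic",
            num_partitions=3,
            replication_factor=3,
            topic_configs={
                # acks=all only protects you if at least 2 replicas must stay in sync
                "min.insync.replicas": "2",
                # never promote an out-of-sync follower to leader
                "unclean.leader.election.enable": "false",
            },
        )
    ])

With replication factor 3, min.insync.replicas=2, unclean leader election disabled, and producers using acks=all, an acknowledged write survives the loss of any single broker.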

> Failure test for Kafka configured for consistency vs availability
> -----------------------------------------------------------------
>
>                 Key: KAFKA-4666
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4666
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Emanuele Cesena
>         Attachments: consistency_test.py
>
>
> We recently had an issue with our Kafka setup because of a misconfiguration.
> In short, we thought we had configured Kafka for durability, but we hadn't set the producers to acks=all. During a full outage, we hit situations where some partitions were "partitioned", meaning that followers came back up without properly waiting for the right leader, and as a result we lost data. Again, this is not an issue with Kafka, but a misconfiguration on our side.
> I think we reproduced the issue, and we built a Docker-based test showing that, if the producer isn't configured with acks=all, data can be lost during an almost-full outage. The test is attached.
> I was thinking of sending a PR, but wanted to run this by you first, as it doesn't necessarily prove that a feature works as expected.
> In addition, I think the documentation could be slightly improved, for instance in the section:
> http://kafka.apache.org/documentation/#design_ha
> by clearly stating that there are three steps one should take to configure Kafka for consistency, the third being that producers should be set to acks=all (which is currently folded into the second point).
> Please let me know what you think, and I can send a PR if you agree.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)