Posted to commits@kafka.apache.org by ju...@apache.org on 2014/01/26 18:35:34 UTC

svn commit: r1561522 - /kafka/site/08/design.html

Author: junrao
Date: Sun Jan 26 17:35:33 2014
New Revision: 1561522

URL: http://svn.apache.org/r1561522
Log:
minor change to unclean leader election

Modified:
    kafka/site/08/design.html

Modified: kafka/site/08/design.html
URL: http://svn.apache.org/viewvc/kafka/site/08/design.html?rev=1561522&r1=1561521&r2=1561522&view=diff
==============================================================================
--- kafka/site/08/design.html (original)
+++ kafka/site/08/design.html Sun Jan 26 17:35:33 2014
@@ -213,15 +213,15 @@ Another important design distinction is 
 
 <h4>Unclean leader election: What if they all die?</h4>
 
-Note that Kafka's guarantee with respect to data loss is predicated on at least on replica remaining in sync. If all the nodes replicating a partition die, this guarantee no longer holds.
+Note that Kafka's guarantee with respect to data loss is predicated on at least one replica remaining in sync. If the current leader dies and no remaining live replicas are in the ISR, this guarantee no longer holds. If you have more than one replica assigned to a partition, this should be relatively rare since at least two brokers have to fail for this to happen.
 <p>
-However a practical system needs to do something reasonable when all the replicas die. If you are unlucky enough to have this occur, it is important to consider what will happen. There are two behaviors that could be implemented:
+However, a practical system needs to do something reasonable when all in-sync replicas die. If you are unlucky enough to have this occur, it is important to consider what will happen. There are two behaviors that could be implemented:
 <ol>
 	<li>Wait for a replica in the ISR to come back to life and choose this replica as the leader (hopefully it still has all its data).
 	<li>Choose the first replica (not necessarily in the ISR) that comes back to life as the leader.
 </ol>
 <p>
-This is a simple tradeoff between availability and consistency. If we wait for replicas in the ISR, then we will remain unavailable as long as those replicas are down. If such replicas were destroyed or their data was lost, then we are permanently down. If, on the other hand, a non-in-sync replica comes back to life and we allow it to become leader, then its log becomes the source of truth even though it is not guaranteed to have every committed message. In our current release we choose the second strategy and favor choosing a potentially inconsistent replica when all replicas in the ISR are dead. In the future, we would like to make this configurable to better support use cases where downtime is preferable to inconsistency.
+This is a simple tradeoff between availability and consistency. If we wait for replicas in the ISR, then we will remain unavailable as long as those replicas are down. If such replicas were destroyed or their data was lost, then we are permanently down. If, on the other hand, a non-in-sync replica comes back to life and we allow it to become the leader, then its log becomes the source of truth even though it is not guaranteed to have every committed message. In our current release we choose the second strategy and favor choosing a potentially inconsistent replica when all replicas in the ISR are dead. In the future, we would like to make this configurable to better support use cases where downtime is preferable to inconsistency.
 <p>
 This dilemma is not specific to Kafka. It exists in any quorum-based scheme. For example in a majority voting scheme, if a majority of servers suffer a permanent failure, then you must either choose to lose 100% of your data or violate consistency by taking what remains on an existing server as your new source of truth.
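
[Editor's note for readers on later releases: the configurability the text anticipates did land in subsequent Kafka versions as the `unclean.leader.election.enable` broker property. A minimal sketch of the setting, assuming a release new enough to support it:

```
# server.properties (hypothetical excerpt; property available in releases after 0.8.0)
# false = wait for a replica in the ISR to return (favor consistency; partition
#         stays unavailable until an in-sync replica comes back)
# true  = allow the first replica that returns, ISR or not, to become leader
#         (favor availability; may lose committed messages)
unclean.leader.election.enable=false
```

Which value is appropriate depends on whether downtime or potential message loss is the worse outcome for the workload.]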