Posted to commits@kafka.apache.org by ju...@apache.org on 2014/07/02 17:59:50 UTC

svn commit: r1607405 - /kafka/site/081/design.html

Author: junrao
Date: Wed Jul  2 15:59:50 2014
New Revision: 1607405

URL: http://svn.apache.org/r1607405
Log:
fix broken link and typo

Modified:
    kafka/site/081/design.html

Modified: kafka/site/081/design.html
URL: http://svn.apache.org/viewvc/kafka/site/081/design.html?rev=1607405&r1=1607404&r2=1607405&view=diff
==============================================================================
--- kafka/site/081/design.html (original)
+++ kafka/site/081/design.html Wed Jul  2 15:59:50 2014
@@ -97,7 +97,7 @@ The client controls which partition it p
 <p>
 Batching is one of the big drivers of efficiency, and to enable batching the Kafka producer has an asynchronous mode that accumulates data in memory and sends out larger batches in a single request. The batching can be configured to accumulate no more than a fixed number of messages and to wait no longer than some fixed latency bound (say 100 messages or 5 seconds). This allows the accumulation of more bytes to send, and few larger I/O operations on the servers. Since this buffering happens in the client it obviously reduces the durability as any data buffered in memory and not yet sent will be lost in the event of a producer crash.
 <p>
-Note that as of Kafka 0.8.1 the async producer does not have a callback, which could be used to register handlers to catch send errors.  Adding such callback functionality is proposed for Kafka 0.9, see [Proposed Producer API](https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ProposedProducerAPI).
+Note that as of Kafka 0.8.1 the async producer does not have a callback, which could be used to register handlers to catch send errors.  Adding such callback functionality is proposed for Kafka 0.9, see <a href="https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite#ClientRewrite-ProposedProducerAPI">Proposed Producer API</a>.
 
 <h3><a id="theconsumer">4.5 The Consumer</a></h3>
 
@@ -155,7 +155,7 @@ These are not the strongest possible sem
 <p>
 Not all use cases require such strong guarantees. For uses which are latency sensitive we allow the producer to specify the durability level it desires. If the producer specifies that it wants to wait on the message being committed this can take on the order of 10 ms. However the producer can also specify that it wants to perform the send completely asynchronously or that it wants to wait only until the leader (but not necessarily the followers) have the message.
 <p>
-Now let's describe the semantics from the point-of-view of the consumer. All replicas have the exact same log with the same offsets. The consumer controls its position in this log. If the consumer never crashed it could just store this position in memory, but if the producer fails and we want this topic partition to be taken over by another process the new process will need to choose an appropriate position from which to start processing. Let's say the consumer reads some messages -- it has several options for processing the messages and updating its position.
+Now let's describe the semantics from the point-of-view of the consumer. All replicas have the exact same log with the same offsets. The consumer controls its position in this log. If the consumer never crashed it could just store this position in memory, but if the consumer fails and we want this topic partition to be taken over by another process the new process will need to choose an appropriate position from which to start processing. Let's say the consumer reads some messages -- it has several options for processing the messages and updating its position.
 <ol>
   <li>It can read the messages, then save its position in the log, and finally process the messages. In this case there is a possibility that the consumer process crashes after saving its position but before saving the output of its message processing. In this case the process that took over processing would start at the saved position even though a few messages prior to that position had not been processed. This corresponds to "at-most-once" semantics as in the case of a consumer failure messages may not be processed.
   <li>It can read the messages, process the messages, and finally save its position. In this case there is a possibility that the consumer process crashes after processing messages but before saving its position. In this case when the new process takes over the first few messages it receives will already have been processed. This corresponds to the "at-least-once" semantics in the case of consumer failure. In many cases messages have a primary key and so the updates are idempotent (receiving the same message twice just overwrites a record with another copy of itself).