Posted to commits@kafka.apache.org by jg...@apache.org on 2016/10/04 02:37:00 UTC
kafka git commit: MINOR: Clarify doc on consumption of topics

Repository: kafka
Updated Branches:
  refs/heads/trunk cf0bf7c7a -> 91d025e06


MINOR: Clarify doc on consumption of topics

The doc currently says:

_"Our topic is divided into a set of totally ordered partitions, each of which is consumed by one consumer at any given time."_

And a consumer is described as:

_"We'll call **processes** that subscribe to topics and process the feed of published messages **consumers**."_

This might lead to the wrong conclusion that each partition can be read by only one process at any given time.

I think this statement is missing information about **consumer groups**, so I propose:

_"Our topic is divided into a set of totally ordered partitions, each of which is consumed by exactly one consumer (from each subscribed consumer group) at any given time"_

This contribution is my original work and I license the work to the project under the project's open source license.

Author: pilo <ja...@4finance.com>

Reviewers: Jiangjie Qin <be...@gmail.com>, Jason Gustafson <ja...@confluent.io>

Closes #1900 from pilloPl/minor/doc-fix


Project: http://git-wip-us.apache.org/repos/asf/kafka/repo
Commit: http://git-wip-us.apache.org/repos/asf/kafka/commit/91d025e0
Tree: http://git-wip-us.apache.org/repos/asf/kafka/tree/91d025e0
Diff: http://git-wip-us.apache.org/repos/asf/kafka/diff/91d025e0

Branch: refs/heads/trunk
Commit: 91d025e063e9f2b8e5799f84f6f3f7f1e9b0916c
Parents: cf0bf7c
Author: pilo <ja...@4finance.com>
Authored: Mon Oct 3 19:29:53 2016 -0700
Committer: Jason Gustafson <ja...@confluent.io>
Committed: Mon Oct 3 19:35:54 2016 -0700

----------------------------------------------------------------------
 docs/design.html | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kafka/blob/91d025e0/docs/design.html
----------------------------------------------------------------------
diff --git a/docs/design.html b/docs/design.html
index 67bca47..cd4a969 100644
--- a/docs/design.html
+++ b/docs/design.html
@@ -137,7 +137,7 @@ Most messaging systems keep metadata about what messages have been consumed on t
 <p>
 What is perhaps not obvious is that getting the broker and consumer to come into agreement about what has been consumed is not a trivial problem. If the broker records a message as <b>consumed</b> immediately every time it is handed out over the network, then if the consumer fails to process the message (say because it crashes or the request times out or whatever) that message will be lost. To solve this problem, many messaging systems add an acknowledgement feature which means that messages are only marked as <b>sent</b> not <b>consumed</b> when they are sent; the broker waits for a specific acknowledgement from the consumer to record the message as <b>consumed</b>. This strategy fixes the problem of losing messages, but creates new problems. First of all, if the consumer processes the message but fails before it can send an acknowledgement then the message will be consumed twice. The second problem is around performance, now the broker must keep multiple states about every single message (first to lock it so it is not given out a second time, and then to mark it as permanently consumed so that it can be removed). Tricky problems must be dealt with, like what to do with messages that are sent but never acknowledged.
 <p>
-Kafka handles this differently. Our topic is divided into a set of totally ordered partitions, each of which is consumed by one consumer at any given time. This means that the position of a consumer in each partition is just a single integer, the offset of the next message to consume. This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap.
+Kafka handles this differently. Our topic is divided into a set of totally ordered partitions, each of which is consumed by exactly one consumer within each subscribing consumer group at any given time. This means that the position of a consumer in each partition is just a single integer, the offset of the next message to consume. This makes the state about what has been consumed very small, just one number for each partition. This state can be periodically checkpointed. This makes the equivalent of message acknowledgements very cheap.
 <p>
 There is a side benefit of this decision. A consumer can deliberately <i>rewind</i> back to an old offset and re-consume data. This violates the common contract of a queue, but turns out to be an essential feature for many consumers. For example, if the consumer code has a bug and is discovered after some messages are consumed, the consumer can re-consume those messages once the bug is fixed.
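
Illustrative note (not part of the commit): the clarified sentence can be pictured with a small in-memory model. The sketch below is a toy under stated assumptions, not Kafka's implementation or client API. The names PartitionLog, GroupOffsets, poll and seek are hypothetical and chosen only for this example. It shows that the consumed state is one integer per (consumer group, partition), that two groups read the same partition independently, and that rewinding amounts to resetting that integer.

```python
from collections import defaultdict


class PartitionLog:
    """Toy stand-in for one partition: an append-only, totally ordered list."""

    def __init__(self):
        self.messages = []

    def append(self, message):
        """Append a message and return the offset it was stored at."""
        self.messages.append(message)
        return len(self.messages) - 1


class GroupOffsets:
    """Hypothetical offset store: one integer per (group, partition).

    The integer is the offset of the next message that group will consume.
    """

    def __init__(self):
        self._next = defaultdict(int)  # unseen (group, partition) starts at 0

    def position(self, group, partition):
        return self._next[(group, partition)]

    def poll(self, group, partition, log, max_messages=10):
        """Return the next batch for this group and advance its offset."""
        start = self._next[(group, partition)]
        batch = log.messages[start:start + max_messages]
        self._next[(group, partition)] = start + len(batch)
        return batch

    def seek(self, group, partition, offset):
        """Rewind (or skip ahead) by overwriting the single stored integer."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self._next[(group, partition)] = offset


if __name__ == "__main__":
    log = PartitionLog()
    for m in ["a", "b", "c"]:
        log.append(m)

    offsets = GroupOffsets()
    print(offsets.poll("billing", 0, log, max_messages=2))  # billing reads first two
    print(offsets.poll("audit", 0, log))                    # audit reads independently
    offsets.seek("billing", 0, 0)                           # billing rewinds to the start
    print(offsets.poll("billing", 0, log))
```

In this model each group's progress is independent, so adding a second subscribing group costs one more integer per partition rather than any per-message state.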