Posted to users@kafka.apache.org by Edward Smith <es...@stardotstar.org> on 2012/04/27 00:13:08 UTC

Docs (again!)

I swear I'm not nitpicking!  I'm working on ensuring I have my project
conceptually 'sane' before I get started, and I keep referring back to
the Kafka Design Docs to double-check things.  I did notice that my
suggested changes last time made it in, thanks to Jun or whoever put
in the change.  I think it is much clearer now.

We have these two paragraphs in conflict (I think):

---first paragraph---
Currently, there is no built-in load balancing between the producers
and the brokers in Kafka; in our own usage we publish from a large
number of heterogeneous machines and so it is desirable that the
publisher not need any explicit knowledge of the cluster topology. We
rely on a hardware load balancer to distribute the producer load
across multiple brokers. We will consider adding this in a future
release to allow semantic partitioning of messages (i.e. publishing
all messages to a particular broker based on some id to ensure an
ordered stream of updates within that id).

---second paragraph---
Automatic producer load balancing

Kafka supports client-side load balancing for message producers or use
of a dedicated load balancer to balance TCP connections. A dedicated
layer-4 load balancer works by balancing TCP connections over Kafka
brokers. In this configuration all messages from a given producer go
to a single broker. The advantage of using a layer-4 load balancer is
that each producer only needs a single TCP connection, and no
connection to zookeeper is needed. The disadvantage is that the
balancing is done at the TCP connection level, and hence it may not be
well balanced (if some producers produce many more messages than
others, evenly dividing up the connections per broker may not result
in evenly dividing up the messages per broker).
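
To make that imbalance concrete, here is a minimal Python sketch. It is not
Kafka code: the function names, the round-robin connection assignment, and
the message rates are all assumptions made up for illustration. It compares
the per-broker message load when whole connections are balanced against the
load when individual requests are balanced.

```python
def balance_by_connection(producer_rates, num_brokers):
    """Layer-4 style: each producer holds one TCP connection, so all of
    its messages land on a single broker (assigned round-robin here)."""
    load = {broker: 0 for broker in range(num_brokers)}
    for index, rate in enumerate(producer_rates):
        load[index % num_brokers] += rate
    return load


def balance_by_request(producer_rates, num_brokers):
    """Client-side style: every producer spreads its own messages over
    all brokers, so the total is divided as evenly as possible."""
    base, extra = divmod(sum(producer_rates), num_brokers)
    return {broker: base + (1 if broker < extra else 0)
            for broker in range(num_brokers)}


if __name__ == "__main__":
    # One heavy producer and three light ones (hypothetical rates).
    rates = [1000, 10, 10, 10]
    print(balance_by_connection(rates, 2))  # {0: 1010, 1: 20}
    print(balance_by_request(rates, 2))     # {0: 515, 1: 515}
```

Both brokers hold two connections in the first case, yet one of them
receives almost all of the messages.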

Client-side zookeeper-based load balancing solves some of these
problems. It allows the producer to dynamically discover new brokers,
and balance load on a per-request basis. Likewise it allows the
producer to partition data according to some key instead of randomly,
which enables stickiness on the consumer (e.g. partitioning data
consumption by user id). This feature is called "semantic
partitioning", and is described in more detail below.
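
As a rough illustration of the difference between random and key-based
("semantic") partitioning, here is a small Python sketch. It is only a
sketch under stated assumptions: the function names are hypothetical, and
CRC32 is a stand-in hash chosen because it is stable across runs, not
necessarily what Kafka's own partitioner uses.

```python
import random
import zlib


def partition_for_key(key, num_partitions):
    """Key-based partitioning: the same key always maps to the same
    partition, so updates for one id stay in one ordered stream."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def random_partition(num_partitions, rng=random):
    """Random partitioning: spreads load, but messages for one id may
    end up on different partitions."""
    return rng.randrange(num_partitions)


if __name__ == "__main__":
    partitions = 4
    for user_id in ["user-17", "user-42", "user-17"]:
        print(user_id,
              "keyed:", partition_for_key(user_id, partitions),
              "random:", random_partition(partitions))
```

The two "user-17" messages always get the same keyed partition, which is
what lets a consumer of that partition see every update for that user.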

The working of the zookeeper-based load balancing is described below.
Zookeeper watchers are registered on the following events—
<snip>

Re: Docs (again!)

Posted by Jun Rao <ju...@gmail.com>.
Edward,

Thanks for the comments. I made some changes to clarify the producer-side
logic. The changes should show up on the website in the next few hours. Let
us know if anything else is unclear.

Jun


Re: Zookeeper Usage and Operations

Posted by Christian Carollo <cc...@gmail.com>.
Is there any update as to when the below documentation might be completed?

Thanks
Christian

On Apr 27, 2012, at 1:55 PM, Neha Narkhede <ne...@gmail.com> wrote:

>>> Any chance we can get those operational details added?
> 
> https://cwiki.apache.org/confluence/display/KAFKA/Operations#Operations-Zookeeper
> 
> Certainly. I said I would do this, but then forgot about it. I will surely
> get this in by next week.
> 
> Regarding Kafka's zookeeper dependency, I guess it's not very clear from our
> design doc what Kafka features you lose by not running zookeeper.
> 
> If you choose to not run zookeeper, you will not be able to use the
> auto-load balancing on the consumer side. Also, the producer side
> zookeeper-based load balancing feature will be unusable.
> 
> Let me see if I can update our design doc to make this clearer.
> 
> Thanks,
> Neha


Re: Zookeeper Usage and Operations

Posted by Edward Smith <es...@stardotstar.org>.
Neha,

  From my understanding, you would also lose the offset tracking in
the standard consumers.

Ed

On Fri, Apr 27, 2012 at 4:55 PM, Neha Narkhede <ne...@gmail.com> wrote:
>>> Any chance we can get those operational details added?
>
> https://cwiki.apache.org/confluence/display/KAFKA/Operations#Operations-Zookeeper
>
> Certainly. I said I would do this, but then forgot about it. I will surely
> get this in by next week.
>
> Regarding Kafka's zookeeper dependency, I guess it's not very clear from our
> design doc what Kafka features you lose by not running zookeeper.
>
> If you choose to not run zookeeper, you will not be able to use the
> auto-load balancing on the consumer side. Also, the producer side
> zookeeper-based load balancing feature will be unusable.
>
> Let me see if I can update our design doc to make this clearer.
>
> Thanks,
> Neha

Re: Zookeeper Usage and Operations

Posted by Neha Narkhede <ne...@gmail.com>.
>> Any chance we can get those operational details added?

https://cwiki.apache.org/confluence/display/KAFKA/Operations#Operations-Zookeeper

Certainly. I said I would do this, but then forgot about it. I will surely
get this in by next week.

Regarding Kafka's zookeeper dependency, I guess it's not very clear from our
design doc what Kafka features you lose by not running zookeeper.

If you choose to not run zookeeper, you will not be able to use the
auto-load balancing on the consumer side. Also, the producer side
zookeeper-based load balancing feature will be unusable.

Let me see if I can update our design doc to make this clearer.

Thanks,
Neha


Re: Zookeeper Usage and Operations

Posted by Christian Carollo <cc...@gmail.com>.
Thanks, Ed. I had seen that link. I am really looking for a clear outline or set of rules as to when and why to use zookeeper and when and why not to.

Then, if using it makes sense, it seems like there are some valuable, known details about how to keep zookeeper happy that would be great to have documented.

Christian



Re: Zookeeper Usage and Operations

Posted by Edward Smith <es...@stardotstar.org>.
Christian,

  I'm new to Kafka, too.  The page linked below describes how ZK is
typically used with Kafka, although it is my impression that you don't
have to use ZK if you don't want to.  Not using it is also described
briefly in the design.

http://incubator.apache.org/kafka/design.html

Ed


Zookeeper Usage and Operations

Posted by Christian Carollo <cc...@gmail.com>.
Is it possible to get a broad overview of what Zookeeper provides to Kafka and why it is used with Kafka?

Also, under Wiki > Operations there is a Zookeeper section, but it just says that it needs to be filled in.
Any chance we can get those operational details added?

https://cwiki.apache.org/confluence/display/KAFKA/Operations#Operations-Zookeeper

Thanks,
Christian

Re: Docs (again!)

Posted by Edward Smith <es...@stardotstar.org>.
Thanks for the encouragement, Jay.  I'm new to actually contributing
to OSS, so I'm still feeling out what the norm is.

Ed


Re: Docs (again!)

Posted by Jay Kreps <ja...@gmail.com>.
Hey Edward,

We actually greatly appreciate the feedback. Docs always make sense to
the person who wrote them, who has been working closely on the thing
for many months, but it is much harder to get them into shape for
others so that they really give the information that is needed. So
your feedback is not nitpicking; it is actually very helpful.

-Jay
