You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Timothy Chen <tn...@gmail.com> on 2013/05/22 21:25:35 UTC

Partitioning and scale

Hi,

I'm currently trying to understand how Kafka (0.8) can scale with our usage
pattern and how to setup the partitioning.

We want to route the same messages belonging to the same id to the same
queue, so its consumer will able to consume all the messages of that id.

My questions:

 - From my understanding, in Kafka we would need to have a custom
partitioner that routes the same messages to the same partition right?  I'm
trying to find examples of writing this partitioner logic, but I can't find
any. Can someone point me to an example?

- I see that Kafka server.properties allows one to specify the number of
partitions it supports. However, when we want to scale I wonder if we add #
of partitions or # of brokers, will the same partitioner start distributing
the messages to different partitions?
 And if it does, how can that same consumer continue to read off the
messages of those ids if it was interrupted in the middle?

- I'd like to create a consumer per partition, and for each one to
subscribe to the changes of that one. How can this be done in kafka?

Thanks,

Tim

Re: Partitioning and scale

Posted by Neha Narkhede <ne...@gmail.com>.
Timothy,

Kafka is not designed to support millions of topics. Zookeeper will become
a bottleneck, even if you deploy more brokers to get around the # of files
issue. In normal cases, it might work just fine with the right sized
cluster. However, when there are failures, the time to recovery could be a
few minutes instead of a few 100s of ms or few seconds.

Also, even if you go with one topic per session id approach, it will be
unscalable when the key space increases in the future. A more scalable
approach is what Milind described. Use the sticky partitioning feature in
08 and have a session id always get routed to a particular partition. Then
each of your consumers can be sure that they will always receive data for a
subset of the session ids. So you can do any locality sensitive processing
on your consumers for processing user sessions.

Thanks,
Neha


On Thu, May 23, 2013 at 4:36 PM, Milind Parikh <mi...@gmail.com>wrote:

> Number of files to manage by os, I suppose.
>
> Why wouldn't you use consistent hashing with deliberately engineered
> collisions to generate a limited number of topics / partitions and filter
> at the consumer level?
>
> Regards
> Milind
> On May 23, 2013 4:22 PM, "Timothy Chen" <tn...@gmail.com> wrote:
>
> > Hi Neha,
> >
> > Not sure if this sounds crazy, but if we'd like to have the events for
> the
> > same session id go to the same partition one way could be that each
> session
> > key creates its own topic with single partition, therefore there could be
> > millions of topic with single partition.
> >
> > I wonder what would be the bottleneck of doing this?
> >
> > Thanks,
> >
> > Tim
> >
> >
> > On Wed, May 22, 2013 at 4:32 PM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > Not automatically as of today. You have to run the reassign-partitions
> > tool
> > > and explicitly move selected partitions to the new brokers. If you use
> > this
> > > tool, you can move partitions to the new broker without any downtime.
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <tn...@gmail.com>
> wrote:
> > >
> > > > Hi Neha/Chris,
> > > >
> > > > Thanks for the reply, so if I set a fixed number of partitions and
> just
> > > add
> > > > brokers to the broker pool, does it rebalance the load to the new
> > brokers
> > > > (along with the data)?
> > > >
> > > > Tim
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <
> > neha.narkhede@gmail.com
> > > > >wrote:
> > > >
> > > > > - I see that Kafka server.properties allows one to specify the
> number
> > > of
> > > > > partitions it supports. However, when we want to scale I wonder if
> we
> > > > add #
> > > > > of partitions or # of brokers, will the same partitioner start
> > > > distributing
> > > > > the messages to different partitions?
> > > > >  And if it does, how can that same consumer continue to read off
> the
> > > > > messages of those ids if it was interrupted in the middle?
> > > > >
> > > > > The num.partitions config in server.properties is used only for
> > topics
> > > > that
> > > > > are auto created (controlled by auto.create.topics.enable). For
> > topics
> > > > that
> > > > > you create using the admin tool, you can specify the number of
> > > partitions
> > > > > that you want. After that, currently there is no way to change
> that.
> > > For
> > > > > that reason, it is a good idea to over partition your topic, which
> > also
> > > > > helps load balance partitions onto the brokers. You are right that
> if
> > > you
> > > > > change the number of partitions later, then previously messages
> that
> > > > stuck
> > > > > to a certain partition would now get routed to a different
> partition,
> > > > which
> > > > > is undesirable for applications that want to use sticky
> partitioning.
> > > > >
> > > > > - I'd like to create a consumer per partition, and for each one to
> > > > > subscribe to the changes of that one. How can this be done in
> kafka?
> > > > >
> > > > > For your use case, it seems like SimpleConsumer might be a better
> > fit.
> > > > > However, it will require you to write code to handle discovery of
> > > leader
> > > > > for the partition that your consumer is consuming. Chris has
> written
> > > up a
> > > > > great example that you can follow -
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > > >
> > > > > Thanks,
> > > > > Neha
> > > > >
> > > > >
> > > > > On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <
> > curtin.chris@gmail.com
> > > > > >wrote:
> > > > >
> > > > > > Hi Tim,
> > > > > >
> > > > > >
> > > > > > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tnachen@gmail.com
> >
> > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'm currently trying to understand how Kafka (0.8) can scale
> with
> > > our
> > > > > > usage
> > > > > > > pattern and how to setup the partitioning.
> > > > > > >
> > > > > > > We want to route the same messages belonging to the same id to
> > the
> > > > same
> > > > > > > queue, so its consumer will able to consume all the messages of
> > > that
> > > > > id.
> > > > > > >
> > > > > > > My questions:
> > > > > > >
> > > > > > >  - From my understanding, in Kafka we would need to have a
> custom
> > > > > > > partitioner that routes the same messages to the same partition
> > > > right?
> > > > > >  I'm
> > > > > > > trying to find examples of writing this partitioner logic, but
> I
> > > > can't
> > > > > > find
> > > > > > > any. Can someone point me to an example?
> > > > > > >
> > > > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> > > > > >
> > > > > > The partitioner here does a simple mod on the IP address and the
> #
> > of
> > > > > > partitions. You'd need to define your own logic, but this is a
> > start.
> > > > > >
> > > > > >
> > > > > > > - I see that Kafka server.properties allows one to specify the
> > > number
> > > > > of
> > > > > > > partitions it supports. However, when we want to scale I wonder
> > if
> > > we
> > > > > > add #
> > > > > > > of partitions or # of brokers, will the same partitioner start
> > > > > > distributing
> > > > > > > the messages to different partitions?
> > > > > > >  And if it does, how can that same consumer continue to read
> off
> > > the
> > > > > > > messages of those ids if it was interrupted in the middle?
> > > > > > >
> > > > > >
> > > > > > I'll let someone else answer this.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > - I'd like to create a consumer per partition, and for each one
> > to
> > > > > > > subscribe to the changes of that one. How can this be done in
> > > kafka?
> > > > > > >
> > > > > >
> > > > > > Two ways: Simple Consumer or Consumer Groups:
> > > > > >
> > > > > > Depends on the level of control you want on code processing a
> > > specific
> > > > > > partition vs. getting one assigned to it (and level of control
> over
> > > > > offset
> > > > > > management).
> > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > > > > <
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > Tim
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Partitioning and scale

Posted by Milind Parikh <mi...@gmail.com>.
Number of files to manage by os, I suppose.

Why wouldn't you use consistent hashing with deliberately engineered
collisions to generate a limited number of topics / partitions and filter
at the consumer level?

Regards
Milind
On May 23, 2013 4:22 PM, "Timothy Chen" <tn...@gmail.com> wrote:

> Hi Neha,
>
> Not sure if this sounds crazy, but if we'd like to have the events for the
> same session id go to the same partition one way could be that each session
> key creates its own topic with single partition, therefore there could be
> millions of topic with single partition.
>
> I wonder what would be the bottleneck of doing this?
>
> Thanks,
>
> Tim
>
>
> On Wed, May 22, 2013 at 4:32 PM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > Not automatically as of today. You have to run the reassign-partitions
> tool
> > and explicitly move selected partitions to the new brokers. If you use
> this
> > tool, you can move partitions to the new broker without any downtime.
> >
> > Thanks,
> > Neha
> >
> >
> > On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <tn...@gmail.com> wrote:
> >
> > > Hi Neha/Chris,
> > >
> > > Thanks for the reply, so if I set a fixed number of partitions and just
> > add
> > > brokers to the broker pool, does it rebalance the load to the new
> brokers
> > > (along with the data)?
> > >
> > > Tim
> > >
> > >
> > > On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <
> neha.narkhede@gmail.com
> > > >wrote:
> > >
> > > > - I see that Kafka server.properties allows one to specify the number
> > of
> > > > partitions it supports. However, when we want to scale I wonder if we
> > > add #
> > > > of partitions or # of brokers, will the same partitioner start
> > > distributing
> > > > the messages to different partitions?
> > > >  And if it does, how can that same consumer continue to read off the
> > > > messages of those ids if it was interrupted in the middle?
> > > >
> > > > The num.partitions config in server.properties is used only for
> topics
> > > that
> > > > are auto created (controlled by auto.create.topics.enable). For
> topics
> > > that
> > > > you create using the admin tool, you can specify the number of
> > partitions
> > > > that you want. After that, currently there is no way to change that.
> > For
> > > > that reason, it is a good idea to over partition your topic, which
> also
> > > > helps load balance partitions onto the brokers. You are right that if
> > you
> > > > change the number of partitions later, then previously messages that
> > > stuck
> > > > to a certain partition would now get routed to a different partition,
> > > which
> > > > is undesirable for applications that want to use sticky partitioning.
> > > >
> > > > - I'd like to create a consumer per partition, and for each one to
> > > > subscribe to the changes of that one. How can this be done in kafka?
> > > >
> > > > For your use case, it seems like SimpleConsumer might be a better
> fit.
> > > > However, it will require you to write code to handle discovery of
> > leader
> > > > for the partition that your consumer is consuming. Chris has written
> > up a
> > > > great example that you can follow -
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > >
> > > > Thanks,
> > > > Neha
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <
> curtin.chris@gmail.com
> > > > >wrote:
> > > >
> > > > > Hi Tim,
> > > > >
> > > > >
> > > > > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I'm currently trying to understand how Kafka (0.8) can scale with
> > our
> > > > > usage
> > > > > > pattern and how to setup the partitioning.
> > > > > >
> > > > > > We want to route the same messages belonging to the same id to
> the
> > > same
> > > > > > queue, so its consumer will able to consume all the messages of
> > that
> > > > id.
> > > > > >
> > > > > > My questions:
> > > > > >
> > > > > >  - From my understanding, in Kafka we would need to have a custom
> > > > > > partitioner that routes the same messages to the same partition
> > > right?
> > > > >  I'm
> > > > > > trying to find examples of writing this partitioner logic, but I
> > > can't
> > > > > find
> > > > > > any. Can someone point me to an example?
> > > > > >
> > > > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> > > > >
> > > > > The partitioner here does a simple mod on the IP address and the #
> of
> > > > > partitions. You'd need to define your own logic, but this is a
> start.
> > > > >
> > > > >
> > > > > > - I see that Kafka server.properties allows one to specify the
> > number
> > > > of
> > > > > > partitions it supports. However, when we want to scale I wonder
> if
> > we
> > > > > add #
> > > > > > of partitions or # of brokers, will the same partitioner start
> > > > > distributing
> > > > > > the messages to different partitions?
> > > > > >  And if it does, how can that same consumer continue to read off
> > the
> > > > > > messages of those ids if it was interrupted in the middle?
> > > > > >
> > > > >
> > > > > I'll let someone else answer this.
> > > > >
> > > > >
> > > > > >
> > > > > > - I'd like to create a consumer per partition, and for each one
> to
> > > > > > subscribe to the changes of that one. How can this be done in
> > kafka?
> > > > > >
> > > > >
> > > > > Two ways: Simple Consumer or Consumer Groups:
> > > > >
> > > > > Depends on the level of control you want on code processing a
> > specific
> > > > > partition vs. getting one assigned to it (and level of control over
> > > > offset
> > > > > management).
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > > > <
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > >
> > > > >
> > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Tim
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Partitioning and scale

Posted by Timothy Chen <tn...@gmail.com>.
Hi Neha,

Not sure if this sounds crazy, but if we'd like to have the events for the
same session id go to the same partition one way could be that each session
key creates its own topic with single partition, therefore there could be
millions of topic with single partition.

I wonder what would be the bottleneck of doing this?

Thanks,

Tim


On Wed, May 22, 2013 at 4:32 PM, Neha Narkhede <ne...@gmail.com>wrote:

> Not automatically as of today. You have to run the reassign-partitions tool
> and explicitly move selected partitions to the new brokers. If you use this
> tool, you can move partitions to the new broker without any downtime.
>
> Thanks,
> Neha
>
>
> On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <tn...@gmail.com> wrote:
>
> > Hi Neha/Chris,
> >
> > Thanks for the reply, so if I set a fixed number of partitions and just
> add
> > brokers to the broker pool, does it rebalance the load to the new brokers
> > (along with the data)?
> >
> > Tim
> >
> >
> > On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <neha.narkhede@gmail.com
> > >wrote:
> >
> > > - I see that Kafka server.properties allows one to specify the number
> of
> > > partitions it supports. However, when we want to scale I wonder if we
> > add #
> > > of partitions or # of brokers, will the same partitioner start
> > distributing
> > > the messages to different partitions?
> > >  And if it does, how can that same consumer continue to read off the
> > > messages of those ids if it was interrupted in the middle?
> > >
> > > The num.partitions config in server.properties is used only for topics
> > that
> > > are auto created (controlled by auto.create.topics.enable). For topics
> > that
> > > you create using the admin tool, you can specify the number of
> partitions
> > > that you want. After that, currently there is no way to change that.
> For
> > > that reason, it is a good idea to over partition your topic, which also
> > > helps load balance partitions onto the brokers. You are right that if
> you
> > > change the number of partitions later, then previously messages that
> > stuck
> > > to a certain partition would now get routed to a different partition,
> > which
> > > is undesirable for applications that want to use sticky partitioning.
> > >
> > > - I'd like to create a consumer per partition, and for each one to
> > > subscribe to the changes of that one. How can this be done in kafka?
> > >
> > > For your use case, it seems like SimpleConsumer might be a better fit.
> > > However, it will require you to write code to handle discovery of
> leader
> > > for the partition that your consumer is consuming. Chris has written
> up a
> > > great example that you can follow -
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > >
> > > Thanks,
> > > Neha
> > >
> > >
> > > On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <curtin.chris@gmail.com
> > > >wrote:
> > >
> > > > Hi Tim,
> > > >
> > > >
> > > > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com>
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I'm currently trying to understand how Kafka (0.8) can scale with
> our
> > > > usage
> > > > > pattern and how to setup the partitioning.
> > > > >
> > > > > We want to route the same messages belonging to the same id to the
> > same
> > > > > queue, so its consumer will able to consume all the messages of
> that
> > > id.
> > > > >
> > > > > My questions:
> > > > >
> > > > >  - From my understanding, in Kafka we would need to have a custom
> > > > > partitioner that routes the same messages to the same partition
> > right?
> > > >  I'm
> > > > > trying to find examples of writing this partitioner logic, but I
> > can't
> > > > find
> > > > > any. Can someone point me to an example?
> > > > >
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> > > >
> > > > The partitioner here does a simple mod on the IP address and the # of
> > > > partitions. You'd need to define your own logic, but this is a start.
> > > >
> > > >
> > > > > - I see that Kafka server.properties allows one to specify the
> number
> > > of
> > > > > partitions it supports. However, when we want to scale I wonder if
> we
> > > > add #
> > > > > of partitions or # of brokers, will the same partitioner start
> > > > distributing
> > > > > the messages to different partitions?
> > > > >  And if it does, how can that same consumer continue to read off
> the
> > > > > messages of those ids if it was interrupted in the middle?
> > > > >
> > > >
> > > > I'll let someone else answer this.
> > > >
> > > >
> > > > >
> > > > > - I'd like to create a consumer per partition, and for each one to
> > > > > subscribe to the changes of that one. How can this be done in
> kafka?
> > > > >
> > > >
> > > > Two ways: Simple Consumer or Consumer Groups:
> > > >
> > > > Depends on the level of control you want on code processing a
> specific
> > > > partition vs. getting one assigned to it (and level of control over
> > > offset
> > > > management).
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > > >
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > > <
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > >
> > > >
> > > >
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Tim
> > > > >
> > > >
> > >
> >
>

Re: Partitioning and scale

Posted by Neha Narkhede <ne...@gmail.com>.
Not automatically as of today. You have to run the reassign-partitions tool
and explicitly move selected partitions to the new brokers. If you use this
tool, you can move partitions to the new broker without any downtime.

Thanks,
Neha


On Wed, May 22, 2013 at 2:20 PM, Timothy Chen <tn...@gmail.com> wrote:

> Hi Neha/Chris,
>
> Thanks for the reply, so if I set a fixed number of partitions and just add
> brokers to the broker pool, does it rebalance the load to the new brokers
> (along with the data)?
>
> Tim
>
>
> On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <neha.narkhede@gmail.com
> >wrote:
>
> > - I see that Kafka server.properties allows one to specify the number of
> > partitions it supports. However, when we want to scale I wonder if we
> add #
> > of partitions or # of brokers, will the same partitioner start
> distributing
> > the messages to different partitions?
> >  And if it does, how can that same consumer continue to read off the
> > messages of those ids if it was interrupted in the middle?
> >
> > The num.partitions config in server.properties is used only for topics
> that
> > are auto created (controlled by auto.create.topics.enable). For topics
> that
> > you create using the admin tool, you can specify the number of partitions
> > that you want. After that, currently there is no way to change that. For
> > that reason, it is a good idea to over partition your topic, which also
> > helps load balance partitions onto the brokers. You are right that if you
> > change the number of partitions later, then previously messages that
> stuck
> > to a certain partition would now get routed to a different partition,
> which
> > is undesirable for applications that want to use sticky partitioning.
> >
> > - I'd like to create a consumer per partition, and for each one to
> > subscribe to the changes of that one. How can this be done in kafka?
> >
> > For your use case, it seems like SimpleConsumer might be a better fit.
> > However, it will require you to write code to handle discovery of leader
> > for the partition that your consumer is consuming. Chris has written up a
> > great example that you can follow -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> >
> > Thanks,
> > Neha
> >
> >
> > On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <curtin.chris@gmail.com
> > >wrote:
> >
> > > Hi Tim,
> > >
> > >
> > > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com>
> wrote:
> > >
> > > > Hi,
> > > >
> > > > I'm currently trying to understand how Kafka (0.8) can scale with our
> > > usage
> > > > pattern and how to setup the partitioning.
> > > >
> > > > We want to route the same messages belonging to the same id to the
> same
> > > > queue, so its consumer will able to consume all the messages of that
> > id.
> > > >
> > > > My questions:
> > > >
> > > >  - From my understanding, in Kafka we would need to have a custom
> > > > partitioner that routes the same messages to the same partition
> right?
> > >  I'm
> > > > trying to find examples of writing this partitioner logic, but I
> can't
> > > find
> > > > any. Can someone point me to an example?
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> > >
> > > The partitioner here does a simple mod on the IP address and the # of
> > > partitions. You'd need to define your own logic, but this is a start.
> > >
> > >
> > > > - I see that Kafka server.properties allows one to specify the number
> > of
> > > > partitions it supports. However, when we want to scale I wonder if we
> > > add #
> > > > of partitions or # of brokers, will the same partitioner start
> > > distributing
> > > > the messages to different partitions?
> > > >  And if it does, how can that same consumer continue to read off the
> > > > messages of those ids if it was interrupted in the middle?
> > > >
> > >
> > > I'll let someone else answer this.
> > >
> > >
> > > >
> > > > - I'd like to create a consumer per partition, and for each one to
> > > > subscribe to the changes of that one. How can this be done in kafka?
> > > >
> > >
> > > Two ways: Simple Consumer or Consumer Groups:
> > >
> > > Depends on the level of control you want on code processing a specific
> > > partition vs. getting one assigned to it (and level of control over
> > offset
> > > management).
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > > <
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> >
> > >
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Tim
> > > >
> > >
> >
>

Re: Partitioning and scale

Posted by Timothy Chen <tn...@gmail.com>.
Hi Neha/Chris,

Thanks for the reply, so if I set a fixed number of partitions and just add
brokers to the broker pool, does it rebalance the load to the new brokers
(along with the data)?

Tim


On Wed, May 22, 2013 at 1:15 PM, Neha Narkhede <ne...@gmail.com>wrote:

> - I see that Kafka server.properties allows one to specify the number of
> partitions it supports. However, when we want to scale I wonder if we add #
> of partitions or # of brokers, will the same partitioner start distributing
> the messages to different partitions?
>  And if it does, how can that same consumer continue to read off the
> messages of those ids if it was interrupted in the middle?
>
> The num.partitions config in server.properties is used only for topics that
> are auto created (controlled by auto.create.topics.enable). For topics that
> you create using the admin tool, you can specify the number of partitions
> that you want. After that, currently there is no way to change that. For
> that reason, it is a good idea to over partition your topic, which also
> helps load balance partitions onto the brokers. You are right that if you
> change the number of partitions later, then previously messages that stuck
> to a certain partition would now get routed to a different partition, which
> is undesirable for applications that want to use sticky partitioning.
>
> - I'd like to create a consumer per partition, and for each one to
> subscribe to the changes of that one. How can this be done in kafka?
>
> For your use case, it seems like SimpleConsumer might be a better fit.
> However, it will require you to write code to handle discovery of leader
> for the partition that your consumer is consuming. Chris has written up a
> great example that you can follow -
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
>
> Thanks,
> Neha
>
>
> On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <curtin.chris@gmail.com
> >wrote:
>
> > Hi Tim,
> >
> >
> > On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I'm currently trying to understand how Kafka (0.8) can scale with our
> > usage
> > > pattern and how to setup the partitioning.
> > >
> > > We want to route the same messages belonging to the same id to the same
> > > queue, so its consumer will able to consume all the messages of that
> id.
> > >
> > > My questions:
> > >
> > >  - From my understanding, in Kafka we would need to have a custom
> > > partitioner that routes the same messages to the same partition right?
> >  I'm
> > > trying to find examples of writing this partitioner logic, but I can't
> > find
> > > any. Can someone point me to an example?
> > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
> >
> > The partitioner here does a simple mod on the IP address and the # of
> > partitions. You'd need to define your own logic, but this is a start.
> >
> >
> > > - I see that Kafka server.properties allows one to specify the number
> of
> > > partitions it supports. However, when we want to scale I wonder if we
> > add #
> > > of partitions or # of brokers, will the same partitioner start
> > distributing
> > > the messages to different partitions?
> > >  And if it does, how can that same consumer continue to read off the
> > > messages of those ids if it was interrupted in the middle?
> > >
> >
> > I'll let someone else answer this.
> >
> >
> > >
> > > - I'd like to create a consumer per partition, and for each one to
> > > subscribe to the changes of that one. How can this be done in kafka?
> > >
> >
> > Two ways: Simple Consumer or Consumer Groups:
> >
> > Depends on the level of control you want on code processing a specific
> > partition vs. getting one assigned to it (and level of control over
> offset
> > management).
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example>
> >
> >
> > >
> > > Thanks,
> > >
> > > Tim
> > >
> >
>

Re: Partitioning and scale

Posted by Neha Narkhede <ne...@gmail.com>.
- I see that Kafka server.properties allows one to specify the number of
partitions it supports. However, when we want to scale I wonder if we add #
of partitions or # of brokers, will the same partitioner start distributing
the messages to different partitions?
 And if it does, how can that same consumer continue to read off the
messages of those ids if it was interrupted in the middle?

The num.partitions config in server.properties is used only for topics that
are auto created (controlled by auto.create.topics.enable). For topics that
you create using the admin tool, you can specify the number of partitions
that you want. After that, currently there is no way to change that. For
that reason, it is a good idea to over partition your topic, which also
helps load balance partitions onto the brokers. You are right that if you
change the number of partitions later, then previously messages that stuck
to a certain partition would now get routed to a different partition, which
is undesirable for applications that want to use sticky partitioning.

- I'd like to create a consumer per partition, and for each one to
subscribe to the changes of that one. How can this be done in kafka?

For your use case, it seems like SimpleConsumer might be a better fit.
However, it will require you to write code to handle discovery of leader
for the partition that your consumer is consuming. Chris has written up a
great example that you can follow -
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example

Thanks,
Neha


On Wed, May 22, 2013 at 12:37 PM, Chris Curtin <cu...@gmail.com>wrote:

> Hi Tim,
>
>
> On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com> wrote:
>
> > Hi,
> >
> > I'm currently trying to understand how Kafka (0.8) can scale with our
> usage
> > pattern and how to setup the partitioning.
> >
> > We want to route the same messages belonging to the same id to the same
> > queue, so its consumer will able to consume all the messages of that id.
> >
> > My questions:
> >
> >  - From my understanding, in Kafka we would need to have a custom
> > partitioner that routes the same messages to the same partition right?
>  I'm
> > trying to find examples of writing this partitioner logic, but I can't
> find
> > any. Can someone point me to an example?
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
>
> The partitioner here does a simple mod on the IP address and the # of
> partitions. You'd need to define your own logic, but this is a start.
>
>
> > - I see that Kafka server.properties allows one to specify the number of
> > partitions it supports. However, when we want to scale I wonder if we
> add #
> > of partitions or # of brokers, will the same partitioner start
> distributing
> > the messages to different partitions?
> >  And if it does, how can that same consumer continue to read off the
> > messages of those ids if it was interrupted in the middle?
> >
>
> I'll let someone else answer this.
>
>
> >
> > - I'd like to create a consumer per partition, and for each one to
> > subscribe to the changes of that one. How can this be done in kafka?
> >
>
> Two ways: Simple Consumer or Consumer Groups:
>
> Depends on the level of control you want on code processing a specific
> partition vs. getting one assigned to it (and level of control over offset
> management).
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example
> <https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example>
>
>
> >
> > Thanks,
> >
> > Tim
> >
>

Re: Partitioning and scale

Posted by Chris Curtin <cu...@gmail.com>.
Hi Tim,


On Wed, May 22, 2013 at 3:25 PM, Timothy Chen <tn...@gmail.com> wrote:

> Hi,
>
> I'm currently trying to understand how Kafka (0.8) can scale with our usage
> pattern and how to setup the partitioning.
>
> We want to route the same messages belonging to the same id to the same
> queue, so its consumer will able to consume all the messages of that id.
>
> My questions:
>
>  - From my understanding, in Kafka we would need to have a custom
> partitioner that routes the same messages to the same partition right?  I'm
> trying to find examples of writing this partitioner logic, but I can't find
> any. Can someone point me to an example?
>
> https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example

The partitioner here does a simple mod on the IP address and the # of
partitions. You'd need to define your own logic, but this is a start.


> - I see that Kafka server.properties allows one to specify the number of
> partitions it supports. However, when we want to scale I wonder if we add #
> of partitions or # of brokers, will the same partitioner start distributing
> the messages to different partitions?
>  And if it does, how can that same consumer continue to read off the
> messages of those ids if it was interrupted in the middle?
>

I'll let someone else answer this.


>
> - I'd like to create a consumer per partition, and for each one to
> subscribe to the changes of that one. How can this be done in kafka?
>

Two ways: Simple Consumer or Consumer Groups:

Depends on the level of control you want on code processing a specific
partition vs. getting one assigned to it (and level of control over offset
management).

https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example

https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+SimpleConsumer+Example<https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example>


>
> Thanks,
>
> Tim
>