Posted to users@kafka.apache.org by Lorenzo Alberton <l....@gmail.com> on 2012/07/30 17:39:00 UTC

Thousands of topics

Is there anyone who tried Kafka with thousands of concurrent topics?
If so, what are your experiences? How did you tune it?

Thanks!

Re: Thousands of topics

Posted by Lorenzo Alberton <l....@gmail.com>.
Hi Taylor,

thanks for your reply. I'd love to read your blog post about your
experiences with it, especially around hardware configuration and how you
consume the data (few/many short/long-lived processes, average throughput
per topic). The cleanup script seems really useful too; I was considering
writing one that also cleans dead topics out of ZooKeeper.
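
For reference, the filesystem half of such a cleanup could be sketched roughly as below. This is only an illustration, not Taylor's script (which is not shown in this thread). It assumes the broker keeps one directory per topic partition directly under its log directory, and it treats a directory as dead when it holds nothing but zero-length files and has not been touched for a configurable time. The function names and the `min_age_seconds` parameter are made up for the example, and the ZooKeeper side is not covered.

```python
import os
import shutil
import time


def find_dead_topic_dirs(log_dir, min_age_seconds, now=None):
    """Return topic directories that look dead: no data and not modified recently.

    Assumption: every immediate subdirectory of log_dir is a topic directory.
    """
    now = time.time() if now is None else now
    dead = []
    for entry in sorted(os.scandir(log_dir), key=lambda e: e.name):
        if not entry.is_dir(follow_symlinks=False):
            continue
        children = list(os.scandir(entry.path))
        # Be conservative: skip anything that contains subdirectories or links.
        if any(not c.is_file(follow_symlinks=False) for c in children):
            continue
        # A directory with at least one non-empty file still holds data.
        if any(c.stat().st_size > 0 for c in children):
            continue
        newest = max([entry.stat().st_mtime] + [c.stat().st_mtime for c in children])
        if now - newest >= min_age_seconds:
            dead.append(entry.path)
    return dead


def remove_dead_topic_dirs(log_dir, min_age_seconds, dry_run=True, now=None):
    """Delete dead topic directories; with dry_run=True only report them."""
    dead = find_dead_topic_dirs(log_dir, min_age_seconds, now=now)
    if not dry_run:
        for path in dead:
            shutil.rmtree(path)
    return dead
```

Running it with `dry_run=True` first shows what would be deleted. Whether it is safe to remove directories while the broker is running and may still hold open file handles on them is something the sketch does not address.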

Thanks!

Lorenzo


On Tue, Jul 31, 2012 at 8:58 PM, Taylor Gautier <tg...@tagged.com> wrote:

> Yes, we have done so at Tagged.  I chronicled a bit of our experience here
> on the mailing list.  Effectively we found that a single machine could
> not go above ~20k total topics.  This could be OS dependent, however (we use
> CentOS 5.x).
>
> Various tweaks we made to go further:
>
>    1. a beefed up node.js kafka client/producer implementation -
>    https://github.com/tagged/node-kafka lies at the heart of our kafka
>    deployment
>    2. our own kafka software load balancer (implemented using said library)
>    that shards out independent Kafka instances (guarantees in-order delivery
>    per topic and scales the # of kafka topics linearly as a function of the #
>    of kafka machines)
>    3. a continuous cleaner that removes old dead topics completely from the
>    filesystem (the 0.7 cleaner leaves empty directories/files behind, which
>    eat up open file handles and limit the max # of topics)
>    4. (coming soon) a hierarchical topic directory structure to ease the
>    pain of too many directories/files in a single directory (should help the
>    ~20k number, though probably by less than you might imagine)
>
> On our todo list is blogging about this in more detail, and contributing
> back more than just the node.js implementation.
>
> On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <l.alberton@gmail.com> wrote:
>
> > Is there anyone who tried Kafka with thousands of concurrent topics?
> > If so, what are your experiences? How did you tune it?
> >
> > Thanks!
> >
>

Re: Thousands of topics

Posted by Johan Rydberg <jo...@gmail.com>.
>    2. our own kafka software load balancer (implemented using said library)
>    that shards out independent Kafka instances (guarantees in-order delivery
>    per topic and scales the # of kafka topics linearly as a function of the #
>    of kafka machines)

I'm currently researching how to get some kind of in-order delivery guarantee
using Kafka.

Could you tell us a bit more about how you approached it?

Re: Thousands of topics

Posted by Taylor Gautier <tg...@tagged.com>.
Yes, we have done so at Tagged.  I chronicled a bit of our experience here
on the mailing list.  Effectively we found that a single machine could
not go above ~20k total topics.  This could be OS dependent, however (we use
CentOS 5.x).

Various tweaks we made to go further:

   1. a beefed up node.js kafka client/producer implementation -
   https://github.com/tagged/node-kafka lies at the heart of our kafka
   deployment
   2. our own kafka software load balancer (implemented using said library)
   that shards out independent Kafka instances (guarantees in-order delivery
   per topic and scales the # of kafka topics linearly as a function of the #
   of kafka machines)
   3. a continuous cleaner that removes old dead topics completely from the
   filesystem (the 0.7 cleaner leaves empty directories/files behind, which
   eat up open file handles and limit the max # of topics)
   4. (coming soon) a hierarchical topic directory structure to ease the
   pain of too many directories/files in a single directory (should help the
   ~20k number, though probably by less than you might imagine)
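
To make items 2 and 4 a little more concrete, here is a minimal sketch of the general idea, not the actual Tagged load balancer or directory layout (neither is shown in this thread). It assumes each topic is pinned to exactly one independent broker via a stable hash of the topic name, and that topic directories are spread over hash-derived subdirectories. The function names, host names, and the `levels`/`width` parameters are hypothetical.

```python
import hashlib


def _stable_hash(topic):
    """Hex digest that is stable across processes (unlike Python's built-in hash())."""
    return hashlib.md5(topic.encode("utf-8")).hexdigest()


def broker_for_topic(topic, brokers):
    """Pick the single broker that owns a topic.

    Every producer and consumer that uses the same broker list gets the same
    answer, so all messages for a topic go through one broker.
    """
    if not brokers:
        raise ValueError("broker list must not be empty")
    return brokers[int(_stable_hash(topic), 16) % len(brokers)]


def hashed_topic_dir(topic, levels=2, width=2):
    """Build a nested relative path such as 'ab/cd/<topic>' from the topic hash,
    so that no single directory has to hold every topic."""
    digest = _stable_hash(topic)
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return "/".join(parts + [topic])


# Hypothetical broker addresses, for illustration only.
brokers = ["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"]
owner = broker_for_topic("user-42", brokers)
```

Because a topic always lands on one broker, ordering within the topic is preserved as long as the topic has a single partition there, and the number of topics the cluster can hold grows with the number of brokers. Note that plain modulo hashing remaps most topics when the broker list changes; a consistent-hashing scheme would reduce that, at the cost of more code.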

On our todo list is blogging about this in more detail, and contributing
back more than just the node.js implementation.

On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <l....@gmail.com> wrote:

> Is there anyone who tried Kafka with thousands of concurrent topics?
> If so, what are your experiences? How did you tune it?
>
> Thanks!
>