Posted to users@kafka.apache.org by Lorenzo Alberton <l....@gmail.com> on 2012/08/07 23:47:34 UTC

Re: Thousands of topics

Hi Taylor,

thanks for your reply. I'd love to read your blog post about your
experiences with it, especially around hardware configuration and how you
consume the data (few/many short/long-lived processes, average throughput
per topic). The cleanup script seems really useful too; I was considering
writing one that also removes dead topics from ZooKeeper.
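
For illustration, a minimal sketch of the filesystem half of such a cleaner.
It assumes one directory per topic partition directly under the log root, and
it treats a directory as dead when it holds no non-empty files and has not
been modified for a given time. The function name, the age threshold and the
layout are assumptions made for this sketch, not details of the Tagged script,
and it does not touch ZooKeeper at all.

```python
import os
import shutil
import time


def remove_dead_topic_dirs(log_root, min_age_seconds, now=None):
    """Delete topic directories under log_root that hold no data.

    A directory counts as dead when it contains only zero-length files
    (or nothing at all) and nothing in it was modified during the last
    min_age_seconds. Returns the sorted names of the removed directories.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletions do not disturb the scan.
    for entry in list(os.scandir(log_root)):
        if not entry.is_dir(follow_symlinks=False):
            continue
        children = list(os.scandir(entry.path))
        # Be conservative: any subdirectory or non-empty file counts as data.
        has_data = any(
            child.is_dir(follow_symlinks=False) or child.stat().st_size > 0
            for child in children
        )
        newest = max(
            [entry.stat().st_mtime] + [child.stat().st_mtime for child in children]
        )
        if not has_data and now - newest >= min_age_seconds:
            shutil.rmtree(entry.path)
            removed.append(entry.name)
    return sorted(removed)
```

Something like remove_dead_topic_dirs("/var/kafka-logs", 7 * 24 * 3600) could
then run periodically, with the path and the one-week threshold being
placeholders only.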

Thanks!

Lorenzo


On Tue, Jul 31, 2012 at 8:58 PM, Taylor Gautier <tg...@tagged.com> wrote:

> Yes, we have done so at Tagged.  I chronicled a bit of our experience here
> on the mailing list.  Effectively we found that a single machine could
> not go above ~20k total topics.  This could be OS dependent, however (we
> use CentOS 5.x).
>
> Various tweaks we made to go further:
>
>    1. a beefed-up node.js kafka client/producer implementation -
>    https://github.com/tagged/node-kafka lies at the heart of our kafka
>    deployment
>    2. our own kafka software load balancer (implemented using said library)
>    that shards out independent Kafka instances (guarantees in-order
>    delivery per topic and scales the # of kafka topics linearly as a
>    function of the # of kafka machines)
>    3. a continuous cleaner that removes old dead topics completely from the
>    filesystem (the 0.7 cleaner leaves an empty directory/file, which eats
>    up open file handles and limits the max # of topics)
>    4. (coming soon) a hierarchical topic directory structure to ease the
>    pain of too many directories/files in a single directory (should help
>    the ~20k number, though probably by less than you might imagine)
>
> On our todo list is blogging about this in more detail, and contributing
> back more than just the node.js implementation.
>
> On Mon, Jul 30, 2012 at 8:39 AM, Lorenzo Alberton <l.alberton@gmail.com> wrote:
>
> > Is there anyone who tried Kafka with thousands of concurrent topics?
> > If so, what are your experiences? How did you tune it?
> >
> > Thanks!
> >
>
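
The topic sharding in item 2 of Taylor's list (each topic pinned to one of
several independent Kafka instances) can be illustrated with a small sketch.
It assumes a plain stable hash of the topic name taken modulo the number of
brokers; the thread does not say how the Tagged load balancer actually picks
an instance, so broker_for_topic and the broker addresses below are
hypothetical.

```python
import hashlib


def broker_for_topic(topic, brokers):
    """Map a topic name to exactly one broker from the given list.

    The hash depends only on the topic name, so every producer and
    consumer that shares the same broker list agrees on the mapping.
    All messages of a topic therefore land on a single instance, which
    is what preserves per-topic ordering.
    """
    if not brokers:
        raise ValueError("at least one broker is required")
    digest = hashlib.md5(topic.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(brokers)
    return brokers[index]


# Hypothetical broker addresses, for illustration only.
BROKERS = ["kafka-1:9092", "kafka-2:9092", "kafka-3:9092"]

if __name__ == "__main__":
    for name in ("user.1001.feed", "user.1002.feed", "chat.room.42"):
        print(name, "->", broker_for_topic(name, BROKERS))
```

With this simple modulo scheme, adding or removing a broker remaps most
topics; a consistent-hashing ring would limit that, at the cost of a little
more code.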