Posted to users@kafka.apache.org by Bob Jervis <bj...@visibletechnologies.com> on 2012/10/12 20:16:49 UTC

How well does Kafka handle many queues per process?

We have an application that ingests data and funnels it through one server, which then sprays it out to a distributed cluster.  The data in the cluster is sharded, and each shard is replicated twice in the cluster.  Any given node has a few dozen shards, and we are thinking of increasing the number of data shards so that we will have hundreds of such shards across the cluster at any given time.  We need to get the data to the cluster with very low latency.  Nodes can go down and shards can get redistributed, so we can't easily map a message to the nodes that need it (the consumer nodes may change between when the message was written and when it needs to be consumed).

We are evaluating Kafka for use in routing our incoming posts to the clustered servers.  It looks like Kafka supports broker-side filtering (true?), but the API docs are a little sparse.

Which would make more sense to implement:


1. Write to several hundred queues and have each clustered server read from a few dozen (either blocking on one thread per queue or timing out and round-robining between queues).

2. Write to a much smaller number of queues and have each clustered server filter for the content it needs.
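
To make the two options concrete, here is a minimal sketch in plain Python with no Kafka client involved. Everything in it is an assumption for illustration: the `Message` type, the `posts-shard-` queue naming scheme, and the helper names are hypothetical, and option 2 is assumed to filter inside the consumer process after the message has been fetched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    shard_id: int    # shard this post belongs to (hypothetical field)
    payload: bytes


def topic_for_shard(shard_id, prefix="posts-shard-"):
    """Option 1: one queue per shard; the naming scheme is made up."""
    return f"{prefix}{shard_id}"


def topics_for_node(owned_shards):
    """Option 1: a node reads only the queues for the shards it hosts."""
    return [topic_for_shard(s) for s in sorted(owned_shards)]


def filter_for_node(messages, owned_shards):
    """Option 2: a node reads a shared queue and drops what it does not host."""
    owned = set(owned_shards)
    return [m for m in messages if m.shard_id in owned]
```

With option 1, a shard moving to another node changes the list returned by `topics_for_node`; with option 2, it changes only the set passed to `filter_for_node`, but every message on the shared queue still reaches the node before being dropped.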

And of course the story wouldn't be complete if there weren't other consumers needing to read the same data but without the strict latency requirements.

Any suggestions?

Bob Jervis | Senior Architect

Seattle | Boston | New York | London
Phone: 425.957.6075 | Fax: 781.404.5711



Re: How well does Kafka handle many queues per process?

Posted by Jun Rao <ju...@gmail.com>.
Bob,

It seems to me that approach 1 would be more efficient since fewer bytes are
transferred to consumers.
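
A rough back-of-the-envelope sketch of that difference, in Python. The numbers are hypothetical, not measurements: it assumes two nodes host each shard, a 20-node cluster, and, for approach 2, the worst case in which every node fetches every message before filtering.

```python
def bytes_queue_per_shard(msg_bytes, num_msgs, copies_per_shard):
    """Approach 1: a message is fetched only by the nodes hosting its shard."""
    return msg_bytes * num_msgs * copies_per_shard


def bytes_shared_queues(msg_bytes, num_msgs, num_nodes):
    """Approach 2, worst case: every node fetches every message, then filters."""
    return msg_bytes * num_msgs * num_nodes


# Hypothetical workload: 1,000 messages of 1,000 bytes each.
approach_1 = bytes_queue_per_shard(1_000, 1_000, copies_per_shard=2)
approach_2 = bytes_shared_queues(1_000, 1_000, num_nodes=20)
print(approach_1, approach_2, approach_2 // approach_1)
```

Under these assumptions the ratio between the two is simply the node count divided by the number of copies per shard, so the gap widens as the cluster grows.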

Thanks,

Jun

On Fri, Oct 12, 2012 at 11:16 AM, Bob Jervis <
bjervis@visibletechnologies.com> wrote:

> We have an application that ingests data, funnels it through one server
> which then sprays it out to a distributed cluster.  The data in the cluster
> is sharded and each shard is replicated twice in the cluster.  Any given
> node has a few dozen shards and we are thinking that we want to increase
> the number of data shards so that we will have hundreds of such shards
> across the cluster at any given time.  We need to get the data to the
> cluster with very low latency.  Nodes can go down and shards can get
> redistributed, so we can’t easily map between message and what nodes need
> it (since the consumer nodes may change between when the message was
> written and when it needs to be consumed).
>
> We are evaluating Kafka for use in routing our incoming posts to the
> clustered servers.  It looks like Kafka supports broker-side filtering
> (true?), but the API docs are a little sparse.
>
> Which would make more sense to implement:
>
> 1. Write to several hundred queues and have each clustered
> server read from a few dozen (either blocking on one thread per queue or
> timing out and round-robining between queues).
>
> 2. Write a much smaller number of queues and have each
> clustered server filter for the content they need.
>
> And of course the story wouldn’t be complete if there weren’t other
> consumers needing to read the same data but without the strict latency
> requirements.
>
> Any suggestions?