Posted to users@kafka.apache.org by Taylor Gautier <tg...@tagged.com> on 2011/07/22 22:23:20 UTC

Question - large number of topics

Hi.

I am thinking of using Kafka to send/receive messages for a large number of
topics - on the order of 100k to 1M.

It seems that the directory structure used for topics will probably not work
for this usage.  Also, I'm not sure whether the in-memory data structures
might suffer, and it may also be problematic for ZooKeeper.

One thought I have is to modify the directory structure to be a tree of
directories.  I'm not sure what, if anything, would need to be done to the
in-memory structures or the ZooKeeper info.

Any thoughts?
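
One way to picture the "tree of directories" idea is to hash each topic name
into a few short bucket directories, so that no single directory holds every
topic. The sketch below is only an illustration under stated assumptions: the
function name topic_dir, the root path, the bucket depth and width, and the
"topic-partition" leaf naming are all hypothetical and are not taken from the
Kafka code base.

```python
import hashlib
from pathlib import PurePosixPath


def topic_dir(root: str, topic: str, partition: int,
              levels: int = 2, width: int = 2) -> PurePosixPath:
    """Map a topic partition to a nested directory under `root`.

    Hypothetical layout: <root>/<bucket1>/<bucket2>/<topic>-<partition>.
    """
    # A hex digest of the topic name gives stable, evenly spread buckets.
    digest = hashlib.md5(topic.encode("utf-8")).hexdigest()
    # With levels=2 and width=2 there are 256 * 256 possible bucket paths.
    buckets = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return PurePosixPath(root, *buckets, f"{topic}-{partition}")


if __name__ == "__main__":
    print(topic_dir("/tmp/kafka-logs", "user-42", 0))
```

With 1M topics and 65,536 bucket paths, each leaf directory would hold on the
order of 15 topics, assuming the hash spreads names evenly.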

Re: Question - large number of topics

Posted by Taylor Gautier <tg...@tagged.com>.
Oh - so if I have n topics, I just cycle over those n topics and send out n
fetch requests.
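
The cycling described above can be sketched roughly as follows. This is a
minimal sketch, not the node-kafka client's code: the fetch callable is a
hypothetical stand-in for "send one fetch request and block for its response",
and the offsets are treated as opaque values that the callable advances.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical transport: sends one fetch request for (topic, offset) and
# blocks until the response arrives. Returns (messages, next_offset).
Fetch = Callable[[str, int], Tuple[List[bytes], int]]


def poll_once(topics: List[str], offsets: Dict[str, int],
              fetch: Fetch) -> Dict[str, List[bytes]]:
    """Issue one fetch per topic, in order, over the same connection."""
    received: Dict[str, List[bytes]] = {}
    for topic in topics:
        # The response belongs to `topic` because only this request is
        # outstanding when the response is read.
        messages, next_offset = fetch(topic, offsets.get(topic, 0))
        offsets[topic] = next_offset
        if messages:
            received[topic] = messages
    return received
```

A consumer would call poll_once in a loop, reusing the same offsets dict so
that each pass resumes where the previous one stopped.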

On Fri, Aug 19, 2011 at 2:28 PM, Joel Koshy <jj...@gmail.com> wrote:

> Hi Taylor,
>
> The topic is not part of the response in the wire protocol, and the simple
> consumer is synchronous. So each response is for whatever topic the
> preceding fetch request was for.
>
> Thanks,
>
> Joel

Re: Question - large number of topics

Posted by Joel Koshy <jj...@gmail.com>.
Hi Taylor,

The topic is not part of the response in the wire protocol, and the simple
consumer is synchronous. So each response is for whatever topic the
preceding fetch request was for.

Thanks,

Joel
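
For an event-driven client, the point above could be handled by remembering
which topic each outstanding fetch was for. The sketch below assumes that
responses on one connection arrive in the same order the requests were sent;
with a strictly synchronous consumer there is only ever one entry in the
queue. The class name PendingFetches and its methods are hypothetical.

```python
from collections import deque
from typing import Deque, Tuple


class PendingFetches:
    """Tracks which topic each outstanding fetch request was for."""

    def __init__(self) -> None:
        self._topics: Deque[str] = deque()

    def sent(self, topic: str) -> None:
        # Call this right after writing a fetch request to the socket.
        self._topics.append(topic)

    def received(self, payload: bytes) -> Tuple[str, bytes]:
        # Assumption: responses come back in request order, so the oldest
        # pending topic is the one this payload belongs to.
        if not self._topics:
            raise RuntimeError("response received with no fetch outstanding")
        return self._topics.popleft(), payload
```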

On Fri, Aug 19, 2011 at 12:34 PM, Taylor Gautier <tg...@tagged.com> wrote:
> Great!  How is this done?  I'm working with the node kafka client here:
>
> http://proxworx.appspot.com/github.com/marcuswestin/node-kafka
>
> I think it only supports one topic, so I need to update the code. When I
> was looking at the binary responses, I wasn't sure how the response is
> formatted so that I can distinguish messages from different topics.
>
> (this discussion may be more appropriate for kafka-dev)

Re: Question - large number of topics

Posted by Taylor Gautier <tg...@tagged.com>.
Great!  How is this done?  I'm working with the node kafka client here:

http://proxworx.appspot.com/github.com/marcuswestin/node-kafka

I think it only supports one topic, so I need to update the code. When I
was looking at the binary responses, I wasn't sure how the response is
formatted so that I can distinguish messages from different topics.

(this discussion may be more appropriate for kafka-dev)

On Fri, Aug 19, 2011 at 12:29 PM, Jun Rao <ju...@gmail.com> wrote:

> Taylor,
>
> For topics stored on the same broker, a Kafka consumer can consume multiple
> topics over a single socket connection.
>
> Jun

Re: Question - large number of topics

Posted by Jun Rao <ju...@gmail.com>.
Taylor,

For topics stored on the same broker, a Kafka consumer can consume multiple
topics over a single socket connection.

Jun

On Fri, Aug 19, 2011 at 11:14 AM, Taylor Gautier <tg...@tagged.com> wrote:

> Thanks for the responses.
>
> Coming back to this topic - in the wire protocol, is it possible to register
> interest in more than one topic, or is it one TCP connection per topic?
>
> Inspecting the binary formats, it looks like it has to be 1:1.
>
> Thanks.

Re: Question - large number of topics

Posted by Taylor Gautier <tg...@tagged.com>.
Thanks for the responses.

Coming back to this topic - in the wire protocol, is it possible to register
interest in more than one topic, or is it one TCP connection per topic?

Inspecting the binary formats, it looks like it has to be 1:1.

Thanks.

On Fri, Jul 22, 2011 at 4:37 PM, Jay Kreps <ja...@gmail.com> wrote:

> Hi Taylor,
>
> I think you are correct: the single-node scalability for the number of
> topics is not that great, due to having multiple files per topic. I think
> the large directory problem can probably be mitigated by using a more
> modern filesystem, but as you and Jun point out, ZK may also be strained.
>
> One thing that may not be obvious is that it is not required to keep all
> topics on all machines; this will help scale the non-ZK aspects. To do this
> you can either pre-create the topics or add a custom partitioner which maps
> particular topics to only a subset of machines. In this way, if you had, say,
> 15 machines, you could spread each topic over 3 machines and get 5X the max
> number of topics.
>
> -Jay

Re: Question - large number of topics

Posted by Jay Kreps <ja...@gmail.com>.
Hi Taylor,

I think you are correct: the single-node scalability for the number of
topics is not that great, due to having multiple files per topic. I think
the large directory problem can probably be mitigated by using a more
modern filesystem, but as you and Jun point out, ZK may also be strained.

One thing that may not be obvious is that it is not required to keep all
topics on all machines; this will help scale the non-ZK aspects. To do this
you can either pre-create the topics or add a custom partitioner which maps
particular topics to only a subset of machines. In this way, if you had, say,
15 machines, you could spread each topic over 3 machines and get 5X the max
number of topics.

-Jay
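
The mapping of topics to a subset of machines could look something like the
sketch below. It is not Kafka's partitioner interface: the function name
brokers_for_topic, the broker names, and the choice of CRC32 as the hash are
assumptions made for illustration. The arithmetic matches the example above:
15 brokers split into groups of 3 gives 5 groups, so each broker holds roughly
one fifth of the topics.

```python
import zlib
from typing import List


def brokers_for_topic(topic: str, brokers: List[str],
                      per_topic: int = 3) -> List[str]:
    """Pick one fixed group of `per_topic` brokers for a topic."""
    if per_topic <= 0 or len(brokers) % per_topic != 0:
        raise ValueError("broker count must be a multiple of per_topic")
    groups = len(brokers) // per_topic  # 15 brokers / 3 per topic = 5 groups
    # crc32 gives the same value in every process, unlike Python's
    # built-in hash() for strings, so all producers agree on the group.
    group = zlib.crc32(topic.encode("utf-8")) % groups
    start = group * per_topic
    return brokers[start:start + per_topic]


if __name__ == "__main__":
    cluster = [f"broker-{i}" for i in range(15)]
    print(brokers_for_topic("user-42", cluster))
```

A producer would then choose among only the returned brokers when sending to
that topic, and a consumer would connect only to that group.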

On Fri, Jul 22, 2011 at 2:06 PM, Jun Rao <ju...@gmail.com> wrote:

> Hi, Taylor,
>
> That's a good question. As you pointed out, a large number of topics will
> put stress on the local file directory and ZK. Maybe you can do a bit of
> testing first to see what breaks with a large number of topics. After that,
> we can look into what needs to be fixed. Making the directory structure
> hierarchical is a possibility.
>
> Thanks,
>
> Jun

Re: Question - large number of topics

Posted by Jun Rao <ju...@gmail.com>.
Hi, Taylor,

That's a good question. As you pointed out, a large number of topics will
put stress on the local file directory and ZK. Maybe you can do a bit of
testing first to see what breaks with a large number of topics. After that,
we can look into what needs to be fixed. Making the directory structure
hierarchical is a possibility.

Thanks,

Jun


On Fri, Jul 22, 2011 at 1:23 PM, Taylor Gautier <tg...@tagged.com> wrote:

> Hi.
>
> I am thinking of using Kafka to send/receive messages for a large number of
> topics - on the order of 100k to 1M.
>
> It seems that the directory structure used for topics will probably not work
> for this usage.  Also, I'm not sure whether the in-memory data structures
> might suffer, and it may also be problematic for ZooKeeper.
>
> One thought I have is to modify the directory structure to be a tree of
> directories.  I'm not sure what, if anything, would need to be done to the
> in-memory structures or the ZooKeeper info.
>
> Any thoughts?
>