Posted to users@kafka.apache.org by Sivananda Reddy <si...@gmail.com> on 2015/04/14 23:37:13 UTC

Design questions related to kafka

Hi,

    # I looked through the Kafka documentation and I see that there is no
       way for a consumer instance to read specific messages from a
       partition.
    # I have a use case where I need to create a topic (single partition)
       for each user. There would be 10k online users at a time, and very
       little data per topic.
       My question is: what are the limitations of having many topics (with
       1 partition each)? I think this situation would cause heavy memory
       consumption; are there any other limitations?
       Basically, the problem boils down to this: what are the scalability
       limitations (hardware/software) of having many topics?

Thanks a lot in advance.

Regards,
Siva.

Re: Design questions related to kafka

Posted by Sivananda Reddy <si...@gmail.com>.

Hi,

@Manoj/@Pete: Thanks for the inputs. I am already aware of the parallelism
provided by Kafka. My use case needed a single topic per user, but I came up
with a workaround for that, so the problem is solved.

@Guozhang: I agree, Kafka stores data related to partitions and topics (
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper)
in ZooKeeper. But if it's just a memory constraint, why can't we increase
the memory available to ZooKeeper? It's configurable, isn't it? If you have
any thoughts on this, please let me know.

Thank you,
Siva.

On Thu, Apr 16, 2015 at 9:18 AM, Guozhang Wang <wa...@gmail.com> wrote:

> Siva,
>
> For Kafka brokers, as long as the total number of partitions is not too
> large, I think it should be fine (we have been hosting 600+ partitions on a
> single node). You do have to pay some attention to your ZK nodes, though,
> since as the number of topics increases, the metadata stored there will
> take more and more space.
>
> Guozhang

Re: Design questions related to kafka

Posted by Guozhang Wang <wa...@gmail.com>.

Siva,

For Kafka brokers, as long as the total number of partitions is not too
large, I think it should be fine (we have been hosting 600+ partitions on a
single node). You do have to pay some attention to your ZK nodes, though,
since as the number of topics increases, the metadata stored there will
take more and more space.
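
As a rough back-of-envelope check using only the numbers in this thread, the sketch below estimates how many brokers 10k single-partition topics would need if each broker hosted about 600 partitions. The 600 figure is just what is reported above as working on one node, not a documented limit. The replication factor of 3 and the function name `brokers_needed` are assumptions made for illustration.

```python
import math


def brokers_needed(topics, partitions_per_topic, replication_factor,
                   partitions_per_broker):
    """Estimate the broker count for a given number of partition replicas."""
    # Every partition is stored once per replica.
    total_replicas = topics * partitions_per_topic * replication_factor
    return math.ceil(total_replicas / partitions_per_broker)


# 10k users, one single-partition topic each, ~600 partitions per broker.
print(brokers_needed(10_000, 1, 1, 600))  # no replication: 17
print(brokers_needed(10_000, 1, 3, 600))  # assumed replication factor 3: 50
```

This only counts partitions per broker; it says nothing about the ZooKeeper metadata growth mentioned above, which would need to be measured separately.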

Guozhang

-- 
-- Guozhang

Re: Design questions related to kafka

Posted by Manoj Khangaonkar <kh...@gmail.com>.

I meant that you can read messages multiple times if you want to.

Yes, you would store the offset and request reads starting from that offset
with the Simple Consumer API to implement once-and-only-once delivery.
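
The idea of storing an offset and resuming from it can be sketched with a toy in-memory model. This is not the Simple Consumer API or any real Kafka client: `PartitionLog` and `OffsetTrackingConsumer` are hypothetical names, and the partition is just a Python list.

```python
class PartitionLog:
    """Toy stand-in for one partition: an append-only list of messages."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        self._messages.append(message)
        return len(self._messages) - 1  # offset of the new message

    def read_from(self, offset):
        """Return (offset, message) pairs starting at the given offset."""
        return list(enumerate(self._messages[offset:], start=offset))


class OffsetTrackingConsumer:
    """Remembers the next offset to read so no message is handed out twice."""

    def __init__(self, log):
        self._log = log
        self.next_offset = 0  # a real system would store this durably

    def poll(self):
        batch = self._log.read_from(self.next_offset)
        if batch:
            # Advance past the last message handed out.
            self.next_offset = batch[-1][0] + 1
        return [message for _, message in batch]


log = PartitionLog()
for text in ["a", "b", "c"]:
    log.append(text)

consumer = OffsetTrackingConsumer(log)
first = consumer.poll()   # everything written so far
log.append("d")
second = consumer.poll()  # only the message added since the last poll
print(first, second)
```

In this sketch the offset lives in memory. For delivery to survive a consumer restart, the stored offset and the result of processing would have to be saved together; otherwise a crash between the two can replay or skip a message.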

regards

On Wed, Apr 15, 2015 at 10:55 AM, Pete Wright <pw...@rubiconproject.com>
wrote:

>
>
> On 04/15/15 09:31, Manoj Khangaonkar wrote:
> >> # I looked through the Kafka documentation and I see that there is no
> >> way for a consumer instance to read specific messages from a partition.
> >
> > With Kafka you read messages from the beginning multiple times. Since you
> > say later that you do not have many messages per topic, you can iterate
> > over the messages and read the ones that you need. Of course, this might
> > not be the most efficient approach.
> >
>
> couldn't you also store the offset of the last read message, then resume
> reading messages after that offset to ensure your consumer does not
> consume the same message twice?
>
> Cheers,
> -pete
>
> --
> Pete Wright
> Lead Systems Architect
> Rubicon Project
> pwright@rubiconproject.com
> 310.309.9298
>



-- 
http://khangaonkar.blogspot.com/

Re: Design questions related to kafka

Posted by Pete Wright <pw...@rubiconproject.com>.


On 04/15/15 09:31, Manoj Khangaonkar wrote:
>> # I looked through the Kafka documentation and I see that there is no
>> way for a consumer instance to read specific messages from a partition.
>
> With Kafka you read messages from the beginning multiple times. Since you
> say later that you do not have many messages per topic, you can iterate
> over the messages and read the ones that you need. Of course, this might
> not be the most efficient approach.
> 

Couldn't you also store the offset of the last read message, then resume
reading messages after that offset to ensure your consumer does not
consume the same message twice?

Cheers,
-pete

-- 
Pete Wright
Lead Systems Architect
Rubicon Project
pwright@rubiconproject.com
310.309.9298

Re: Design questions related to kafka

Posted by Manoj Khangaonkar <kh...@gmail.com>.

> # I looked through the Kafka documentation and I see that there is no
> way for a consumer instance to read specific messages from a partition.
>

With Kafka you read messages from the beginning multiple times. Since you
say later that you do not have many messages per topic, you can iterate
over the messages and read the ones that you need. Of course, this might
not be the most efficient approach.
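
A minimal sketch of that scan-and-filter approach, using a plain Python list in place of a real partition. The message layout (`user`, `text`) and the helper name `read_matching` are made up for illustration and are not part of any Kafka API.

```python
def read_matching(messages, wanted):
    """Scan a partition's messages from offset 0 and keep those that match."""
    matches = []
    for offset, message in enumerate(messages):
        if wanted(message):
            matches.append((offset, message))
    return matches


# Hypothetical partition contents.
partition = [
    {"user": "alice", "text": "hello"},
    {"user": "bob", "text": "hi"},
    {"user": "alice", "text": "bye"},
]

for_alice = read_matching(partition, lambda m: m["user"] == "alice")
print(for_alice)
```

Every read walks the whole partition, so the cost grows with the number of messages. That is acceptable only when there is little data per topic, as described in the question.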


> # I have a use case where I need to create a topic (single partition)
> for each user. There would be 10k online users at a time, and very
> little data per topic.
> My question is: what are the limitations of having many topics (with
> 1 partition each)? I think this situation would cause heavy memory
> consumption; are there any other limitations?
> Basically, the problem boils down to this: what are the scalability
> limitations (hardware/software) of having many topics?
>

Partitioning the topic helps scale the writes. If you compare Kafka to
other message brokers, the others might choke when writes happen above a
certain rate. By partitioning, you distribute the writes across multiple
nodes/brokers.
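
To illustrate how partitioning spreads writes, here is a small sketch that maps message keys to partitions with a stable hash. It uses `zlib.crc32` purely for illustration; this is not Kafka's own partitioner, and the key format `user-N` and the partition count of 8 are assumptions.

```python
import zlib
from collections import Counter


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (not Kafka's partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


num_partitions = 8  # assumed partition count for the example
counts = Counter(
    choose_partition(f"user-{i}", num_partitions) for i in range(10_000)
)

# Number of keys that landed on each partition.
print(sorted(counts.items()))
```

Because the same key always maps to the same partition, one user's messages stay together and in order, while different users' writes land on different partitions, which can live on different brokers.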

You don't mention replication, but that is relevant as well. It provides
redundancy: if the primary broker goes down, your messages are still
available.

regards



-- 
http://khangaonkar.blogspot.com/