Posted to users@kafka.apache.org by Achanta Vamsi Subhash <ac...@flipkart.com> on 2014/12/19 15:10:21 UTC

Max. storage for Kafka and impact

Hi,

We are using Kafka for our messaging system and we estimate around 200
TB/week in the coming months. Will it have any impact on Kafka's
performance?

PS: We will have more than 2 lakh partitions.

-- 
Regards
Vamsi Subhash

Re: Max. storage for Kafka and impact

Posted by Pradeep Gollakota <pr...@gmail.com>.
@Joe, Achanta is using Indian English numerals which is why it's a little
confusing. http://en.wikipedia.org/wiki/Indian_English#Numbering_system
1,00,000 [1 lakh] (Indian English) == 100,000 [1 hundred thousand] (The
rest of the world :P)

On Fri Dec 19 2014 at 9:40:29 AM Achanta Vamsi Subhash <
achanta.vamsi@flipkart.com> wrote:

> Joe,
>
> - Correction, it's 1,00,000 partitions
> - We can have at max only 1 consumer/partition. Not 50 per 1 partition.
> Yes, we have a hashing mechanism to support future partition increase as
> well. We override the Default Partitioner.
> - We use both Simple and HighLevel consumers depending on the consumption
> use-case.
> - I clearly mentioned that 200 TB/week and not a day.
> - We have separate producers and consumers, each operating as different
> processes in different machines.
>
> I was explaining why we may end up with so many partitions. I think the
> question about 200 TB/day got deviated.
>
> Any suggestions reg. the performance impact of the 200TB/week?
>
> On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <jo...@stealth.ly> wrote:
> >
> > Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
> > partitions? I think you can take what I said below and change my 250 to
> 25
> > as I went with your result (1,000,000) and not your arguments (2,000 x
> 50).
> >
> > And you should think on the processing as a separate step from fetch and
> > commit your offset in batch post processing. Then you only need more
> > partitions to fetch batches to process in parallel.
> >
> > Regards, Joestein
> >
> > On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <jo...@stealth.ly>
> wrote:
> > >
> > > see some comments inline
> > >
> > > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> > > achanta.vamsi@flipkart.com> wrote:
> > >>
> > >> We require:
> > >> - many topics
> > >> - ordering of messages for every topic
> > >>
> > >
> > > Ordering is only on a per partition basis so you might have to pick a
> > > partition key that makes sense for what you are doing.
> > >
> > >
> > >> - Consumers hit different Http EndPoints which may be slow (in a push
> > >> model). In case of a Pull model, consumers may pull at the rate at
> which
> > >> they can process.
> > >> - We need parallelism to hit with as many consumers. Hence, we
> currently
> > >> have around 50 consumers/topic => 50 partitions.
> > >>
> > >
> > > I think you might be mixing up the fetch with the processing. You can
> > have
> > > 1 partition and still have 50 message being processed in parallel (so a
> > > batch of messages).
> > >
> > > What language are you working in? How are you doing this processing
> > > exactly?
> > >
> > >
> > >>
> > >> Currently we have:
> > >> 2000 topics x 50 => 1,00,000 partitions.
> > >>
> > >
> > > If this is really the case then you are going to need at least 250
> > brokers
> > > (~ 4,000 partitions per broker).
> > >
> > > If you do that then you are in the 200TB per day world which doesn't
> > sound
> > > to be the case.
> > >
> > > I really think you need to strategize more on your processing model
> some
> > > more.
> > >
> > >
> > >>
> > >> The incoming rate of ingestion at max is 100 MB/sec. We are planning
> > for a
> > >> big cluster with many brokers.
> > >
> > >
> > > It is possible to handle this on just 3 brokers depending on message
> > size,
> > > ability to batch, durability are also factors you really need to be
> > > thinking about.
> > >
> > >
> > >>
> > >> We have exactly the same use cases as mentioned in this video (usage
> at
> > >> LinkedIn):
> > >> https://www.youtube.com/watch?v=19DvtEC0EbQ​
> > >>
> > >> ​To handle the zookeeper scenario, as mentioned in the above video, we
> > are
> > >> planning to use SSDs​ and would upgrade to the new consumer (0.9+)
> once
> > >> its
> > >> available as per the below video.
> > >> https://www.youtube.com/watch?v=7TZiN521FQA
> > >>
> > >> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
> > >> <j_thakrar@yahoo.com.invalid
> > >> > wrote:
> > >>
> > >> > Technically/conceptually it is possible to have 200,000 topics, but
> do
> > >> you
> > >> > really need it like that?What do you intend to do with those
> messages
> > -
> > >> > i.e. how do you forsee them being processed downstream? And are
> those
> > >> > topics really there to segregate different kinds of processing or
> > >> different
> > >> > ids?E.g. if you were LinkedIn, Facebook or Google, would you have
> have
> > >> one
> > >> > topic per user or one topic per kind of event (e.g. login, pageview,
> > >> > adview, etc.)Remember there is significant book-keeping done within
> > >> > Zookeeper - and these many topics will make that book-keeping
> > >> significant.
> > >> > As for storage, I don't think it should be an issue with sufficient
> > >> > spindles, servers and higher than default memory configuration.
> > >> > Jayesh
> > >> >       From: Achanta Vamsi Subhash <ac...@flipkart.com>
> > >> >  To: "users@kafka.apache.org" <us...@kafka.apache.org>
> > >> >  Sent: Friday, December 19, 2014 9:00 AM
> > >> >  Subject: Re: Max. storage for Kafka and impact
> > >> >
> > >> > Yes. We need those many max partitions as we have a central
> messaging
> > >> > service and thousands of topics.
> > >> >
> > >> > On Friday, December 19, 2014, nitin sharma <
> > kumarsharma.nitin@gmail.com
> > >> >
> > >> > wrote:
> > >> >
> > >> > > hi,
> > >> > >
> > >> > > Few things you have to plan for:
> > >> > > a. Ensure that from resilience point of view, you are having
> > >> sufficient
> > >> > > follower brokers for your partitions.
> > >> > > b. In my testing of kafka (50TB/week) so far, haven't seen much
> > issue
> > >> > with
> > >> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
> > >> > > c. 200,000 partitions means around 1MB/week/partition. are you
> sure
> > >> you
> > >> > > need so many partitions?
> > >> > >
> > >> > > Regards,
> > >> > > Nitin Kumar Sharma.
> > >> > >
> > >> > >
> > >> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > >> > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > >> > > >
> > >> > > > We definitely need a retention policy of a week. Hence.
> > >> > > >
> > >> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > >> > > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > >> > > > >
> > >> > > > > Hi,
> > >> > > > >
> > >> > > > > We are using Kafka for our messaging system and we have an
> > >> estimate
> > >> > for
> > >> > > > > 200 TB/week in the coming months. Will it impact any
> performance
> > >> for
> > >> > > > Kafka?
> > >> > > > >
> > >> > > > > PS: We will be having greater than 2 lakh partitions.
> > >> >
> > >> >
> > >> > > > >
> > >> > > > > --
> > >> > > > > Regards
> > >> > > > > Vamsi Subhash
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > Regards
> > >> > > > Vamsi Subhash
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> > --
> > >> > Regards
> > >> > Vamsi Subhash
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >>
> > >>
> > >> --
> > >> Regards
> > >> Vamsi Subhash
> > >>
> > >
> >
>
>
> --
> Regards
> Vamsi Subhash
>

Re: Max. storage for Kafka and impact

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
Joe,

- Correction: it's 1,00,000 partitions.
- We can have at most one consumer per partition, not 50 consumers on one
partition. Yes, we have a hashing mechanism to support a future partition
increase as well; we override the Default Partitioner (see the sketch
below).
- We use both the Simple and the High Level consumer, depending on the
consumption use-case.
- I clearly mentioned 200 TB/week, not per day.
- We have separate producers and consumers, each running as separate
processes on different machines.

I was explaining why we may end up with so many partitions; I think the
discussion got sidetracked onto 200 TB/day.

Any suggestions regarding the performance impact of the 200 TB/week?
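
A minimal sketch of what such a custom partitioner can look like against the
0.8.1+ Scala producer API (the class and package names are illustrative, and
the plain modulo shown here does not by itself keep existing keys in place
when partitions are added later):

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Routes each key deterministically to a partition so that per-key ordering
// is preserved. The 0.8.x producer instantiates the partitioner reflectively
// and passes it a VerifiableProperties argument, hence the constructor.
public class KeyHashPartitioner implements Partitioner {

    public KeyHashPartitioner(VerifiableProperties props) {
        // no configuration needed for this sketch
    }

    public int partition(Object key, int numPartitions) {
        // mask the sign bit so the result is always a valid partition index
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }
}

It is wired in through the producer config, e.g.
props.put("partitioner.class", "com.example.KeyHashPartitioner").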

On Fri, Dec 19, 2014 at 10:53 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
> partitions? I think you can take what I said below and change my 250 to 25
> as I went with your result (1,000,000) and not your arguments (2,000 x 50).
>
> And you should think on the processing as a separate step from fetch and
> commit your offset in batch post processing. Then you only need more
> partitions to fetch batches to process in parallel.
>
> Regards, Joestein
>
> On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <jo...@stealth.ly> wrote:
> >
> > see some comments inline
> >
> > On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> > achanta.vamsi@flipkart.com> wrote:
> >>
> >> We require:
> >> - many topics
> >> - ordering of messages for every topic
> >>
> >
> > Ordering is only on a per partition basis so you might have to pick a
> > partition key that makes sense for what you are doing.
> >
> >
> >> - Consumers hit different Http EndPoints which may be slow (in a push
> >> model). In case of a Pull model, consumers may pull at the rate at which
> >> they can process.
> >> - We need parallelism to hit with as many consumers. Hence, we currently
> >> have around 50 consumers/topic => 50 partitions.
> >>
> >
> > I think you might be mixing up the fetch with the processing. You can
> have
> > 1 partition and still have 50 message being processed in parallel (so a
> > batch of messages).
> >
> > What language are you working in? How are you doing this processing
> > exactly?
> >
> >
> >>
> >> Currently we have:
> >> 2000 topics x 50 => 1,00,000 partitions.
> >>
> >
> > If this is really the case then you are going to need at least 250
> brokers
> > (~ 4,000 partitions per broker).
> >
> > If you do that then you are in the 200TB per day world which doesn't
> sound
> > to be the case.
> >
> > I really think you need to strategize more on your processing model some
> > more.
> >
> >
> >>
> >> The incoming rate of ingestion at max is 100 MB/sec. We are planning
> for a
> >> big cluster with many brokers.
> >
> >
> > It is possible to handle this on just 3 brokers depending on message
> size,
> > ability to batch, durability are also factors you really need to be
> > thinking about.
> >
> >
> >>
> >> We have exactly the same use cases as mentioned in this video (usage at
> >> LinkedIn):
> >> https://www.youtube.com/watch?v=19DvtEC0EbQ​
> >>
> >> ​To handle the zookeeper scenario, as mentioned in the above video, we
> are
> >> planning to use SSDs​ and would upgrade to the new consumer (0.9+) once
> >> its
> >> available as per the below video.
> >> https://www.youtube.com/watch?v=7TZiN521FQA
> >>
> >> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
> >> <j_thakrar@yahoo.com.invalid
> >> > wrote:
> >>
> >> > Technically/conceptually it is possible to have 200,000 topics, but do
> >> you
> >> > really need it like that?What do you intend to do with those messages
> -
> >> > i.e. how do you forsee them being processed downstream? And are those
> >> > topics really there to segregate different kinds of processing or
> >> different
> >> > ids?E.g. if you were LinkedIn, Facebook or Google, would you have have
> >> one
> >> > topic per user or one topic per kind of event (e.g. login, pageview,
> >> > adview, etc.)Remember there is significant book-keeping done within
> >> > Zookeeper - and these many topics will make that book-keeping
> >> significant.
> >> > As for storage, I don't think it should be an issue with sufficient
> >> > spindles, servers and higher than default memory configuration.
> >> > Jayesh
> >> >       From: Achanta Vamsi Subhash <ac...@flipkart.com>
> >> >  To: "users@kafka.apache.org" <us...@kafka.apache.org>
> >> >  Sent: Friday, December 19, 2014 9:00 AM
> >> >  Subject: Re: Max. storage for Kafka and impact
> >> >
> >> > Yes. We need those many max partitions as we have a central messaging
> >> > service and thousands of topics.
> >> >
> >> > On Friday, December 19, 2014, nitin sharma <
> kumarsharma.nitin@gmail.com
> >> >
> >> > wrote:
> >> >
> >> > > hi,
> >> > >
> >> > > Few things you have to plan for:
> >> > > a. Ensure that from resilience point of view, you are having
> >> sufficient
> >> > > follower brokers for your partitions.
> >> > > b. In my testing of kafka (50TB/week) so far, haven't seen much
> issue
> >> > with
> >> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
> >> > > c. 200,000 partitions means around 1MB/week/partition. are you sure
> >> you
> >> > > need so many partitions?
> >> > >
> >> > > Regards,
> >> > > Nitin Kumar Sharma.
> >> > >
> >> > >
> >> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> >> > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> >> > > >
> >> > > > We definitely need a retention policy of a week. Hence.
> >> > > >
> >> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> >> > > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> >> > > > >
> >> > > > > Hi,
> >> > > > >
> >> > > > > We are using Kafka for our messaging system and we have an
> >> estimate
> >> > for
> >> > > > > 200 TB/week in the coming months. Will it impact any performance
> >> for
> >> > > > Kafka?
> >> > > > >
> >> > > > > PS: We will be having greater than 2 lakh partitions.
> >> >
> >> >
> >> > > > >
> >> > > > > --
> >> > > > > Regards
> >> > > > > Vamsi Subhash
> >> > > > >
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Regards
> >> > > > Vamsi Subhash
> >> > > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Regards
> >> > Vamsi Subhash
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >> --
> >> Regards
> >> Vamsi Subhash
> >>
> >
>


-- 
Regards
Vamsi Subhash

Re: Max. storage for Kafka and impact

Posted by Joe Stein <jo...@stealth.ly>.
Wait, how do you get 2,000 topics each with 50 partitions == 1,000,000
partitions? I think you can take what I said below and change my 250 to 25
as I went with your result (1,000,000) and not your arguments (2,000 x 50).
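
For reference, the corrected arithmetic works out as:

  2,000 topics x 50 partitions = 1,00,000 (100,000) partitions
  100,000 partitions / ~4,000 partitions per broker = 25 brokers
  (1,000,000 / 4,000 = 250 was based on the misread total)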

And you should think of the processing as a separate step from the fetch,
and commit your offsets in batch after processing. Then you only need more
partitions in order to fetch batches in parallel.

Regards, Joestein

On Fri, Dec 19, 2014 at 12:01 PM, Joe Stein <jo...@stealth.ly> wrote:
>
> see some comments inline
>
> On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
> achanta.vamsi@flipkart.com> wrote:
>>
>> We require:
>> - many topics
>> - ordering of messages for every topic
>>
>
> Ordering is only on a per partition basis so you might have to pick a
> partition key that makes sense for what you are doing.
>
>
>> - Consumers hit different Http EndPoints which may be slow (in a push
>> model). In case of a Pull model, consumers may pull at the rate at which
>> they can process.
>> - We need parallelism to hit with as many consumers. Hence, we currently
>> have around 50 consumers/topic => 50 partitions.
>>
>
> I think you might be mixing up the fetch with the processing. You can have
> 1 partition and still have 50 message being processed in parallel (so a
> batch of messages).
>
> What language are you working in? How are you doing this processing
> exactly?
>
>
>>
>> Currently we have:
>> 2000 topics x 50 => 1,00,000 partitions.
>>
>
> If this is really the case then you are going to need at least 250 brokers
> (~ 4,000 partitions per broker).
>
> If you do that then you are in the 200TB per day world which doesn't sound
> to be the case.
>
> I really think you need to strategize more on your processing model some
> more.
>
>
>>
>> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
>> big cluster with many brokers.
>
>
> It is possible to handle this on just 3 brokers depending on message size,
> ability to batch, durability are also factors you really need to be
> thinking about.
>
>
>>
>> We have exactly the same use cases as mentioned in this video (usage at
>> LinkedIn):
>> https://www.youtube.com/watch?v=19DvtEC0EbQ​
>>
>> ​To handle the zookeeper scenario, as mentioned in the above video, we are
>> planning to use SSDs​ and would upgrade to the new consumer (0.9+) once
>> its
>> available as per the below video.
>> https://www.youtube.com/watch?v=7TZiN521FQA
>>
>> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
>> <j_thakrar@yahoo.com.invalid
>> > wrote:
>>
>> > Technically/conceptually it is possible to have 200,000 topics, but do
>> you
>> > really need it like that?What do you intend to do with those messages -
>> > i.e. how do you forsee them being processed downstream? And are those
>> > topics really there to segregate different kinds of processing or
>> different
>> > ids?E.g. if you were LinkedIn, Facebook or Google, would you have have
>> one
>> > topic per user or one topic per kind of event (e.g. login, pageview,
>> > adview, etc.)Remember there is significant book-keeping done within
>> > Zookeeper - and these many topics will make that book-keeping
>> significant.
>> > As for storage, I don't think it should be an issue with sufficient
>> > spindles, servers and higher than default memory configuration.
>> > Jayesh
>> >       From: Achanta Vamsi Subhash <ac...@flipkart.com>
>> >  To: "users@kafka.apache.org" <us...@kafka.apache.org>
>> >  Sent: Friday, December 19, 2014 9:00 AM
>> >  Subject: Re: Max. storage for Kafka and impact
>> >
>> > Yes. We need those many max partitions as we have a central messaging
>> > service and thousands of topics.
>> >
>> > On Friday, December 19, 2014, nitin sharma <kumarsharma.nitin@gmail.com
>> >
>> > wrote:
>> >
>> > > hi,
>> > >
>> > > Few things you have to plan for:
>> > > a. Ensure that from resilience point of view, you are having
>> sufficient
>> > > follower brokers for your partitions.
>> > > b. In my testing of kafka (50TB/week) so far, haven't seen much issue
>> > with
>> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
>> > > c. 200,000 partitions means around 1MB/week/partition. are you sure
>> you
>> > > need so many partitions?
>> > >
>> > > Regards,
>> > > Nitin Kumar Sharma.
>> > >
>> > >
>> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
>> > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
>> > > >
>> > > > We definitely need a retention policy of a week. Hence.
>> > > >
>> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
>> > > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
>> > > > >
>> > > > > Hi,
>> > > > >
>> > > > > We are using Kafka for our messaging system and we have an
>> estimate
>> > for
>> > > > > 200 TB/week in the coming months. Will it impact any performance
>> for
>> > > > Kafka?
>> > > > >
>> > > > > PS: We will be having greater than 2 lakh partitions.
>> >
>> >
>> > > > >
>> > > > > --
>> > > > > Regards
>> > > > > Vamsi Subhash
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > Regards
>> > > > Vamsi Subhash
>> > > >
>> > >
>> >
>> >
>> > --
>> > Regards
>> > Vamsi Subhash
>> >
>> >
>> >
>> >
>>
>>
>>
>> --
>> Regards
>> Vamsi Subhash
>>
>

Re: Max. storage for Kafka and impact

Posted by Joe Stein <jo...@stealth.ly>.
see some comments inline

On Fri, Dec 19, 2014 at 11:30 AM, Achanta Vamsi Subhash <
achanta.vamsi@flipkart.com> wrote:
>
> We require:
> - many topics
> - ordering of messages for every topic
>

Ordering is only on a per partition basis so you might have to pick a
partition key that makes sense for what you are doing.
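
As a quick illustration with the 0.8.x producer, a per-entity key keeps
every message for that entity on one partition and therefore in order (the
broker list, topic and key below are made-up placeholders):

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class OrderedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));

        // both messages carry the key "order-42", so the default (hash-based)
        // partitioner sends them to the same partition and they stay ordered
        producer.send(new KeyedMessage<String, String>("orders", "order-42", "created"));
        producer.send(new KeyedMessage<String, String>("orders", "order-42", "shipped"));
        producer.close();
    }
}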


> - Consumers hit different Http EndPoints which may be slow (in a push
> model). In case of a Pull model, consumers may pull at the rate at which
> they can process.
> - We need parallelism to hit with as many consumers. Hence, we currently
> have around 50 consumers/topic => 50 partitions.
>

I think you might be mixing up the fetch with the processing. You can have
1 partition and still have 50 messages being processed in parallel (so a
batch of messages).

What language are you working in? How are you doing this processing
exactly?
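
A rough sketch of the pattern being described here, using the 0.8.x
high-level consumer: one fetch stream, a pool of workers, and a manual
offset commit only after the whole batch has been processed. The group,
topic, hosts, pool and batch sizes are placeholders, not a drop-in
implementation.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class BatchProcessingConsumerSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("zookeeper.connect", "zk1:2181");   // placeholder
        props.put("group.id", "http-push-service");   // placeholder
        props.put("auto.commit.enable", "false");     // commit only after processing

        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
            connector.createMessageStreams(Collections.singletonMap("orders", 1));
        KafkaStream<byte[], byte[]> stream = streams.get("orders").get(0);

        ExecutorService workers = Executors.newFixedThreadPool(50); // parallel processing
        List<Future<?>> batch = new ArrayList<Future<?>>();

        ConsumerIterator<byte[], byte[]> it = stream.iterator();
        while (it.hasNext()) {
            final MessageAndMetadata<byte[], byte[]> msg = it.next();
            batch.add(workers.submit(new Runnable() {
                public void run() {
                    postToEndpoint(msg.message()); // the slow HTTP call
                }
            }));
            if (batch.size() == 500) {             // arbitrary batch size
                for (Future<?> f : batch) {
                    f.get();                       // wait for the whole batch
                }
                connector.commitOffsets();         // commit only after processing
                batch.clear();
            }
        }
    }

    private static void postToEndpoint(byte[] payload) {
        // placeholder for the downstream HTTP push
    }
}

Note that handing messages from one partition to a pool of workers gives up
strict in-partition ordering; if per-key ordering matters, route each key to
a fixed worker.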


>
> Currently we have:
> 2000 topics x 50 => 1,00,000 partitions.
>

If this is really the case then you are going to need at least 250 brokers
(~4,000 partitions per broker).

If you do that then you are in the 200 TB per day world, which doesn't sound
to be the case.

I really think you need to strategize on your processing model some
more.


>
> The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
> big cluster with many brokers.


It is possible to handle this on just 3 brokers depending on message size
and ability to batch; durability is another factor you really need to be
thinking about.
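
A rough sanity check of that claim, assuming a replication factor of 3
spread over those 3 brokers (the thread does not state one):

  100 MB/s incoming x 3 replicas = 300 MB/s of disk writes across the cluster
  300 MB/s / 3 brokers = 100 MB/s of mostly sequential writes per broker

That rate is sustainable on fairly ordinary disks as long as producers batch
and messages are not tiny, which is why message size, batching and
durability settings matter more here than the raw volume.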


>
> We have exactly the same use cases as mentioned in this video (usage at
> LinkedIn):
> https://www.youtube.com/watch?v=19DvtEC0EbQ​
>
> ​To handle the zookeeper scenario, as mentioned in the above video, we are
> planning to use SSDs​ and would upgrade to the new consumer (0.9+) once its
> available as per the below video.
> https://www.youtube.com/watch?v=7TZiN521FQA
>
> On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar
> <j_thakrar@yahoo.com.invalid
> > wrote:
>
> > Technically/conceptually it is possible to have 200,000 topics, but do
> you
> > really need it like that?What do you intend to do with those messages -
> > i.e. how do you forsee them being processed downstream? And are those
> > topics really there to segregate different kinds of processing or
> different
> > ids?E.g. if you were LinkedIn, Facebook or Google, would you have have
> one
> > topic per user or one topic per kind of event (e.g. login, pageview,
> > adview, etc.)Remember there is significant book-keeping done within
> > Zookeeper - and these many topics will make that book-keeping
> significant.
> > As for storage, I don't think it should be an issue with sufficient
> > spindles, servers and higher than default memory configuration.
> > Jayesh
> >       From: Achanta Vamsi Subhash <ac...@flipkart.com>
> >  To: "users@kafka.apache.org" <us...@kafka.apache.org>
> >  Sent: Friday, December 19, 2014 9:00 AM
> >  Subject: Re: Max. storage for Kafka and impact
> >
> > Yes. We need those many max partitions as we have a central messaging
> > service and thousands of topics.
> >
> > On Friday, December 19, 2014, nitin sharma <ku...@gmail.com>
> > wrote:
> >
> > > hi,
> > >
> > > Few things you have to plan for:
> > > a. Ensure that from resilience point of view, you are having sufficient
> > > follower brokers for your partitions.
> > > b. In my testing of kafka (50TB/week) so far, haven't seen much issue
> > with
> > > CPU utilization or memory. I had 24 CPU and 32GB RAM.
> > > c. 200,000 partitions means around 1MB/week/partition. are you sure you
> > > need so many partitions?
> > >
> > > Regards,
> > > Nitin Kumar Sharma.
> > >
> > >
> > > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > > >
> > > > We definitely need a retention policy of a week. Hence.
> > > >
> > > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > We are using Kafka for our messaging system and we have an estimate
> > for
> > > > > 200 TB/week in the coming months. Will it impact any performance
> for
> > > > Kafka?
> > > > >
> > > > > PS: We will be having greater than 2 lakh partitions.
> >
> >
> > > > >
> > > > > --
> > > > > Regards
> > > > > Vamsi Subhash
> > > > >
> > > >
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > > >
> > >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
> >
> >
> >
>
>
>
> --
> Regards
> Vamsi Subhash
>

Re: Max. storage for Kafka and impact

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
We require:
- many topics
- ordering of messages for every topic
- Consumers hit different HTTP endpoints, which may be slow (in a push
model). In the case of a pull model, consumers may pull at the rate at
which they can process.
- We need parallelism to hit those endpoints with as many consumers as
possible. Hence, we currently have around 50 consumers/topic => 50
partitions.

Currently we have:
2000 topics x 50 => 1,00,000 partitions.

The incoming rate of ingestion at max is 100 MB/sec. We are planning for a
big cluster with many brokers.
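
A rough cross-check of those two numbers (the replication factor of 3 below
is an assumption; the thread never says how the 200 TB/week was derived):

  100 MB/s x 604,800 s/week ≈ 60 TB/week of raw incoming data
  60 TB/week x 3 replicas ≈ 180 TB/week actually written to disk,
  i.e. roughly the quoted 200 TB/week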

We have exactly the same use cases as mentioned in this video (usage at
LinkedIn):
https://www.youtube.com/watch?v=19DvtEC0EbQ

To handle the ZooKeeper scenario mentioned in the above video, we are
planning to use SSDs, and we would upgrade to the new consumer (0.9+) once
it's available, as described in the video below.
https://www.youtube.com/watch?v=7TZiN521FQA

On Fri, Dec 19, 2014 at 9:06 PM, Jayesh Thakrar <j_thakrar@yahoo.com.invalid
> wrote:

> Technically/conceptually it is possible to have 200,000 topics, but do you
> really need it like that?What do you intend to do with those messages -
> i.e. how do you forsee them being processed downstream? And are those
> topics really there to segregate different kinds of processing or different
> ids?E.g. if you were LinkedIn, Facebook or Google, would you have have one
> topic per user or one topic per kind of event (e.g. login, pageview,
> adview, etc.)Remember there is significant book-keeping done within
> Zookeeper - and these many topics will make that book-keeping significant.
> As for storage, I don't think it should be an issue with sufficient
> spindles, servers and higher than default memory configuration.
> Jayesh
>       From: Achanta Vamsi Subhash <ac...@flipkart.com>
>  To: "users@kafka.apache.org" <us...@kafka.apache.org>
>  Sent: Friday, December 19, 2014 9:00 AM
>  Subject: Re: Max. storage for Kafka and impact
>
> Yes. We need those many max partitions as we have a central messaging
> service and thousands of topics.
>
> On Friday, December 19, 2014, nitin sharma <ku...@gmail.com>
> wrote:
>
> > hi,
> >
> > Few things you have to plan for:
> > a. Ensure that from resilience point of view, you are having sufficient
> > follower brokers for your partitions.
> > b. In my testing of kafka (50TB/week) so far, haven't seen much issue
> with
> > CPU utilization or memory. I had 24 CPU and 32GB RAM.
> > c. 200,000 partitions means around 1MB/week/partition. are you sure you
> > need so many partitions?
> >
> > Regards,
> > Nitin Kumar Sharma.
> >
> >
> > On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > >
> > > We definitely need a retention policy of a week. Hence.
> > >
> > > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > > >
> > > > Hi,
> > > >
> > > > We are using Kafka for our messaging system and we have an estimate
> for
> > > > 200 TB/week in the coming months. Will it impact any performance for
> > > Kafka?
> > > >
> > > > PS: We will be having greater than 2 lakh partitions.
>
>
> > > >
> > > > --
> > > > Regards
> > > > Vamsi Subhash
> > > >
> > >
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
>
>
> --
> Regards
> Vamsi Subhash
>
>
>
>



-- 
Regards
Vamsi Subhash

Re: Max. storage for Kafka and impact

Posted by Jayesh Thakrar <j_...@yahoo.com.INVALID>.
Technically/conceptually it is possible to have 200,000 topics, but do you really need it like that? What do you intend to do with those messages - i.e. how do you foresee them being processed downstream? And are those topics really there to segregate different kinds of processing, or different ids? E.g. if you were LinkedIn, Facebook or Google, would you have one topic per user, or one topic per kind of event (e.g. login, pageview, adview, etc.)? Remember there is significant book-keeping done within Zookeeper - and this many topics will make that book-keeping significant.
As for storage, I don't think it should be an issue with sufficient spindles, servers and a higher-than-default memory configuration.
Jayesh
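
For a sense of that book-keeping, every topic and partition has znodes under
ZooKeeper's /brokers path, so with a couple of lakh partitions that tree gets
large and is updated on every leader change (the host and topic name below
are placeholders):

bin/zookeeper-shell.sh zk1:2181
  ls /brokers/topics                    (one child znode per topic)
  ls /brokers/topics/orders/partitions  (one child znode per partition)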
      From: Achanta Vamsi Subhash <ac...@flipkart.com>
 To: "users@kafka.apache.org" <us...@kafka.apache.org> 
 Sent: Friday, December 19, 2014 9:00 AM
 Subject: Re: Max. storage for Kafka and impact
   
Yes. We need those many max partitions as we have a central messaging
service and thousands of topics.

On Friday, December 19, 2014, nitin sharma <ku...@gmail.com>
wrote:

> hi,
>
> Few things you have to plan for:
> a. Ensure that from resilience point of view, you are having sufficient
> follower brokers for your partitions.
> b. In my testing of kafka (50TB/week) so far, haven't seen much issue with
> CPU utilization or memory. I had 24 CPU and 32GB RAM.
> c. 200,000 partitions means around 1MB/week/partition. are you sure you
> need so many partitions?
>
> Regards,
> Nitin Kumar Sharma.
>
>
> On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> achanta.vamsi@flipkart.com <javascript:;>> wrote:
> >
> > We definitely need a retention policy of a week. Hence.
> >
> > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > >
> > > Hi,
> > >
> > > We are using Kafka for our messaging system and we have an estimate for
> > > 200 TB/week in the coming months. Will it impact any performance for
> > Kafka?
> > >
> > > PS: We will be having greater than 2 lakh partitions.


> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>


-- 
Regards
Vamsi Subhash


  

Re: Max. storage for Kafka and impact

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
Yes. We do need that many partitions at the maximum, as we have a central
messaging service and thousands of topics.

On Friday, December 19, 2014, nitin sharma <ku...@gmail.com>
wrote:

> hi,
>
> Few things you have to plan for:
> a. Ensure that from resilience point of view, you are having sufficient
> follower brokers for your partitions.
> b. In my testing of kafka (50TB/week) so far, haven't seen much issue with
> CPU utilization or memory. I had 24 CPU and 32GB RAM.
> c. 200,000 partitions means around 1MB/week/partition. are you sure you
> need so many partitions?
>
> Regards,
> Nitin Kumar Sharma.
>
>
> On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
> achanta.vamsi@flipkart.com <javascript:;>> wrote:
> >
> > We definitely need a retention policy of a week. Hence.
> >
> > On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> > achanta.vamsi@flipkart.com <javascript:;>> wrote:
> > >
> > > Hi,
> > >
> > > We are using Kafka for our messaging system and we have an estimate for
> > > 200 TB/week in the coming months. Will it impact any performance for
> > Kafka?
> > >
> > > PS: We will be having greater than 2 lakh partitions.
> > >
> > > --
> > > Regards
> > > Vamsi Subhash
> > >
> >
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>


-- 
Regards
Vamsi Subhash

Re: Max. storage for Kafka and impact

Posted by nitin sharma <ku...@gmail.com>.
Hi,

A few things you have to plan for:
a. Ensure that, from a resilience point of view, you have sufficient
follower brokers (replicas) for your partitions; see the example below.
b. In my testing of Kafka (50 TB/week) so far, I haven't seen much issue
with CPU utilization or memory. I had 24 CPUs and 32 GB RAM.
c. 200,000 partitions means only around 1 GB/week/partition. Are you sure
you need so many partitions?
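
A concrete example of point (a) with the stock tooling (host, topic and
counts are placeholders): a replication factor of 3 gives every partition
two follower replicas on other brokers.

bin/kafka-topics.sh --create --zookeeper zk1:2181 \
    --topic orders --partitions 50 --replication-factor 3

# check the leader / replicas / ISR assignment per partition
bin/kafka-topics.sh --describe --zookeeper zk1:2181 --topic orders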

Regards,
Nitin Kumar Sharma.


On Fri, Dec 19, 2014 at 9:12 AM, Achanta Vamsi Subhash <
achanta.vamsi@flipkart.com> wrote:
>
> We definitely need a retention policy of a week. Hence.
>
> On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
> achanta.vamsi@flipkart.com> wrote:
> >
> > Hi,
> >
> > We are using Kafka for our messaging system and we have an estimate for
> > 200 TB/week in the coming months. Will it impact any performance for
> Kafka?
> >
> > PS: We will be having greater than 2 lakh partitions.
> >
> > --
> > Regards
> > Vamsi Subhash
> >
>
>
> --
> Regards
> Vamsi Subhash
>

Re: Max. storage for Kafka and impact

Posted by Achanta Vamsi Subhash <ac...@flipkart.com>.
We definitely need a retention policy of a week; hence the 200 TB/week
estimate.
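
That boils down to the broker default plus, if needed, per-topic overrides;
the values below are just the 7-day settings, and the host and topic name
are placeholders:

# server.properties (broker-wide default)
log.retention.hours=168

# per-topic override with the 0.8.x tooling (7 days = 604,800,000 ms)
bin/kafka-topics.sh --alter --zookeeper zk1:2181 --topic orders \
    --config retention.ms=604800000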

On Fri, Dec 19, 2014 at 7:40 PM, Achanta Vamsi Subhash <
achanta.vamsi@flipkart.com> wrote:
>
> Hi,
>
> We are using Kafka for our messaging system and we have an estimate for
> 200 TB/week in the coming months. Will it impact any performance for Kafka?
>
> PS: We will be having greater than 2 lakh partitions.
>
> --
> Regards
> Vamsi Subhash
>


-- 
Regards
Vamsi Subhash