You are viewing a plain text version of this content. The canonical link for it is here.

Posted to users@kafka.apache.org by Andrey Falko <af...@salesforce.com> on 2018/02/02 01:28:33 UTC

Re: Recommended max number of topics (and data separation)

Indeed David, I confirmed that I can't push my clusters to more than
72k topics with default zookeeper settings. Once I get to that
quantity, leader election never happens for new topics. Additionally
if I kill one of the brokers, all the topics don't get leaders
re-elected and it is impossible to trigger it due to the ClientCnxn
"Packet len4194494 is out of range" exception or "Failed to start
preferred replica election" error. It seems like there should be a
couple of JIRA item for starters:
1) Don't let # of topics exceed what jute.maxbuffer will bear; i.e.
send a good error message back to users saying the peak has been
reached and no more topics can be created.
2) Kafka should support more than 72k topics (probably without
requiring messing with jute.maxbuffer).

Is anyone aware of JIRA tickets that might already cover the above?

On Wed, Jan 31, 2018 at 8:35 AM, David Espinosa <es...@gmail.com> wrote:
> I used:
> -Djute.maxbuffer=50111000
> and the gain I had is that I could increment number of topics from 70k to
> 100k :P
>
> 2018-01-30 23:25 GMT+01:00 Andrey Falko <af...@salesforce.com>:
>
>> On Tue, Jan 30, 2018 at 1:38 PM, David Espinosa <es...@gmail.com> wrote:
>> > Hi Andrey,
>> > My topics are replicated with a replicated factor equals to the number of
>> > nodes, 3 in this test.
>> > Didn't know about the kip-227.
>> > The problems I see at 70k topics coming from ZK are related to any
>> > operation where ZK has to retrieve topics metadata. Just listing topics
>> at
>> > 50K or 60k you will experience a big delay in the response. I have no
>> more
>> > details about these problems, but is easy to reproduce the latency in the
>> > topics list request.
>>
>> AFAIK kafka doesn't do a full list as part of normal operations from
>> ZK. If you have requirements in your consumer/producer code on doing
>> --describe, then that would be a problem. I think that can be worked
>> around. Based on my profiling data so far, while things are working in
>> non-failure mode, none of the ZK functions pop up as "hot methods".
>>
>> > Thanks me for pointing me to this parameter,  vm.max_map_count, it wasn't
>> > on my radar. Could you tell me what value you use?
>>
>> I set it to the max allowable on Amzn Linux: vm.max_map_count=1215752192
>>
>> > The other way around about topic naming, I think the longer the topic
>> names
>> > are the sooner jute.maxbuffer overflows.
>>
>> I see; what value(s) have you tried with and how much gain did you you see?
>>
>> > David
>> >
>> >
>> > 2018-01-30 4:40 GMT+01:00 Andrey Falko <af...@salesforce.com>:
>> >
>> >> On Sun, Jan 28, 2018 at 8:45 AM, David Espinosa <es...@gmail.com>
>> wrote:
>> >> > Hi Monty,
>> >> >
>> >> > I'm also planning to use a big amount of topics in Kafka, so recently
>> I
>> >> > made a test within a 3 nodes kafka cluster where I created 100k topics
>> >> with
>> >> > one partition. Sent 1M messages in total.
>> >>
>> >> Are your topic partitions replicated?
>> >>
>> >> > These are my conclusions:
>> >> >
>> >> >    - There is not any limitation on kafka regarding the number of
>> topics
>> >> >    but on Zookeeper and in the system where Kafka nodes is allocated.
>> >>
>> >> There are also the problems being addressed in KIP-227:
>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >> 227%3A+Introduce+Incremental+FetchRequests+to+Increase+
>> >> Partition+Scalability
>> >>
>> >> >    - Zookeeper will start having problems from 70k topics, which can
>> be
>> >> >    solved modifying a buffer parameter on the JVM (-Djute.maxbuffer).
>> >> >    Performance is reduced.
>> >>
>> >> What kind of problems do you see at 70k topics? If performance is
>> >> reduced w/ modifying jute.maxbuffer, won't that effect the performance
>> >> of kafka interms of how long it takes to recover from broker failure,
>> >> creating/deleting topics, producing and consuming?
>> >>
>> >> >    - Open file descriptors of the system are equivalent to [number of
>> >> >    topics]X[number of partitions per topic]. Set to 128k in my test to
>> >> avoid
>> >> >    problems.
>> >> >    - System needs a big amount of memory for page caching.
>> >>
>> >> I also had to tune vm.max_map_count much higher.
>> >>
>> >> >
>> >> > So, after creating 100k with the required setup (system+JVM) but
>> seeing
>> >> > problems at 70k, I feel safe by not creating more than 50k, and always
>> >> will
>> >> > have Zookeeper as my first suspect if a problem comes. I think with
>> >> proper
>> >> > resources (memory) and system setup (open file descriptors), you don't
>> >> have
>> >> > any real limitation regarding partitions.
>> >>
>> >> I can confirm the 50k number. After about 40k-45k topics, I start
>> >> seeing slow down in consume offset commit latencies that eclipse 50ms.
>> >> Hopefully KIP-227 will alleviate that problem and leave ZK as the last
>> >> remaining hurdle. I'm testing with 3x replication per partition and 10
>> >> brokers.
>> >>
>> >> > By the way, I used long topic names (about 30 characters), which can
>> be
>> >> > important for ZK.
>> >>
>> >> I'd like to learn more about this, are you saying that long topic
>> >> names would improve ZK performance because that relates to bumping up
>> >> jute.maxbuffer?
>> >>
>> >> > Hope this information is of your help.
>> >> >
>> >> > David
>> >> >
>> >> > 2018-01-28 2:22 GMT+01:00 Monty Hindman <mo...@gmail.com>:
>> >> >
>> >> >> I'm designing a system and need some more clarity regarding Kafka's
>> >> >> recommended limits on the number of topics and/or partitions. At a
>> high
>> >> >> level, our system would work like this:
>> >> >>
>> >> >> - A user creates a job X (X is a UUID).
>> >> >> - The user uploads data for X to an input topic: X.in.
>> >> >> - Workers process the data, writing results to an output topic:
>> X.out.
>> >> >> - The user downloads the data from X.out.
>> >> >>
>> >> >> It's important for the system that data for different jobs be kept
>> >> >> separate, and that input and output data be kept separate. By
>> >> "separate" I
>> >> >> mean that there needs to be a reasonable way for users and the
>> system's
>> >> >> workers to query for the data they need (by job-id and by
>> >> input-vs-output)
>> >> >> and not get the data they don't need.
>> >> >>
>> >> >> Based on expected usage and our data retention policy, we would not
>> >> expect
>> >> >> to need more than 12,000 active jobs at any one time -- in other
>> words,
>> >> >> 24,000 topics. If we were to have 5 partitions per topic (our cluster
>> >> has 5
>> >> >> brokers), that would imply 120,000 partitions. [These number refer
>> only
>> >> to
>> >> >> main/primary partitions, not any replicas that might exist.]
>> >> >>
>> >> >> Those numbers seem to be far larger than the suggested limits I see
>> >> online.
>> >> >> For example, the Kafka FAQ on these matters seems to imply that the
>> most
>> >> >> relevant limit is the number of partitions (rather than topics) and
>> >> sort of
>> >> >> implies that 10,000 partitions might be a suggested guideline (
>> >> >> https://goo.gl/fQs2md). Also implied is that systems should use
>> fewer
>> >> >> topics and instead partition the data within topics if further
>> >> separation
>> >> >> is needed (the FAQ entry uses the example of partitioning by user ID,
>> >> which
>> >> >> is roughly analogous to job ID in my use case).
>> >> >>
>> >> >> The guidance in the FAQ is unclear to me:
>> >> >>
>> >> >> - Does the suggested limit of 10,000 refer to the total number of
>> >> >> partitions (ie, main partitions plus any replicas) or just the main
>> >> >> partitions?
>> >> >>
>> >> >> - If the most important limitation is number of partitions (rather
>> than
>> >> >> number of topics), how does the suggested strategy of using fewer
>> topics
>> >> >> and then partitioning by some other attribute (ie job ID) help at
>> all?
>> >> >>
>> >> >> - Is my use case just a bad fit for Kafka? Or, is there a way for us
>> to
>> >> use
>> >> >> Kafka while still supporting the kinds of query patterns that we need
>> >> (ie,
>> >> >> by job ID and by input-vs-output)?
>> >> >>
>> >> >> Thanks in advance for any guidance.
>> >> >>
>> >> >> Monty
>> >> >>
>> >>
>>

Re: Recommended max number of topics (and data separation)

Posted by Ted Yu <yu...@gmail.com>.

After brief search, I found KAFKA-6469
FYI
-------- Original message --------From: Andrey Falko <af...@salesforce.com> Date: 2/1/18  5:28 PM  (GMT-08:00) To: users@kafka.apache.org Subject: Re: Recommended max number of topics (and data separation) 
Indeed David, I confirmed that I can't push my clusters to more than
72k topics with default zookeeper settings. Once I get to that
quantity, leader election never happens for new topics. Additionally
if I kill one of the brokers, all the topics don't get leaders
re-elected and it is impossible to trigger it due to the ClientCnxn
"Packet len4194494 is out of range" exception or "Failed to start
preferred replica election" error. It seems like there should be a
couple of JIRA item for starters:
1) Don't let # of topics exceed what jute.maxbuffer will bear; i.e.
send a good error message back to users saying the peak has been
reached and no more topics can be created.
2) Kafka should support more than 72k topics (probably without
requiring messing with jute.maxbuffer).

Is anyone aware of JIRA tickets that might already cover the above?

On Wed, Jan 31, 2018 at 8:35 AM, David Espinosa <es...@gmail.com> wrote:
> I used:
> -Djute.maxbuffer=50111000
> and the gain I had is that I could increment number of topics from 70k to
> 100k :P
>
> 2018-01-30 23:25 GMT+01:00 Andrey Falko <af...@salesforce.com>:
>
>> On Tue, Jan 30, 2018 at 1:38 PM, David Espinosa <es...@gmail.com> wrote:
>> > Hi Andrey,
>> > My topics are replicated with a replicated factor equals to the number of
>> > nodes, 3 in this test.
>> > Didn't know about the kip-227.
>> > The problems I see at 70k topics coming from ZK are related to any
>> > operation where ZK has to retrieve topics metadata. Just listing topics
>> at
>> > 50K or 60k you will experience a big delay in the response. I have no
>> more
>> > details about these problems, but is easy to reproduce the latency in the
>> > topics list request.
>>
>> AFAIK kafka doesn't do a full list as part of normal operations from
>> ZK. If you have requirements in your consumer/producer code on doing
>> --describe, then that would be a problem. I think that can be worked
>> around. Based on my profiling data so far, while things are working in
>> non-failure mode, none of the ZK functions pop up as "hot methods".
>>
>> > Thanks me for pointing me to this parameter,  vm.max_map_count, it wasn't
>> > on my radar. Could you tell me what value you use?
>>
>> I set it to the max allowable on Amzn Linux: vm.max_map_count=1215752192
>>
>> > The other way around about topic naming, I think the longer the topic
>> names
>> > are the sooner jute.maxbuffer overflows.
>>
>> I see; what value(s) have you tried with and how much gain did you you see?
>>
>> > David
>> >
>> >
>> > 2018-01-30 4:40 GMT+01:00 Andrey Falko <af...@salesforce.com>:
>> >
>> >> On Sun, Jan 28, 2018 at 8:45 AM, David Espinosa <es...@gmail.com>
>> wrote:
>> >> > Hi Monty,
>> >> >
>> >> > I'm also planning to use a big amount of topics in Kafka, so recently
>> I
>> >> > made a test within a 3 nodes kafka cluster where I created 100k topics
>> >> with
>> >> > one partition. Sent 1M messages in total.
>> >>
>> >> Are your topic partitions replicated?
>> >>
>> >> > These are my conclusions:
>> >> >
>> >> >    - There is not any limitation on kafka regarding the number of
>> topics
>> >> >    but on Zookeeper and in the system where Kafka nodes is allocated.
>> >>
>> >> There are also the problems being addressed in KIP-227:
>> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>> >> 227%3A+Introduce+Incremental+FetchRequests+to+Increase+
>> >> Partition+Scalability
>> >>
>> >> >    - Zookeeper will start having problems from 70k topics, which can
>> be
>> >> >    solved modifying a buffer parameter on the JVM (-Djute.maxbuffer).
>> >> >    Performance is reduced.
>> >>
>> >> What kind of problems do you see at 70k topics? If performance is
>> >> reduced w/ modifying jute.maxbuffer, won't that effect the performance
>> >> of kafka interms of how long it takes to recover from broker failure,
>> >> creating/deleting topics, producing and consuming?
>> >>
>> >> >    - Open file descriptors of the system are equivalent to [number of
>> >> >    topics]X[number of partitions per topic]. Set to 128k in my test to
>> >> avoid
>> >> >    problems.
>> >> >    - System needs a big amount of memory for page caching.
>> >>
>> >> I also had to tune vm.max_map_count much higher.
>> >>
>> >> >
>> >> > So, after creating 100k with the required setup (system+JVM) but
>> seeing
>> >> > problems at 70k, I feel safe by not creating more than 50k, and always
>> >> will
>> >> > have Zookeeper as my first suspect if a problem comes. I think with
>> >> proper
>> >> > resources (memory) and system setup (open file descriptors), you don't
>> >> have
>> >> > any real limitation regarding partitions.
>> >>
>> >> I can confirm the 50k number. After about 40k-45k topics, I start
>> >> seeing slow down in consume offset commit latencies that eclipse 50ms.
>> >> Hopefully KIP-227 will alleviate that problem and leave ZK as the last
>> >> remaining hurdle. I'm testing with 3x replication per partition and 10
>> >> brokers.
>> >>
>> >> > By the way, I used long topic names (about 30 characters), which can
>> be
>> >> > important for ZK.
>> >>
>> >> I'd like to learn more about this, are you saying that long topic
>> >> names would improve ZK performance because that relates to bumping up
>> >> jute.maxbuffer?
>> >>
>> >> > Hope this information is of your help.
>> >> >
>> >> > David
>> >> >
>> >> > 2018-01-28 2:22 GMT+01:00 Monty Hindman <mo...@gmail.com>:
>> >> >
>> >> >> I'm designing a system and need some more clarity regarding Kafka's
>> >> >> recommended limits on the number of topics and/or partitions. At a
>> high
>> >> >> level, our system would work like this:
>> >> >>
>> >> >> - A user creates a job X (X is a UUID).
>> >> >> - The user uploads data for X to an input topic: X.in.
>> >> >> - Workers process the data, writing results to an output topic:
>> X.out.
>> >> >> - The user downloads the data from X.out.
>> >> >>
>> >> >> It's important for the system that data for different jobs be kept
>> >> >> separate, and that input and output data be kept separate. By
>> >> "separate" I
>> >> >> mean that there needs to be a reasonable way for users and the
>> system's
>> >> >> workers to query for the data they need (by job-id and by
>> >> input-vs-output)
>> >> >> and not get the data they don't need.
>> >> >>
>> >> >> Based on expected usage and our data retention policy, we would not
>> >> expect
>> >> >> to need more than 12,000 active jobs at any one time -- in other
>> words,
>> >> >> 24,000 topics. If we were to have 5 partitions per topic (our cluster
>> >> has 5
>> >> >> brokers), that would imply 120,000 partitions. [These number refer
>> only
>> >> to
>> >> >> main/primary partitions, not any replicas that might exist.]
>> >> >>
>> >> >> Those numbers seem to be far larger than the suggested limits I see
>> >> online.
>> >> >> For example, the Kafka FAQ on these matters seems to imply that the
>> most
>> >> >> relevant limit is the number of partitions (rather than topics) and
>> >> sort of
>> >> >> implies that 10,000 partitions might be a suggested guideline (
>> >> >> https://goo.gl/fQs2md). Also implied is that systems should use
>> fewer
>> >> >> topics and instead partition the data within topics if further
>> >> separation
>> >> >> is needed (the FAQ entry uses the example of partitioning by user ID,
>> >> which
>> >> >> is roughly analogous to job ID in my use case).
>> >> >>
>> >> >> The guidance in the FAQ is unclear to me:
>> >> >>
>> >> >> - Does the suggested limit of 10,000 refer to the total number of
>> >> >> partitions (ie, main partitions plus any replicas) or just the main
>> >> >> partitions?
>> >> >>
>> >> >> - If the most important limitation is number of partitions (rather
>> than
>> >> >> number of topics), how does the suggested strategy of using fewer
>> topics
>> >> >> and then partitioning by some other attribute (ie job ID) help at
>> all?
>> >> >>
>> >> >> - Is my use case just a bad fit for Kafka? Or, is there a way for us
>> to
>> >> use
>> >> >> Kafka while still supporting the kinds of query patterns that we need
>> >> (ie,
>> >> >> by job ID and by input-vs-output)?
>> >> >>
>> >> >> Thanks in advance for any guidance.
>> >> >>
>> >> >> Monty
>> >> >>
>> >>
>>