Posted to users@kafka.apache.org by Prabhjot Bharaj <pr...@gmail.com> on 2015/07/28 07:51:27 UTC

Number of kafka topics/partitions supported per cluster of n nodes

Hi,

I'm looking for a benchmark that shows how many topics and partitions in
total can be created in a cluster of n nodes, given message sizes between
x and y bytes, how that number varies with heap size, and how it affects
system performance.

e.g. the result should look like: t topics with p partitions each can be
supported in a cluster of n nodes with a heap size of h MB, before the
cluster sees problems such as JVM crashes, high memory usage or system
slowdown.

I think such benchmarks must exist so that we can make better decisions on
the ops side. If these details don't exist, I'll run this test myself,
varying the parameters described above, and I'd be happy to share the
numbers with the community.

Thanks,
prabcs
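The benchmark matrix described above (n nodes, t topics, p partitions per topic, heap size h, message sizes between x and y bytes) can be enumerated as a full-factorial sweep. A minimal Python sketch; BenchmarkCase and benchmark_matrix are hypothetical names, and the parameter values in the usage line are placeholders rather than recommendations:

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, Iterator


@dataclass(frozen=True)
class BenchmarkCase:
    """One run in the (hypothetical) benchmark matrix."""
    nodes: int          # n: brokers in the cluster
    topics: int         # t: number of topics
    partitions: int     # p: partitions per topic
    heap_mb: int        # h: broker JVM heap size in MB
    message_bytes: int  # one message size from the x..y range

    @property
    def total_partitions(self) -> int:
        # Partitions across all topics, before replication.
        return self.topics * self.partitions


def benchmark_matrix(
    nodes: Iterable[int],
    topics: Iterable[int],
    partitions: Iterable[int],
    heaps_mb: Iterable[int],
    message_sizes: Iterable[int],
) -> Iterator[BenchmarkCase]:
    """Yield every combination of the parameters (full-factorial sweep)."""
    for n, t, p, h, m in product(nodes, topics, partitions, heaps_mb, message_sizes):
        yield BenchmarkCase(n, t, p, h, m)


if __name__ == "__main__":
    cases = list(benchmark_matrix([3, 5], [5, 7], [64, 256], [1024, 4096], [100, 1000]))
    print(len(cases))  # 32 runs (2 values for each of 5 parameters)
```

A full sweep grows multiplicatively (here 2^5 = 32 runs), so it may be worth fixing some parameters while varying the others.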

Re: Number of kafka topics/partitions supported per cluster of n nodes

Posted by Prabhjot Bharaj <pr...@gmail.com>.
Sure, I'll do that.
I have seen that with 5-7 topics of 256 partitions each, on machines with
4 CPUs and 8GB RAM, the JVM crashes with an OutOfMemoryError. This happens
on many machines in the cluster (I'll post the exact number as well).
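Some rough arithmetic helps frame those numbers: 5-7 topics of 256 partitions each is 1,280-1,792 partitions. The Python sketch below also computes one crude, worst-case bound on the replica-fetch data a broker might hold at once. The replication factor and broker count are not stated above, so the values used are hypothetical, and fetch_bytes_per_partition (standing in for a per-partition fetch limit such as replica.fetch.max.bytes) defaults to an assumed 1 MiB. This is an estimate, not a diagnosis of the OutOfMemoryError:

```python
MIB = 1024 * 1024


def total_partitions(topics: int, partitions_per_topic: int) -> int:
    """Partitions across all topics (not counting replicas)."""
    return topics * partitions_per_topic


def follower_replicas_per_broker(topics: int, partitions_per_topic: int,
                                 replication_factor: int, brokers: int) -> float:
    """Average follower replicas per broker, assuming an even spread.

    Each partition has one leader and replication_factor - 1 followers.
    """
    followers = total_partitions(topics, partitions_per_topic) * (replication_factor - 1)
    return followers / brokers


def worst_case_fetch_buffer_mib(follower_replicas: float,
                                fetch_bytes_per_partition: int = MIB) -> float:
    """Crude upper bound on replica-fetch data held by one broker at once.

    Assumes every follower replica receives a full per-partition fetch at the
    same moment. fetch_bytes_per_partition stands in for a limit such as
    replica.fetch.max.bytes; the 1 MiB default is an assumption.
    """
    return follower_replicas * fetch_bytes_per_partition / MIB


if __name__ == "__main__":
    print(total_partitions(5, 256), total_partitions(7, 256))  # 1280 1792
    # Hypothetical cluster: replication factor 3 on 3 brokers.
    f = follower_replicas_per_broker(7, 256, replication_factor=3, brokers=3)
    print(round(worst_case_fetch_buffer_mib(f)))  # 1195
```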

I was wondering how I could tune the JVM to its limits to handle such a
scenario.

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 12:27 PM, Darion Yaphet <da...@gmail.com>
wrote:

> Kafka stores its metadata in the ZooKeeper cluster, so evaluating "how many
> topics and partitions in total can be created in a cluster" may be the same
> as testing ZooKeeper's scalability and disk I/O performance.
>
> 2015-07-28 13:51 GMT+08:00 Prabhjot Bharaj <pr...@gmail.com>:
>
> > Hi,
> >
> > I'm looking for a benchmark that shows how many topics and partitions in
> > total can be created in a cluster of n nodes, given message sizes between
> > x and y bytes, how that number varies with heap size, and how it affects
> > system performance.
> >
> > e.g. the result should look like: t topics with p partitions each can be
> > supported in a cluster of n nodes with a heap size of h MB, before the
> > cluster sees problems such as JVM crashes, high memory usage or system
> > slowdown.
> >
> > I think such benchmarks must exist so that we can make better decisions on
> > the ops side. If these details don't exist, I'll run this test myself,
> > varying the parameters described above, and I'd be happy to share the
> > numbers with the community.
> >
> > Thanks,
> > prabcs
> >
>
>
>
> --
>
> long is the way and hard  that out of Hell leads up to light
>



-- 
---------------------------------------------------------
"There are only 10 types of people in the world: Those who understand
binary, and those who don't"

Re: Number of kafka topics/partitions supported per cluster of n nodes

Posted by Darion Yaphet <da...@gmail.com>.
Kafka stores its metadata in the ZooKeeper cluster, so evaluating "how many
topics and partitions in total can be created in a cluster" may be the same
as testing ZooKeeper's scalability and disk I/O performance.
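To see how topic and partition counts translate into ZooKeeper state, one can estimate the number of znodes the topic metadata occupies. The layout in this Python sketch (/brokers/topics/<topic>, its partitions child, a partitions/<p> node plus a state node per partition, and /config/topics/<topic>) is an assumption modelled on Kafka 0.8.x; check it against a real cluster, e.g. with ls in ZooKeeper's zkCli.sh, before relying on the count:

```python
from typing import List


def assumed_topic_znodes(topic: str, partitions: int) -> List[str]:
    """Znodes one topic is ASSUMED to create (layout modelled on Kafka 0.8.x)."""
    base = f"/brokers/topics/{topic}"
    paths = [base, f"{base}/partitions", f"/config/topics/{topic}"]
    for p in range(partitions):
        paths.append(f"{base}/partitions/{p}")
        paths.append(f"{base}/partitions/{p}/state")  # leader/ISR for partition p
    return paths


def estimated_znode_count(topics: int, partitions_per_topic: int) -> int:
    """Total assumed topic-metadata znodes for equally sized topics."""
    return sum(
        len(assumed_topic_znodes(f"topic-{i}", partitions_per_topic))
        for i in range(topics)
    )


if __name__ == "__main__":
    # 7 topics x (3 per-topic + 2 per-partition x 256) = 3605 znodes
    print(estimated_znode_count(7, 256))
```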

2015-07-28 13:51 GMT+08:00 Prabhjot Bharaj <pr...@gmail.com>:

> Hi,
>
> I'm looking for a benchmark that shows how many topics and partitions in
> total can be created in a cluster of n nodes, given message sizes between
> x and y bytes, how that number varies with heap size, and how it affects
> system performance.
>
> e.g. the result should look like: t topics with p partitions each can be
> supported in a cluster of n nodes with a heap size of h MB, before the
> cluster sees problems such as JVM crashes, high memory usage or system
> slowdown.
>
> I think such benchmarks must exist so that we can make better decisions on
> the ops side. If these details don't exist, I'll run this test myself,
> varying the parameters described above, and I'd be happy to share the
> numbers with the community.
>
> Thanks,
> prabcs
>




Re: Number of kafka topics/partitions supported per cluster of n nodes

Posted by Prabhjot Bharaj <pr...@gmail.com>.
@Jiefu Gong,

Are the results of your tests publicly available?

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 10:35 PM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> I will be using the servers available at my place of work; I don't have
> access to AWS servers. I would start off with a small number of nodes in
> the cluster and then plot a graph with the number of servers in the cluster
> on the x-axis and, on the y-axis, the number of topics with partitions the
> cluster supports before it gives up.
>
> I need one piece of help here: which parameters should I keep in mind when
> tuning the JVM for the best performance?
> My machine specs: 4 CPU cores, 8GB RAM and a 500GB HDD (RAID striped).
>
> Regards,
> Prabhjot
>
> On Tue, Jul 28, 2015 at 10:27 PM, JIEFU GONG <jg...@berkeley.edu> wrote:
>
>> I think these would definitely be useful statistics to have, and I've tried
>> to do similar tests! The biggest difference is probably going to be the
>> hardware specs of whatever cluster you decide to run them on. Benchmarks
>> performed on different AWS servers might be helpful too; either way, I'd
>> like to see these!
>>
>> On Mon, Jul 27, 2015 at 10:51 PM, Prabhjot Bharaj <pr...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I'm looking for a benchmark that shows how many topics and partitions in
>> > total can be created in a cluster of n nodes, given message sizes between
>> > x and y bytes, how that number varies with heap size, and how it affects
>> > system performance.
>> >
>> > e.g. the result should look like: t topics with p partitions each can be
>> > supported in a cluster of n nodes with a heap size of h MB, before the
>> > cluster sees problems such as JVM crashes, high memory usage or system
>> > slowdown.
>> >
>> > I think such benchmarks must exist so that we can make better decisions on
>> > the ops side. If these details don't exist, I'll run this test myself,
>> > varying the parameters described above, and I'd be happy to share the
>> > numbers with the community.
>> >
>> > Thanks,
>> > prabcs
>> >
>>
>>
>>
>> --
>>
>> Jiefu Gong
>> University of California, Berkeley | Class of 2017
>> B.A Computer Science | College of Letters and Sciences
>>
>> jgong@berkeley.edu <el...@berkeley.edu> | (925) 400-3427
>>
>


Re: Number of kafka topics/partitions supported per cluster of n nodes

Posted by Prabhjot Bharaj <pr...@gmail.com>.
I will be using the servers available at my place of work; I don't have
access to AWS servers. I would start off with a small number of nodes in
the cluster and then plot a graph with the number of servers in the cluster
on the x-axis and, on the y-axis, the number of topics with partitions the
cluster supports before it gives up.
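One way to find the y-axis value for each cluster size is to double the partition count until the cluster gives up and then binary-search between the last good and first bad count. A Python sketch, assuming a hypothetical cluster_survives(nodes, total_partitions) callback that you would implement yourself (create the topics, drive load, watch for OutOfMemoryError or slowdown), and assuming failure is monotonic in the partition count:

```python
from typing import Callable, Dict, Iterable, Optional

Probe = Callable[[int, int], bool]  # (nodes, total_partitions) -> survived?


def max_supported_partitions(nodes: int, cluster_survives: Probe,
                             start: int = 256, limit: int = 1 << 20) -> int:
    """Largest partition count the cluster survives, or 0 if `start` already fails.

    Assumes survival is monotonic: if the cluster copes with k partitions it
    also copes with any smaller count. `limit` caps the search.
    """
    if not cluster_survives(nodes, start):
        return 0
    good: int = start
    bad: Optional[int] = None
    # Phase 1: double the partition count until the cluster gives up.
    while bad is None:
        candidate = min(good * 2, limit)
        if candidate == good:
            return good  # hit the cap without seeing a failure
        if cluster_survives(nodes, candidate):
            good = candidate
        else:
            bad = candidate
    # Phase 2: binary search between the last good and first bad counts.
    while bad - good > 1:
        mid = (good + bad) // 2
        if cluster_survives(nodes, mid):
            good = mid
        else:
            bad = mid
    return good


def capacity_curve(node_counts: Iterable[int], cluster_survives: Probe) -> Dict[int, int]:
    """Map cluster size (x-axis) to max partitions before it gives up (y-axis)."""
    return {n: max_supported_partitions(n, cluster_survives) for n in node_counts}


def fake_survives(nodes: int, partitions: int) -> bool:
    # FAKE stand-in for a real experiment: pretend each node copes with 1,000 partitions.
    return partitions <= 1000 * nodes


if __name__ == "__main__":
    print(capacity_curve([1, 2, 3], fake_survives))  # {1: 1000, 2: 2000, 3: 3000}
```

Each real probe is an expensive and possibly noisy experiment, so borderline results may need repeating; the fake callback only demonstrates the search.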

I need one piece of help here: which parameters should I keep in mind when
tuning the JVM for the best performance?
My machine specs: 4 CPU cores, 8GB RAM and a 500GB HDD (RAID striped).
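For the heap-size dimension, the broker heap can be set per run through the KAFKA_HEAP_OPTS environment variable, which kafka-server-start.sh uses in place of its built-in default when it is set. A Python sketch; the heap sizes, the 1 GB OS reserve and the kafka_home path are hypothetical values to vary, not tuning advice. The second helper shows how much of the 8GB would remain for the OS page cache, which Kafka relies on heavily:

```python
import os
import subprocess
from typing import Dict, Mapping, Optional


def kafka_heap_env(heap_mb: int, base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy of the environment with KAFKA_HEAP_OPTS set to a fixed-size heap.

    kafka-server-start.sh only applies its own default heap when
    KAFKA_HEAP_OPTS is unset, so setting it here overrides that default.
    Equal -Xms and -Xmx means the JVM does not resize the heap at runtime.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["KAFKA_HEAP_OPTS"] = f"-Xms{heap_mb}m -Xmx{heap_mb}m"
    return env


def page_cache_left_mb(ram_mb: int, heap_mb: int, os_reserve_mb: int = 1024) -> int:
    """RAM left for the OS page cache after the heap and a hypothetical OS reserve."""
    return max(ram_mb - heap_mb - os_reserve_mb, 0)


def start_broker(kafka_home: str, heap_mb: int) -> subprocess.Popen:
    """Launch one broker with the given heap; kafka_home is a placeholder path."""
    return subprocess.Popen(
        [os.path.join(kafka_home, "bin", "kafka-server-start.sh"),
         os.path.join(kafka_home, "config", "server.properties")],
        env=kafka_heap_env(heap_mb),
    )


if __name__ == "__main__":
    for heap in (1024, 2048, 4096):  # heap sizes to sweep on an 8GB machine
        print(heap, page_cache_left_mb(8192, heap))
```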

Regards,
Prabhjot

On Tue, Jul 28, 2015 at 10:27 PM, JIEFU GONG <jg...@berkeley.edu> wrote:

> I think these would definitely be useful statistics to have, and I've tried
> to do similar tests! The biggest difference is probably going to be the
> hardware specs of whatever cluster you decide to run them on. Benchmarks
> performed on different AWS servers might be helpful too; either way, I'd
> like to see these!
>
> On Mon, Jul 27, 2015 at 10:51 PM, Prabhjot Bharaj <pr...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I'm looking for a benchmark that shows how many topics and partitions in
> > total can be created in a cluster of n nodes, given message sizes between
> > x and y bytes, how that number varies with heap size, and how it affects
> > system performance.
> >
> > e.g. the result should look like: t topics with p partitions each can be
> > supported in a cluster of n nodes with a heap size of h MB, before the
> > cluster sees problems such as JVM crashes, high memory usage or system
> > slowdown.
> >
> > I think such benchmarks must exist so that we can make better decisions on
> > the ops side. If these details don't exist, I'll run this test myself,
> > varying the parameters described above, and I'd be happy to share the
> > numbers with the community.
> >
> > Thanks,
> > prabcs
> >
>
>
>


Re: Number of kafka topics/partitions supported per cluster of n nodes

Posted by JIEFU GONG <jg...@berkeley.edu>.
I think these would definitely be useful statistics to have, and I've tried
to do similar tests! The biggest difference is probably going to be the
hardware specs of whatever cluster you decide to run them on. Benchmarks
performed on different AWS servers might be helpful too; either way, I'd
like to see these!

On Mon, Jul 27, 2015 at 10:51 PM, Prabhjot Bharaj <pr...@gmail.com>
wrote:

> Hi,
>
> I'm looking for a benchmark that shows how many topics and partitions in
> total can be created in a cluster of n nodes, given message sizes between
> x and y bytes, how that number varies with heap size, and how it affects
> system performance.
>
> e.g. the result should look like: t topics with p partitions each can be
> supported in a cluster of n nodes with a heap size of h MB, before the
> cluster sees problems such as JVM crashes, high memory usage or system
> slowdown.
>
> I think such benchmarks must exist so that we can make better decisions on
> the ops side. If these details don't exist, I'll run this test myself,
> varying the parameters described above, and I'd be happy to share the
> numbers with the community.
>
> Thanks,
> prabcs
>



