Posted to users@kafka.apache.org by S Ahmed <sa...@gmail.com> on 2013/10/11 00:14:01 UTC

Anyone running kafka with a single broker in production? what about only 8GB ram?

Is anyone out there running a single-broker Kafka setup?

How about with only 8 GB RAM?

I'm looking at one of the better dedicated server providers, and an 8GB
server is pretty much what I want to spend at the moment; would it make
sense to go this route? This same server would potentially also be running
ZooKeeper.

In terms of throughput, at most I would be seeing about 2000 messages per
second, of 20KB to 200KB in size.
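For scale, a quick back-of-envelope on that rate (my arithmetic, using the
stated extremes):

\[ 2000\ \tfrac{\text{msg}}{\text{s}} \times 200\ \tfrac{\text{KB}}{\text{msg}} = 400\ \tfrac{\text{MB}}{\text{s}} \approx 3.2\ \tfrac{\text{Gbit}}{\text{s}}, \qquad 2000\ \tfrac{\text{msg}}{\text{s}} \times 20\ \tfrac{\text{KB}}{\text{msg}} = 40\ \tfrac{\text{MB}}{\text{s}} \approx 0.32\ \tfrac{\text{Gbit}}{\text{s}} \]

So the 200KB worst case would more than saturate a single gigabit link,
while the 20KB case fits comfortably.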

I know the people at LinkedIn are running with, I believe, 24GB of RAM.

Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by Kane Kane <ka...@gmail.com>.
I'm also curious to know: what is the limiting factor of Kafka's write
throughput?

I've never seen reports higher than 100MB/sec, though disks can obviously
provide much more. In my own test with a single broker, single partition,
and a single replica:

bin/kafka-producer-perf-test.sh --topics perf --threads 10 \
  --broker-list 10.80.42.154:9092 --messages 5000000 --message-size 3000

It tops out around 90MB/sec. CPU, disk, memory, and network are all nearly
idle, but I still can't get higher numbers.
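For anyone wanting to reproduce this outside the bundled script, here is a
minimal single-threaded sketch of the same measurement against the 0.8 Java
producer API (broker address and sizes copied from the command above; the
acks/async settings and the class name are my assumptions, not part of the
original test):

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ProducerThroughputTest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "10.80.42.154:9092");           // broker from the run above
        props.put("serializer.class", "kafka.serializer.DefaultEncoder"); // raw byte[] payloads
        props.put("request.required.acks", "0");                          // fire-and-forget
        props.put("producer.type", "async");

        Producer<byte[], byte[]> producer =
                new Producer<byte[], byte[]>(new ProducerConfig(props));

        final int numMessages = 5000000;       // matches --messages 5000000
        final byte[] payload = new byte[3000]; // matches --message-size 3000

        long start = System.currentTimeMillis();
        for (int i = 0; i < numMessages; i++) {
            producer.send(new KeyedMessage<byte[], byte[]>("perf", payload));
        }
        producer.close(); // drain the async queue before stopping the clock

        double secs = (System.currentTimeMillis() - start) / 1000.0;
        double mb = (double) numMessages * payload.length / (1024.0 * 1024.0);
        System.out.printf("%.0f msg/s, %.1f MB/s%n", numMessages / secs, mb / secs);
    }
}

A single sender thread may itself be the bottleneck; comparing its number
against the 10-thread run above helps separate client limits from broker
limits.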


On Fri, Oct 11, 2013 at 11:17 AM, Bruno D. Rodrigues <bruno.rodrigues@litux.org> wrote:

> Producer:
>         props.put("batch.num.messages", "1000");            // default: 200
>         props.put("queue.buffering.max.messages", "20000"); // default: 10000
>         props.put("request.required.acks", "0");            // no acks: fastest, can lose messages
>         props.put("producer.type", "async");                // default: sync
>
>         // return ++this.count % a_numPartitions; // just round-robin
>         props.put("partitioner.class", "main.SimplePartitioner"); // default: kafka.producer.DefaultPartitioner
>
>         // disabled = 70MB source, 70MB network; enabled = 70MB source, ~40-50MB network
>         props.put("compression.codec", "Snappy"); // none
>
> The consumer is on default settings: I test separately without any
> consumer at all, and then measure the extra load of having 1..n consumers.
> I assume the top speed is reached with no consumers at all. I'm measuring
> on both the producer and the consumer side.
>
> On the Kafka server I've changed the following, expecting fewer disk
> writes at the cost of losing messages:
>
> #log.flush.interval.messages=10000
> log.flush.interval.messages=10000000
> #log.flush.interval.ms=1000
> log.flush.interval.ms=10000
> #log.segment.bytes=536870912
> # is signed int 32, only up to 2^31-1!
> log.segment.bytes=2000000000
> #log.retention.hours=168
> log.retention.hours=1
>
>
> Basically I need high throughput of discardable messages. Having them
> persisted temporarily on disk, in the highly optimised manner Kafka shows,
> would be great not for reliability (not losing messages), but because it
> would let me fetch some previous messages even if the client (Kafka client
> or real consumer client) disconnects, as well as provide a way to go back
> in time a few seconds if needed.
>
>
>
> On 11/10/2013, at 18:56, Magnus Edenhill <ma...@edenhill.se> wrote:
>
> Make sure the fetch batch size and the local consumer queue sizes are
> large enough; setting them too low will leave your throughput bound by the
> broker<->client latency.
>
> This would be controlled using the following properties:
> - fetch.message.max.bytes
> - queued.max.message.chunks
>
> On the producer side you would want to play with:
> - queue.buffering.max.ms and .messages
> - batch.num.messages
>
> Memory on the broker should only affect disk cache performance; the more
> the merrier, of course, but it depends on your use case. With a bit of
> luck the disk caches are already hot for the data you are reading (e.g.,
> recently produced).
>
> Consuming millions of messages per second on a quad-core i7 with 8 gigs
> of RAM is possible without breaking a sweat, provided the disk caches are
> hot.
>
>
> Regards,
> Magnus
>
>
> 2013/10/11 Bruno D. Rodrigues <br...@litux.org>
>
>
> On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <bruno.rodrigues@litux.org> wrote:
>
> My personal newbie experience, which is surely completely wrong and
> misconfigured, got me up to 70MB/sec, both with controlled 1K messages
> (hence 70K msg/sec) and with more random data (test data from 100 bytes
> to a couple of MB). First I thought the 70MB was the hard disk limit, but
> when I got the same result both on a proper Linux server with a 10K disk
> and on a Mac mini with a 5400rpm disk, I got confused.
>
> The mini has 2GB; the Linux server has 8 or 16, can't recall at the
> moment.
>
> The test was performed both with single and multiple producers and
> consumers. One producer = 70MB, two producers = 35MB each, and so forth.
> Running standalone instances on each server, same value. Running both
> together in 2-partition 2-replica crossed mode, same result.
>
> As far as I understood, more memory just means more kernel buffer space
> to compensate for the lack of disk speed, as Kafka seems to not really
> depend on memory for the queueing.
>
>
> On 11/10/2013, at 17:28, Guozhang Wang <wa...@gmail.com> wrote:
>
> Hello,
>
> In most Kafka deployments, the network bottleneck will be hit before the
> disk bottleneck. So you may want to check whether your network capacity
> has been saturated.
>
>
> They are all connected to Gbit ethernet cards and proper network routers.
> I can easily get way above 950Mbps up and down between each machine and
> even between multiple machines. Gbit is 128MB/s, and 70MB/s is 560Mbps. So
> far so good; 56% of network capacity is a decent value. But then I enable
> snappy, get the same 70MB on the input and output side and only 20MB/sec
> on the network, so it surely isn't a network limit. It's also not on the
> input or output side - the input reads a pre-processed mmapped file that
> reads at 150MB/sec uncached (SSD) and up to 3GB/sec when loaded into
> memory. The output simply counts the messages and their sizes.
>
> One weird thing is that the Kafka process never seems to cross 100% CPU in
> top or an equivalent command. Top shows 100% per CPU, so a multi-threaded
> process should be able to reach 400% (both the Linux box and the Mac mini
> have 2 cores with hyperthreading, so "almost" 4 CPUs).
>
>
>
>
>

Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by "Bruno D. Rodrigues" <br...@litux.org>.
Producer:
        props.put("batch.num.messages", "1000"); // 200
        props.put("queue.buffering.max.messages", "20000"); // 10000   
        props.put("request.required.acks", "0");
        props.put("producer.type", "async"); // sync

        // return ++this.count % a_numPartitions; // just round-robin
        props.put("partitioner.class", "main.SimplePartitioner"); // kafka.producer.DefaultPartitioner

        // disabled = 70MB source, 70MB network, enabled = 70MB source, ~40-50MB network
        props.put("compression.codec", "Snappy"); // none

The consumer is on default settings: I test separately without any consumer at all, and then measure the extra load of having 1..n consumers. I assume the top speed is reached with no consumers at all. I'm measuring on both the producer and the consumer side.

On the Kafka server I've changed the following, expecting fewer disk writes at the cost of losing messages:

#log.flush.interval.messages=10000
log.flush.interval.messages=10000000
#log.flush.interval.ms=1000
log.flush.interval.ms=10000
#log.segment.bytes=536870912
# is signed int 32, only up to 2^31-1!
log.segment.bytes=2000000000 
#log.retention.hours=168
log.retention.hours=1
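For reference, the arithmetic behind the 2^31-1 comment above:

\[ 2^{31} - 1 = 2{,}147{,}483{,}647\ \text{bytes} \approx 2.1\ \text{GB}, \qquad \texttt{log.segment.bytes} = 2{,}000{,}000{,}000 < 2^{31} - 1 \]

so the chosen segment size stays just under the signed 32-bit ceiling.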


Basically I need high throughput of discardable messages. Having them persisted temporarily on disk, in the highly optimised manner Kafka shows, would be great not for reliability (not losing messages), but because it would let me fetch some previous messages even if the client (Kafka client or real consumer client) disconnects, as well as provide a way to go back in time a few seconds if needed.



On 11/10/2013, at 18:56, Magnus Edenhill <ma...@edenhill.se> wrote:

> Make sure the fetch batch size and the local consumer queue sizes are
> large enough; setting them too low will leave your throughput bound by the
> broker<->client latency.
> 
> This would be controlled using the following properties:
> - fetch.message.max.bytes
> - queued.max.message.chunks
> 
> On the producer side you would want to play with:
> - queue.buffering.max.ms and .messages
> - batch.num.messages
> 
> Memory on the broker should only affect disk cache performance; the more
> the merrier, of course, but it depends on your use case. With a bit of
> luck the disk caches are already hot for the data you are reading (e.g.,
> recently produced).
> 
> Consuming millions of messages per second on a quad-core i7 with 8 gigs
> of RAM is possible without breaking a sweat, provided the disk caches are
> hot.
> 
> 
> Regards,
> Magnus
> 
> 
> 2013/10/11 Bruno D. Rodrigues <br...@litux.org>
> 
>> 
>>> On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <
>>> bruno.rodrigues@litux.org> wrote:
>>> 
>>>> My personal newbie experience, which is surely completely wrong and
>>>> miss-configured, got me up to 70MB/sec, either with controlled 1K
>> messages
>>>> (hence 70Kmsg/sec) as well as with more random data (test data from 100
>>>> bytes to a couple MB). First I thought the 70MB were the hard disk
>> limit,
>>>> but when I got the same result both with a proper linux server with a
>> 10K
>>>> disk, as well as with a Mac mini with a 5400rpm disk, I got confused.
>>>> 
>>>> The mini has 2G, the linux server has 8 or 16, can'r recall at the
>> moment.
>>>> 
>>>> The test was performed both with single and multi producers and
>> consumers.
>>>> One producer = 70MB, two producers = 35MB each and so forth. Running
>>>> standalone instances on each server, same value. Running both together
>> in 2
>>>> partition 2 replica crossed mode, same result.
>>>> 
>>>> As far as I understood, more memory just means more kernel buffer space
>> to
>>>> speed up the lack of disk speed, as kafka seems to not really depend on
>>>> memory for the queueing.
>> 
>> A 11/10/2013, às 17:28, Guozhang Wang <wa...@gmail.com> escreveu:
>> 
>>> Hello,
>>> 
>>> In most cases of Kafka, network bottleneck will be hit before the disk
>>> bottleneck. So maybe you want to check your network capacity to see if it
>>> has been saturated.
>> 
>> They are all connected to Gbit ethernet cards and proper network routers.
>> I can easily get way above 950Mbps up and down between each machine and
>> even between multiple machines. Gbit is 128MB/s. 70MB/s is 560Kbps. So far
>> so good, 56% network capacity is a goodish value. But then I enable snappy,
>> get the same 70MB on the input and output side, and 20MB/sec on the
>> network, so it surely isn't network limits. It's also not on the input or
>> output side - the input reads a pre-processed MMaped file that reads at
>> 150MB/sec without cache (SSD) up to 3GB/sec when loaded into memory. The
>> output simply counts the messages and size of them.
>> 
>> One weird thing is that the kafka process seems to not cross the 100% cpu
>> on the top or equivalent command. Top shows 100% for each CPU, so a
>> multi-threaded process should go up to 400% (both the linux and mac mini
>> are 2 CPU with hiperthreading, so "almost" 4 cpus).
>> 
>> 
>> 


Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by Magnus Edenhill <ma...@edenhill.se>.
Make sure the fetch batch size and the local consumer queue sizes are
large enough; setting them too low will leave your throughput bound by the
broker<->client latency.

This would be controlled using the following properties (a sketch of where
they go follows the lists below):
- fetch.message.max.bytes
- queued.max.message.chunks

On the producer side you would want to play with:
- queue.buffering.max.ms and .messages
- batch.num.messages
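A minimal sketch of where the consumer-side settings go, assuming the 0.8
high-level Java consumer; the ZooKeeper address, group id, topic and the
chosen values are placeholders, not recommendations:

import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class ConsumerTuningSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181"); // placeholder
        props.put("group.id", "perf-consumer");           // placeholder
        // Bigger fetches amortize the broker<->client round trip:
        props.put("fetch.message.max.bytes", "2097152");  // 2MB, example value
        // More buffered chunks keep consumer threads fed between fetches:
        props.put("queued.max.message.chunks", "20");     // example value

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap("perf", 1));

        long bytes = 0;
        // Blocks waiting for messages; Ctrl-C to stop the sketch.
        for (MessageAndMetadata<byte[], byte[]> mm : streams.get("perf").get(0)) {
            bytes += mm.message().length; // count payload bytes, like the tests above
        }
        System.out.println("consumed " + bytes + " bytes");
    }
}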

Memory on the broker should only affect disk cache performance; the more
the merrier, of course, but it depends on your use case. With a bit of
luck the disk caches are already hot for the data you are reading (e.g.,
recently produced).

Consuming millions of messages per second on a quad-core i7 with 8 gigs
of RAM is possible without breaking a sweat, provided the disk caches are
hot.


Regards,
Magnus


2013/10/11 Bruno D. Rodrigues <br...@litux.org>

>
> > On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <
> > bruno.rodrigues@litux.org> wrote:
> >
> >> My personal newbie experience, which is surely completely wrong and
> >> miss-configured, got me up to 70MB/sec, either with controlled 1K
> messages
> >> (hence 70Kmsg/sec) as well as with more random data (test data from 100
> >> bytes to a couple MB). First I thought the 70MB were the hard disk
> limit,
> >> but when I got the same result both with a proper linux server with a
> 10K
> >> disk, as well as with a Mac mini with a 5400rpm disk, I got confused.
> >>
> >> The mini has 2G, the linux server has 8 or 16, can'r recall at the
> moment.
> >>
> >> The test was performed both with single and multi producers and
> consumers.
> >> One producer = 70MB, two producers = 35MB each and so forth. Running
> >> standalone instances on each server, same value. Running both together
> in 2
> >> partition 2 replica crossed mode, same result.
> >>
> >> As far as I understood, more memory just means more kernel buffer space
> to
> >> speed up the lack of disk speed, as kafka seems to not really depend on
> >> memory for the queueing.
>
> A 11/10/2013, às 17:28, Guozhang Wang <wa...@gmail.com> escreveu:
>
> > Hello,
> >
> > In most cases of Kafka, network bottleneck will be hit before the disk
> > bottleneck. So maybe you want to check your network capacity to see if it
> > has been saturated.
>
> They are all connected to Gbit ethernet cards and proper network routers.
> I can easily get way above 950Mbps up and down between each machine and
> even between multiple machines. Gbit is 128MB/s. 70MB/s is 560Kbps. So far
> so good, 56% network capacity is a goodish value. But then I enable snappy,
> get the same 70MB on the input and output side, and 20MB/sec on the
> network, so it surely isn't network limits. It's also not on the input or
> output side - the input reads a pre-processed MMaped file that reads at
> 150MB/sec without cache (SSD) up to 3GB/sec when loaded into memory. The
> output simply counts the messages and size of them.
>
> One weird thing is that the kafka process seems to not cross the 100% cpu
> on the top or equivalent command. Top shows 100% for each CPU, so a
> multi-threaded process should go up to 400% (both the linux and mac mini
> are 2 CPU with hiperthreading, so "almost" 4 cpus).
>
>
>

Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by "Bruno D. Rodrigues" <br...@litux.org>.
> On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <
> bruno.rodrigues@litux.org> wrote:
> 
>> My personal newbie experience, which is surely completely wrong and
>> miss-configured, got me up to 70MB/sec, either with controlled 1K messages
>> (hence 70Kmsg/sec) as well as with more random data (test data from 100
>> bytes to a couple MB). First I thought the 70MB were the hard disk limit,
>> but when I got the same result both with a proper linux server with a 10K
>> disk, as well as with a Mac mini with a 5400rpm disk, I got confused.
>> 
>> The mini has 2G, the linux server has 8 or 16, can'r recall at the moment.
>> 
>> The test was performed both with single and multi producers and consumers.
>> One producer = 70MB, two producers = 35MB each and so forth. Running
>> standalone instances on each server, same value. Running both together in 2
>> partition 2 replica crossed mode, same result.
>> 
>> As far as I understood, more memory just means more kernel buffer space to
>> speed up the lack of disk speed, as kafka seems to not really depend on
>> memory for the queueing.

On 11/10/2013, at 17:28, Guozhang Wang <wa...@gmail.com> wrote:

> Hello,
> 
> In most cases of Kafka, network bottleneck will be hit before the disk
> bottleneck. So maybe you want to check your network capacity to see if it
> has been saturated.

They are all connected to Gbit ethernet cards and proper network routers. I can easily get way above 950Mbps up and down between each machine and even between multiple machines. Gbit is 128MB/s, and 70MB/s is 560Mbps. So far so good; 56% of network capacity is a decent value. But then I enable snappy, get the same 70MB on the input and output side and only 20MB/sec on the network, so it surely isn't a network limit. It's also not on the input or output side - the input reads a pre-processed mmapped file that reads at 150MB/sec uncached (SSD) and up to 3GB/sec when loaded into memory. The output simply counts the messages and their sizes.
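Spelling that conversion out:

\[ 70\ \tfrac{\text{MB}}{\text{s}} \times 8\ \tfrac{\text{bit}}{\text{byte}} = 560\ \tfrac{\text{Mbit}}{\text{s}} \approx 56\%\ \text{of a } 1\ \tfrac{\text{Gbit}}{\text{s}}\ \text{link} \]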

One weird thing is that the Kafka process never seems to cross 100% CPU in top or an equivalent command. Top shows 100% per CPU, so a multi-threaded process should be able to reach 400% (both the Linux box and the Mac mini have 2 cores with hyperthreading, so "almost" 4 CPUs).



Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by Guozhang Wang <wa...@gmail.com>.
Hello,

In most Kafka deployments, the network bottleneck will be hit before the
disk bottleneck. So you may want to check whether your network capacity
has been saturated.

Guozhang


On Thu, Oct 10, 2013 at 3:57 PM, Bruno D. Rodrigues <bruno.rodrigues@litux.org> wrote:

> A 10/10/2013, às 23:14, S Ahmed <sa...@gmail.com> escreveu:
>
> > Is anyone out there running a single broker kafka setup?
> >
> > How about with only 8 GB RAM?
> >
> > I'm looking at one of the better dedicated server prodivers, and a 8GB
> > server is pretty much what I want to spend at the moment, would it make
> > sense going this route?
> > This same server would also potentially be running zookeeper also.
> >
> > In terms of messages per second, at most I would be seeing about 2000
> > messages per second, of 20KB to 200KB in size.
> >
> > I know the people at linkedin are running with I believe 24GB of ram.
>
> My personal newbie experience, which is surely completely wrong and
> miss-configured, got me up to 70MB/sec, either with controlled 1K messages
> (hence 70Kmsg/sec) as well as with more random data (test data from 100
> bytes to a couple MB). First I thought the 70MB were the hard disk limit,
> but when I got the same result both with a proper linux server with a 10K
> disk, as well as with a Mac mini with a 5400rpm disk, I got confused.
>
> The mini has 2G, the linux server has 8 or 16, can'r recall at the moment.
>
> The test was performed both with single and multi producers and consumers.
> One producer = 70MB, two producers = 35MB each and so forth. Running
> standalone instances on each server, same value. Running both together in 2
> partition 2 replica crossed mode, same result.
>
> As far as I understood, more memory just means more kernel buffer space to
> speed up the lack of disk speed, as kafka seems to not really depend on
> memory for the queueing.
>
>
>


-- 
-- Guozhang

Re: Anyone running kafka with a single broker in production? what about only 8GB ram?

Posted by "Bruno D. Rodrigues" <br...@litux.org>.
On 10/10/2013, at 23:14, S Ahmed <sa...@gmail.com> wrote:

> Is anyone out there running a single broker kafka setup?
> 
> How about with only 8 GB RAM?
> 
> I'm looking at one of the better dedicated server prodivers, and a 8GB
> server is pretty much what I want to spend at the moment, would it make
> sense going this route?
> This same server would also potentially be running zookeeper also.
> 
> In terms of messages per second, at most I would be seeing about 2000
> messages per second, of 20KB to 200KB in size.
> 
> I know the people at linkedin are running with I believe 24GB of ram.

My personal newbie experience, which is surely completely wrong and misconfigured, got me up to 70MB/sec, both with controlled 1K messages (hence 70K msg/sec) and with more random data (test data from 100 bytes to a couple of MB). First I thought the 70MB was the hard disk limit, but when I got the same result both on a proper Linux server with a 10K disk and on a Mac mini with a 5400rpm disk, I got confused.

The mini has 2GB; the Linux server has 8 or 16, can't recall at the moment.

The test was performed both with single and multiple producers and consumers. One producer = 70MB, two producers = 35MB each, and so forth. Running standalone instances on each server, same value. Running both together in 2-partition 2-replica crossed mode, same result.

As far as I understood, more memory just means more kernel buffer space to compensate for the lack of disk speed, as Kafka seems to not really depend on memory for the queueing.