Posted to users@kafka.apache.org by "Xu, Nan" <nx...@baml.com.INVALID> on 2019/03/14 20:43:03 UTC

kafka latency for large message

Hi, 
   
    We are using Kafka to send messages, and fewer than 1% of them are very large, close to 30 MB. We understand Kafka is not ideal for big messages, but because the rate of large messages is so low, we want to let Kafka carry them anyway. We still want a reasonable latency.

    To test, I set up a topic named test on a local single-broker Kafka, with only 1 partition and 1 replica, and ran the following command:

./kafka-producer-perf-test.sh  --topic test --num-records 2000000  --throughput 1 --record-size 30000000 --producer.config ../config/producer.properties
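
(For reference, a sketch of how such a topic can be created; the max.message.bytes override is my assumption of what is needed for records this large, and pre-2.2 tooling uses --zookeeper instead of --bootstrap-server.)

./kafka-topics.sh --create --topic test --partitions 1 --replication-factor 1 \
  --config max.message.bytes=40000000 --bootstrap-server localhost:9092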

producer.properties:

#Max 40M message
max.request.size=40000000
buffer.memory=40000000

#2M buffer
send.buffer.bytes=2000000
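
(Note: with buffer.memory at 40 MB the producer's accumulator can hold only one 30 MB record at a time, so a second large send may block until the first is flushed. An illustrative bump, not a value we actually tested:

#room for a few large records in flight (illustrative)
buffer.memory=120000000
)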

6 records sent, 1.1 records/sec (31.00 MB/sec), 973.0 ms avg latency, 1386.0 max latency.
6 records sent, 1.0 records/sec (28.91 MB/sec), 787.2 ms avg latency, 1313.0 max latency.
5 records sent, 1.0 records/sec (27.92 MB/sec), 582.8 ms avg latency, 643.0 max latency.
6 records sent, 1.1 records/sec (30.16 MB/sec), 685.3 ms avg latency, 1171.0 max latency.
5 records sent, 1.0 records/sec (27.92 MB/sec), 629.4 ms avg latency, 729.0 max latency.
5 records sent, 1.0 records/sec (27.61 MB/sec), 635.6 ms avg latency, 673.0 max latency.
6 records sent, 1.1 records/sec (30.09 MB/sec), 736.2 ms avg latency, 1255.0 max latency.
5 records sent, 1.0 records/sec (27.62 MB/sec), 626.8 ms avg latency, 685.0 max latency.
5 records sent, 1.0 records/sec (28.38 MB/sec), 608.8 ms avg latency, 685.0 max latency.


On the broker, I changed the following settings:

socket.send.buffer.bytes=2024000
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=2224000

and left all other settings at their defaults.
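
(Side note: the stock broker defaults for the size-related limits are only around 1 MB, so a topic accepting 30 MB records generally needs them raised as well, and consumers likewise need max.partition.fetch.bytes / fetch.max.bytes above the record size. An illustrative server.properties sketch, values not from our actual setup:

message.max.bytes=40000000
replica.fetch.max.bytes=40000000
)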

I am a little surprised to see a max latency of about 1 s and an average of about 0.5 s. My understanding is that Kafka memory-maps the log file and lets the OS flush it, and all writes are sequential, so the flush should not be affected that much by message size. Batching and the network will take longer, but those are memory-based and on the local machine, and my SSD should be far better than 0.5 seconds. Where is the time being consumed? Any suggestions?
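
(In case it helps, a minimal sketch of one way I plan to split the time, using the producer's own metrics; the localhost broker and the all-zero byte payload are simplifications of our setup:)

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class LargeRecordProbe {
    public static void main(String[] args) throws Exception {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");   // assumed local broker
        p.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        p.put("max.request.size", "40000000");
        p.put("buffer.memory", "40000000");
        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(p)) {
            byte[] payload = new byte[30_000_000];      // one ~30 MB record
            long t0 = System.nanoTime();
            producer.send(new ProducerRecord<>("test", payload)).get();   // block until acked
            System.out.println("produce-to-ack: " + (System.nanoTime() - t0) / 1_000_000 + " ms");
            // Split the time between queueing in the accumulator and the broker round trip.
            producer.metrics().forEach((name, metric) -> {
                if (name.name().equals("record-queue-time-avg")
                        || name.name().equals("request-latency-avg")) {
                    System.out.println(name.name() + " = " + metric.metricValue());
                }
            });
        }
    }
}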

Thanks,
Nan








Re: kafka latency for large message

Posted by Mike Trienis <mt...@quickinsights.io>.
It takes time to send that much data over the network. Why would you expect
a smaller latency?
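
As a rough back-of-the-envelope using the figures in the original post: an average latency of roughly 0.6-1.0 s for a single 30 MB record works out to an effective produce-to-ack rate of only about 30-50 MB/s, so getting that record acknowledged in, say, 100 ms would require the whole producer-socket-broker path to sustain several hundred MB/s.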

On Mon, Mar 18, 2019 at 8:05 AM Nan Xu <na...@gmail.com> wrote:

> anyone can give some suggestion? or an explanation why kafka give a big
> latency for large payload.
>
> Thanks,
> Nan
>


-- 
Thanks, Mike

Re: kafka latency for large message

Posted by Nan Xu <na...@gmail.com>.
Can anyone give some suggestions, or an explanation of why Kafka shows such high
latency for large payloads?

Thanks,
Nan


Re: kafka latency for large message

Posted by Nan Xu <na...@gmail.com>.
That's very good information in the slides, thanks. Our design uses Kafka for two
purposes: one is as a cache, for which we use a KTable; the second is as a message
delivery mechanism to send data to other systems. Because we care very much about
latency, a KTable over a compacted topic suits us very well; if we had to bring in
another system to do the caching, a big change would be involved. The approach
described in the slides, breaking the message into smaller chunks and then
reassembling them, seems like a viable solution.
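
(For my own notes, a rough sketch of that chunk-and-reassemble idea; the helper, the header names, and the chunk size are my assumptions, not something taken from the slides:)

import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ChunkingSketch {
    // Split one large payload into fixed-size chunks that all share the same key,
    // so they land on one partition in order and a consumer can reassemble them.
    static void sendInChunks(KafkaProducer<String, byte[]> producer, String topic,
                             String key, byte[] payload, int chunkSize) {
        int total = (payload.length + chunkSize - 1) / chunkSize;
        for (int i = 0; i < total; i++) {
            int from = i * chunkSize;
            int to = Math.min(from + chunkSize, payload.length);
            ProducerRecord<String, byte[]> rec =
                new ProducerRecord<>(topic, key, Arrays.copyOfRange(payload, from, to));
            // hypothetical header names used by the consumer to reassemble
            rec.headers().add("chunk-index", Integer.toString(i).getBytes(StandardCharsets.UTF_8));
            rec.headers().add("chunk-count", Integer.toString(total).getBytes(StandardCharsets.UTF_8));
            producer.send(rec);
        }
    }
}

Using one key per logical message keeps all chunks on a single partition, so the consumer reads them in order and only has to buffer until chunk-count pieces have arrived.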

Do you know why Kafka's latency is not linear in message size compared to small
messages? For a 2M message I see an average latency of less than 10 ms, so scaling
linearly I would expect a 30M message to come in under about 10 * 15 = 150 ms.

On Mon, Mar 18, 2019 at 3:29 PM Bruce Markey <bj...@confluent.io> wrote:

> Hi Nan,
>
> Would you consider other approaches that may actually be a more efficient
> solution for you? There is a slide deck Handle Large Messages In Apache
> Kafka
> <
> https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297
> >.
> For messages this large, one of the approaches suggested is Reference Based
> Messaging where you write your large files to an external data store then
> produce a small Apache Kafka message with a reference for where to find the
> file. This would allow your consumer applications to find the file as
> needed rather than storing all that data in the event log.
>
> --  bjm
>

Re: kafka latency for large message

Posted by Bruce Markey <bj...@confluent.io>.
Hi Nan,

Would you consider other approaches that may actually be a more efficient
solution for you? There is a slide deck Handle Large Messages In Apache
Kafka
<https://www.slideshare.net/JiangjieQin/handle-large-messages-in-apache-kafka-58692297>.
For messages this large, one of the approaches suggested is Reference Based
Messaging where you write your large files to an external data store then
produce a small Apache Kafka message with a reference for where to find the
file. This would allow your consumer applications to find the file as
needed rather than storing all that data in the event log.
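
For illustration, a minimal sketch of such a pointer record; the topic, key, and object-store URI below are made up:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PointerSketch {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "localhost:9092");
        p.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        p.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(p)) {
            // The 30 MB payload lives in an external store; Kafka carries only a small reference.
            producer.send(new ProducerRecord<>("test", "payload-123",
                "s3://example-bucket/payloads/payload-123.bin"));
        }
    }
}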

--  bjm
