Posted to users@kafka.apache.org by girija arumugam <gi...@gmail.com> on 2020/12/01 10:53:37 UTC

Re: Regarding framing producer rate in-terms of software as well as hardware configurations

Adding a few application-related configurations that can affect the
producer rate:

   - linger.ms
   - batch.size
   - buffer.memory
   - acks
   - compression.type
   - num.io.threads (broker-side)
   - num.network.threads (broker-side)
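
As a rough illustration of where these settings live, the sketch below
groups them by the side they are configured on. linger.ms, batch.size,
buffer.memory, acks and compression.type are producer settings, while
num.io.threads and num.network.threads are broker settings. The values
are placeholders for illustration only, not tuning recommendations, and
to_properties is a hypothetical helper, not part of any Kafka client.

```python
# Producer-side settings (placeholder values, not recommendations).
producer_config = {
    "linger.ms": 5,
    "batch.size": 16384,
    "buffer.memory": 33554432,
    "acks": "all",
    "compression.type": "lz4",
}

# Broker-side settings (placeholder values, not recommendations).
broker_config = {
    "num.io.threads": 8,
    "num.network.threads": 3,
}


def to_properties(config):
    """Render a dict as sorted 'key=value' lines, as in a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(config.items()))


if __name__ == "__main__":
    print("# producer")
    print(to_properties(producer_config))
    print("# broker")
    print(to_properties(broker_config))
```

Keeping the two groups separate makes it easier to vary one side at a
time when measuring the producer rate.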


On Mon, Nov 30, 2020 at 3:07 PM girija arumugam <gi...@gmail.com>
wrote:

> Team,
> *Use case:*
>     *IMAP*. I have an application in which an org has users who use
> IMAP to send mail, and the mail contents are produced to Kafka.
>
> Here the scaling factors are:
>
>    1. Orgs can grow from 1 to a million.
>    2. Users can grow from 1 to a million.
>
> For this use case, I need to calculate the producer rate and the broker
> response rate for a single machine.
>
> So far, we have identified the following factors that affect the
> producer rate:
>
>    1. Message size
>    2. Request size
>    3. Request rate overhead
>    4. Request latency
>    5. Round Trip Time
>    6. Number of Sender Threads
>    7. Number of Processor Threads at Broker
>    8. Replication factor
>
> Variables identified at the network layer, kernel, and NIC:
>
>    1. sysctl_wmem
>    2. Tx queues
>    3. Ring Buffer
>    4. Driver Queue
>    5. NAPI Polling
>
> Observations made so far:
>
>    1. SocketChannel is the entry point for sending data at
>    the application level.
>    2. The sendfile() system call is used to transfer the data.
>
> *Questions*:
>
>    1. How is data transferred from the SocketChannel to the NIC? That is,
>    what is the data flow through the network (protocol) layer, kernel,
>    network device drivers, and NIC?
>    2. Since each KafkaProducer instance creates a SocketChannel, what is
>    the maximum number of producer instances a machine can have while
>    still using the network efficiently?
>    3. In addition to the variables listed above:
>       1. Which variables are involved in sending data in the
>       network layer?
>       2. Which variables are involved in sending data in the
>       kernel?
>       3. Which variables are involved in sending data to the NIC?
>    4. How can the producer rate be framed in terms of the variables
>    identified in each layer?
>    5. *For a given machine's hardware, how can the producer rate be
>    expressed precisely in a single formula in terms of hardware- and
>    software-level variables?*
>
>
> Could anyone please help me identify the variables and incorporate
> them into a single formula that frames the producer rate for a
> machine in terms of producer instances?
>
> Thanks in advance.
>
> PS: I have already come across the following documents:
>
>    -
>    https://www.confluent.io/blog/how-choose-number-topics-partitions-kafka-cluster/
>    - https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing
>    -
>    https://www.slideshare.net/JiangjieQin/producer-performance-tuning-for-apache-kafka-63147600
>
>
> Regards,
> Girija A.
>
>
>
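
On question 5 in the quoted message: a precise single formula is not
given anywhere in this thread, but a first-order upper bound can be
sketched from a few of the listed factors (message size, request size,
request latency, number of producers) plus the NIC bandwidth. The model
below is an assumption, not something Kafka documents: it treats each
producer as keeping a fixed number of requests in flight to one broker
(in the spirit of max.in.flight.requests.per.connection), and it ignores
compression, acks, replication, protocol overhead, and broker-side
limits. The function and parameter names are hypothetical.

```python
def estimate_producer_rate(message_bytes, batch_bytes, request_latency_ms,
                           max_in_flight, nic_bits_per_s, producers=1):
    """Rough upper bound on messages per second for one machine.

    Assumed model: each producer sustains `max_in_flight` requests per
    `request_latency_ms`, each request carries one full batch, and the
    sum over all producers is capped by the NIC bandwidth.
    """
    # Whole messages that fit in one batch (at least one message).
    msgs_per_request = max(1, batch_bytes // message_bytes)
    request_bytes = msgs_per_request * message_bytes

    # Latency-bound throughput of a single producer, in bytes per second.
    per_producer = max_in_flight * request_bytes * 1000 / request_latency_ms

    # Bandwidth cap of the NIC, in bytes per second.
    nic_bytes_per_s = nic_bits_per_s / 8

    total_bytes_per_s = min(producers * per_producer, nic_bytes_per_s)
    return total_bytes_per_s / message_bytes


if __name__ == "__main__":
    # Example inputs are made up for illustration.
    for n in (1, 10, 100):
        rate = estimate_producer_rate(1000, 16000, 10, 5, 1e9, producers=n)
        print(n, "producer(s):", rate, "messages/s (upper bound)")
```

Under this model, adding producer instances helps only until the sum of
their latency-bound rates reaches the NIC limit; past that point more
instances add sockets without adding throughput. Measured rates will
normally be lower than this bound.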