Posted to dev@kafka.apache.org by Bhavesh Mistry <mi...@gmail.com> on 2015/02/04 17:50:45 UTC

[Discussion] Producer Instance and Decouple Kafka Cluster State/TCP Connection Management

Hi Kafka Dev team,



I would like to discuss the relationship between Kafka cluster state
management and the Producer instance in 0.8.2.



The current Producer implementation is tightly coupled to Kafka cluster
and topic metadata management.  Consider the following scenario:



An application creates multiple Producer instances for the same Kafka
cluster, for the same topic or for several topics, to get better
throughput, to avoid blocking calls, or to avoid synchronization issues
(for example, keys hashing to the same partition).



It would be great if the design and implementation of cluster state/TCP
connection management were decoupled from the Producer instance.



Suppose we are doing aggregation using Kafka and want to avoid the
Dequeue sync call for performance reasons. I would have to create as many
producers as there are partitions, which can be many, and each one would
manage cluster and TCP state separately.



All I am asking is whether it is possible to have low-level TCP connection
pooling, cluster state management, and topic metadata management available
separately, rather than encapsulating everything within the Producer
instance.



In my case, I would create one IO thread per partition and write data with
a one-to-one mapping to avoid blocking/sync calls.  Having a public API for
cluster/TCP connection management and the Kafka Java protocol would allow
advanced ways to achieve higher throughput at the expense of CPU/memory,
which the application controls.
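To make the idea concrete, here is a rough sketch of the shape I have in
mind, using only JDK classes. None of these types exist in the Kafka
client: SharedClusterState, PartitionSender and PartitionedRouter are
hypothetical names, the key hash is a simplified stand-in for the real
partitioner, and the "send" just appends to an in-memory list instead of
writing to a broker.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical: cluster metadata (and, in a real client, TCP connections)
// managed once and shared by every sender in the application.
final class SharedClusterState {
    private final int partitionCount;
    final AtomicInteger metadataFetches = new AtomicInteger();

    SharedClusterState(int partitionCount) {
        this.partitionCount = partitionCount;
        metadataFetches.incrementAndGet(); // a single fetch for the whole application
    }

    int partitionCount() { return partitionCount; }

    // Simplified key hash (assumption); the real partitioner may differ.
    int partitionFor(String key) {
        return (key.hashCode() & 0x7fffffff) % partitionCount;
    }
}

// Hypothetical: one queue and one IO thread per partition, so records for
// different partitions never share a queue.
final class PartitionSender implements Runnable {
    private static final String STOP = new String("stop"); // sentinel, compared by identity
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final List<String> sent = new ArrayList<>(); // stands in for the network write
    private final Thread thread;

    PartitionSender(int partition) {
        thread = new Thread(this, "sender-" + partition);
    }

    void start() { thread.start(); }

    void enqueue(String value) { queue.add(value); }

    // Drains everything queued so far, then stops the IO thread.
    void close() {
        queue.add(STOP);
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    @Override
    public void run() {
        try {
            for (String v = queue.take(); v != STOP; v = queue.take()) {
                sent.add(v);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Hypothetical: routes each record to the sender that owns its partition.
final class PartitionedRouter {
    private final SharedClusterState cluster;
    private final List<PartitionSender> senders = new ArrayList<>();

    PartitionedRouter(SharedClusterState cluster) {
        this.cluster = cluster;
        for (int p = 0; p < cluster.partitionCount(); p++) {
            PartitionSender sender = new PartitionSender(p);
            sender.start();
            senders.add(sender);
        }
    }

    void send(String key, String value) {
        senders.get(cluster.partitionFor(key)).enqueue(value);
    }

    void close() {
        for (PartitionSender sender : senders) {
            sender.close();
        }
    }

    // Only safe to read after close(), once the IO threads have finished.
    List<String> sentTo(int partition) { return senders.get(partition).sent; }
}
```

The point of the sketch is that the per-partition senders stay cheap
because the cluster state is created once and passed in, instead of each
Producer instance building and refreshing its own copy.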



I am referring to my experience with
https://issues.apache.org/jira/browse/KAFKA-1710



Let me know your thoughts.


Thanks,



Bhavesh