Posted to users@kafka.apache.org by Paolo Patierno <pp...@live.com> on 2016/04/22 18:35:39 UTC

Huge number of threads for supporting huge Kafka consumers

Hi all,

I'm developing an AMQP - Kafka bridge and I'm facing a significant limitation of the current 0.9.1 Kafka client APIs.

The consumer.poll() method is synchronous and, as we know, it must be called in order to send the heartbeat even if no records are available. This means that poll() needs to be called frequently, before the session timeout expires.
For that reason a thread per consumer is suggested: if we use a thread pool (say, 20 threads) for 1000 consumers, it's possible that a consumer will be served too late and miss its heartbeat.
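
To put rough numbers on the thread pool case, here is a back-of-the-envelope sketch. It assumes the pool serves consumers round-robin and that each poll() (plus record handling) holds a thread for a fixed time. The class, its methods, and the 30-second session timeout are illustrative assumptions; nothing here calls the Kafka client.

```java
// Hypothetical helper, not part of the Kafka client: estimates the gap
// between two poll() calls on the same consumer when a fixed pool of
// threads serves all consumers round-robin.
final class PollGapEstimate {

    // Number of passes the pool needs to touch every consumer once.
    static long rounds(int consumers, int threads) {
        return (consumers + threads - 1L) / threads; // ceiling division
    }

    // Worst-case time between two polls of one consumer, assuming every
    // poll() (plus record handling) holds a thread for pollMs.
    static long worstCaseGapMs(int consumers, int threads, long pollMs) {
        return rounds(consumers, threads) * pollMs;
    }

    // True when the gap exceeds the session timeout, i.e. the consumer
    // would be considered dead before its next heartbeat goes out.
    static boolean sessionExpires(int consumers, int threads,
                                  long pollMs, long sessionTimeoutMs) {
        return worstCaseGapMs(consumers, threads, pollMs) > sessionTimeoutMs;
    }

    public static void main(String[] args) {
        long sessionTimeoutMs = 30_000; // assumed value of session.timeout.ms
        System.out.println(worstCaseGapMs(1000, 20, 100));                     // 5000
        System.out.println(sessionExpires(1000, 20, 100, sessionTimeoutMs));   // false
        System.out.println(sessionExpires(1000, 20, 1_000, sessionTimeoutMs)); // true
    }
}
```

With 20 threads and 1000 consumers each consumer is polled once every 50 passes, so the pool only keeps up as long as a single pass stays well under the session timeout divided by 50.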

I'm saying that because the bridging between AMQP and Kafka works at the AMQP link level, so for each link attached to a specific topic there is a corresponding Kafka consumer, and there could be thousands of clients.

Any thoughts about that?

Paolo.



Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

Re: Huge number of threads for supporting huge Kafka consumers

Posted by Cees de Groot <ce...@pagerduty.com>.
Hi Paolo,

Dig a bit through the mailing list archives - IIRC there's a "trick"
that lets you do long processing. Basically you pull in a big batch,
unsubscribe from all topics, do regular polls (that will just send the
heartbeat because you don't have active subscriptions) and then when
done, re-subscribe and poll again. With consumer groups, you should be
able to have lots of consumers doing this in a group and get what you
want without doing manual thread pooling, etc.
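
A minimal sketch of that pattern, under stated assumptions. It uses pause/resume rather than unsubscribe and re-subscribe, since pausing is the flow-control mechanism the consumer exposes for "keep polling but fetch nothing". PausableSource and LongProcessingLoop are hypothetical names: the interface is only a stand-in for the consumer's poll(), pause() and resume() calls, and in the real client pause() and resume() take the partitions to act on.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical stand-in for the three consumer calls the pattern needs.
interface PausableSource<R> {
    List<R> poll(long timeoutMs); // fetches records and keeps the session alive
    void pause();                 // stop fetching from the assigned partitions
    void resume();                // start fetching again
}

final class LongProcessingLoop {

    // Handles one batch on a worker thread while the calling thread keeps
    // polling, so the source is only ever touched from a single thread.
    // Returns the number of records in the batch.
    static <R> int processOneBatch(PausableSource<R> source,
                                   ExecutorService worker,
                                   Consumer<List<R>> handler,
                                   long pollMs) throws Exception {
        List<R> batch = source.poll(pollMs);
        if (batch.isEmpty()) {
            return 0;
        }
        source.pause();
        try {
            Future<?> done = worker.submit(() -> handler.accept(batch));
            while (!done.isDone()) {
                // Expected to return no records while paused; called only
                // to keep the session alive during the slow handler.
                source.poll(pollMs);
            }
            done.get(); // rethrows a handler failure as ExecutionException
        } finally {
            source.resume();
        }
        return batch.size();
    }
}
```

The slow work runs on the worker thread and the polling thread never blocks for longer than pollMs, which is what keeps the heartbeat flowing. It still costs one polling thread per consumer, so it addresses long processing rather than the thread count itself.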

I do hope that manual heartbeating will be available at some point - it
seems to be a recurring problem. Of course, the consumer library
(where all the magic happens) is Open Source so you could dig in and
hack your way around it :-)




-- 
Cees de Groot
Principal Software Engineer
PagerDuty, Inc.