Posted to dev@kafka.apache.org by Alexander Binzberger <al...@wingcon.com> on 2017/01/31 10:47:24 UTC

[DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Hi there,
I realized that consuming messages is implemented as polling at the protocol
level.
Is there a reason why the client has to ask for messages?
Wouldn't it make more sense to push messages to consumers - at least to 
high-level ones?

I would think about it like:
The broker receives a new message from a producer.
It could push the message first to all replicas and then (in bulk) to all
consumers (at least high-level ones with a consumer group). [bulk:
offsetOfFirstMsg, list(msgs)]
Consumers could ack the (last) message (with the new offset) just after
processing it. [newOffset]
After the last message is acked, the broker would push the next (bulked)
messages to consume, and so on.
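
The flow described above can be sketched as a toy in-memory model. This is only an illustration of the proposal, not anything Kafka implements: the class PushBroker, its methods, and the max_batch parameter are hypothetical names, the replication step is left out, and a single consumer is assumed.

```python
class PushBroker:
    """Toy model of the proposed push/ack flow (hypothetical, single consumer)."""

    def __init__(self, max_batch=3):
        self.log = []            # appended messages; list index == offset
        self.acked_offset = 0    # next offset the consumer expects
        self.in_flight = False   # True while a pushed batch awaits its ack
        self.max_batch = max_batch
        self.pushed = []         # record of pushed batches, for inspection

    def produce(self, msg):
        # A producer appends a message; the broker pushes if nothing is outstanding.
        self.log.append(msg)
        self._maybe_push()

    def ack(self, new_offset):
        # The consumer acks with the offset it wants next: [newOffset]
        self.acked_offset = new_offset
        self.in_flight = False
        self._maybe_push()

    def _maybe_push(self):
        # Push only when no batch is in flight and unread data exists.
        if self.in_flight or self.acked_offset >= len(self.log):
            return
        first = self.acked_offset
        msgs = self.log[first:first + self.max_batch]
        self.in_flight = True
        # bulk: [offsetOfFirstMsg, list(msgs)]
        self.pushed.append((first, msgs))


broker = PushBroker(max_batch=3)
for m in ["a", "b", "c", "d"]:
    broker.produce(m)      # only "a" is pushed; the rest queue up behind the ack
broker.ack(1)              # ack "a"; the broker pushes the next batch
```

Note that the broker has to track in-flight state and the acked offset per consumer, which is the tracking logic a push model moves to the broker side.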

Pros:
This way it seems like the protocol and the high-level consumer would be 
simplified.
Clients have a more natural control over the offset and could ack per 
message or per bulk as needed or performance allows.
Additionally the stream processing path over Kafka would be faster.

Best Regards
Alexander Binzberger

-- 
Alexander Binzberger
System Designer - WINGcon AG
Tel. +49 7543 966-119

Registered office: Langenargen
Register court: ULM, HRB 734260
VAT ID: DE232931635, WEEE ID: DE74015979
Management board: Thomas Ehrle (chair), Fritz R. Paul (deputy), Tobias Tre�
Supervisory board: Jürgen Maucher (chair), Andreas Paul (deputy), Martin Sauter


Re: [DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Posted by Ismael Juma <is...@juma.me.uk>.
On Thu, Feb 2, 2017 at 4:41 PM, radai <ra...@gmail.com> wrote:

> also - i don't think you need to shorten fetch.max.wait.ms to get lower
> delays - you could still configure a relatively long fetch.max.wait.ms and
> have the broker answer your poll the minute _any_ messages are available.
>

Yes, `fetch.min.bytes` is 1 by default, so the broker will return data as
soon as it's available if that is not changed.
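
As a rough illustration of how these two settings interact, here is a simplified model of the broker's decision to answer a pending fetch. The function name broker_should_respond is hypothetical, and the real broker considers more than these two conditions; the defaults shown (fetch.min.bytes = 1, fetch.max.wait.ms = 500) are the consumer defaults.

```python
def broker_should_respond(available_bytes, waited_ms,
                          fetch_min_bytes=1, fetch_max_wait_ms=500):
    """Simplified model: answer the fetch once enough data has accumulated
    or the maximum wait time has elapsed, whichever comes first."""
    return available_bytes >= fetch_min_bytes or waited_ms >= fetch_max_wait_ms


# With fetch.min.bytes = 1, a long fetch.max.wait.ms does not add latency:
# a single available byte is enough to answer immediately.
immediate = broker_should_respond(available_bytes=1, waited_ms=0,
                                  fetch_max_wait_ms=30000)
# With no data, the fetch stays parked until the wait time runs out.
still_waiting = broker_should_respond(available_bytes=0, waited_ms=100,
                                      fetch_max_wait_ms=30000)
```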

Ismael

Re: [DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Posted by radai <ra...@gmail.com>.
kafka relies on the underlying OS' page cache for serving "popular" data.
so "pre-assembling" push batches would move from page cache to heap
storage, which is not as appealing.
also, for trivial cases a lot of consumers read the same thing, which would
make the heap caching even worse.

also - i don't think you need to shorten fetch.max.wait.ms to get lower
delays - you could still configure a relatively long fetch.max.wait.ms and
have the broker answer your poll the minute _any_ messages are available.

On Wed, Feb 1, 2017 at 2:46 AM, Alexander Binzberger <
alexander.binzberger@wingcon.com> wrote:

> Some of the consumers have a high load of messages but most have very
> few. I don't see how push would cost more CPU time or resources on the
> broker than polling with a lot of consumers very frequently.

Re: [DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Posted by Alexander Binzberger <al...@wingcon.com>.
Yes, I have seen fetch.max.wait.ms - you would not need this parameter 
with push. The broker would have time to collect a batch while the last 
push gets processed or, let's say, until the broker gets an ack with the new 
offset for the last pushed message. But of course you could use this 
parameter (broker side instead of client side) in much the same way in addition.

Yes, it would shift some logic to the broker, but it should not increase 
load. It should have very little CPU impact in the worst case, and the 
messages could stay in memory until they are pushed.
Actually, I think it would even decrease the load on the broker.
After the latest consumer changes, it would make sense to me to change 
this next.
At the moment I have about 100-200 consumer clients connected to one 
broker. All of them issue FetchRequests all the time, even if there is no 
data. To keep the message delivery delay short, I have configured 
everything to reduce it (e.g. fetch.max.wait.ms).
Some of the consumers have a high load of messages, but most have very 
few. I don't see how push would cost more CPU time or resources on the 
broker than polling with a lot of consumers very frequently.
Actually, I have much the same situation for metadata changes. I have 
seen a lot of clients querying metadata quite frequently. Metadata is 
unlikely to change, and (maybe) 100 requests per second have quite a large 
performance impact. Pushing metadata on changes would also make more 
sense here, I think.
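
To put a rough number on the idle polling load: if a topic has no data, each fetch comes back empty after about fetch.max.wait.ms and the client sends the next one. The estimate below is a back-of-the-envelope sketch; the function name idle_fetch_rate is hypothetical, and it assumes one outstanding fetch per consumer and ignores network round trips, so it is an upper bound.

```python
def idle_fetch_rate(consumers, fetch_max_wait_ms):
    """Approximate number of empty FetchRequests per second on an idle topic,
    assuming each consumer re-issues a fetch as soon as the last one times out."""
    return consumers * 1000.0 / fetch_max_wait_ms


default_wait = idle_fetch_rate(200, 500)  # 200 consumers, 500 ms wait: 400.0 per second
short_wait = idle_fetch_rate(200, 10)     # 200 consumers, 10 ms wait: 20000.0 per second
```

Under these assumptions, shortening the wait time is what drives the request rate up, not the polling model as such.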

Sorry for the confusing subject line - should have been "push" there.

Alexander

On 31.01.2017 at 23:42, Jason Gustafson wrote:
> Also, have you looked at the use of the max wait time in fetch requests (
> fetch.max.wait.ms for the new consumer)? The broker will hold the fetch in
> purgatory until data is available. Sort of lets you fake a push model.
>
> -Jason
>
> On Tue, Jan 31, 2017 at 2:29 PM, radai <ra...@gmail.com> wrote:
>
>> minimizing the cost of clients is part of what makes kafka scale.
>> a push model would shift a lot of tracking logic onto the broker.
>>
>> On Tue, Jan 31, 2017 at 2:47 AM, Alexander Binzberger <
>> alexander.binzberger@wingcon.com> wrote:
>>
>>> This way it seems like the protocol and the high-level consumer would be
>>> simplified.
>>> Clients have a more natural control over the offset and could ack per
>>> message or per bulk as needed or performance allows.
>>> Additionally the stream processing path over Kafka would be faster.
>>>


Re: [DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Posted by Jason Gustafson <ja...@confluent.io>.
Also, have you looked at the use of the max wait time in fetch requests (
fetch.max.wait.ms for the new consumer)? The broker will hold the fetch in
purgatory until data is available. Sort of lets you fake a push model.
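
The "hold the fetch until data is available" idea is essentially long polling. A minimal sketch of it, using a condition variable rather than anything from the Kafka code base (LongPollLog and its methods are hypothetical names):

```python
import threading


class LongPollLog:
    """Toy log whose fetch() parks the caller until data arrives or a timeout hits."""

    def __init__(self):
        self._messages = []
        self._cond = threading.Condition()

    def append(self, msg):
        with self._cond:
            self._messages.append(msg)
            self._cond.notify_all()   # wake any parked fetches

    def fetch(self, offset, max_wait_s):
        with self._cond:
            # Park until a message at `offset` exists or the wait time elapses.
            self._cond.wait_for(lambda: len(self._messages) > offset,
                                timeout=max_wait_s)
            return self._messages[offset:]


log = LongPollLog()
# A producer appends shortly after the consumer has started waiting.
timer = threading.Timer(0.05, log.append, args=("hello",))
timer.start()
result = log.fetch(0, max_wait_s=5.0)   # returns once "hello" arrives, not after 5 s
timer.join()
```

From the client's point of view the request is still a poll, but the response is delivered as soon as the data exists, which is what makes it behave like a push.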

-Jason

On Tue, Jan 31, 2017 at 2:29 PM, radai <ra...@gmail.com> wrote:

> minimizing the cost of clients is part of what makes kafka scale.
> a push model would shift a lot of tracking logic onto the broker.
>
> On Tue, Jan 31, 2017 at 2:47 AM, Alexander Binzberger <
> alexander.binzberger@wingcon.com> wrote:
>
> > This way it seems like the protocol and the high-level consumer would be
> > simplified.
> > Clients have a more natural control over the offset and could ack per
> > message or per bulk as needed or performance allows.
> > Additionally the stream processing path over Kafka would be faster.
> >
>

Re: [DISCUS] consuming messages is polling - is there a reason? new KIP for poll?

Posted by radai <ra...@gmail.com>.
minimizing the cost of clients is part of what makes kafka scale.
a push model would shift a lot of tracking logic onto the broker.

On Tue, Jan 31, 2017 at 2:47 AM, Alexander Binzberger <
alexander.binzberger@wingcon.com> wrote:

> This way it seems like the protocol and the high-level consumer would be
> simplified.
> Clients have a more natural control over the offset and could ack per
> message or per bulk as needed or performance allows.
> Additionally the stream processing path over Kafka would be faster.
>