Posted to users@kafka.apache.org by Jason Rosenberg <jb...@squareup.com> on 2014/11/21 01:24:41 UTC

new producer api and batched Futures....

I've been looking at the new producer api with anticipation, but have not
fired it up yet.

One question: it looks like there's no longer a 'batch' send mode
(and I get that this is all now handled internally, i.e. you send
individual messages, which then get collated, batched up and sent out).

What I'm wondering is whether there's added overhead in the producer (and
the client code) from having to manage all the Future objects returned for
the individual messages sent.  If I'm sending 100K messages/second, that
seems like a lot of async Future objects that have to be tracked and
waited on.  Doesn't this cause some overhead?

If I send a bunch of messages, store all the Futures in a list,
and then wait for all of them, it seems like a lot of thread contention.
On the other hand, if I send a batch of messages that are all likely to
get sent as a single batch over the wire (because they are all going to the
same partition), wouldn't there be some benefit in only having to wait for
a single Future object for the batch?

Jason

Re: new producer api and batched Futures....

Posted by Jason Rosenberg <jb...@squareup.com>.

I guess it would make the api less clean, but I can imagine a sendBatch
method that returns a single Future, which completes only when all
messages in the batch have finished.  The callback info could then contain
info about the success/exceptions encountered by each sub-group of
messages.  And the callback could even be called multiple times, once for
each sub-batch sent.  It gets complicated to think about, but it would
mean fewer Future objects created and less async contention/waiting.
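
To make the idea concrete, here is a minimal sketch of such a helper. Everything in it is hypothetical: `SendBatchSketch`, `sendBatch`, and `Outcome` are illustrative names, not part of the producer API, and the `send` function is a stand-in for whatever async call returns a per-message `CompletableFuture` (the real producer returns a plain `Future`, so a callback-based bridge would be needed). It only covers the "single Future for the whole batch" part, not the repeated per-sub-batch callback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical helper, not part of the Kafka producer API.
final class SendBatchSketch {

    /** Outcome of one message: its offset on success, or the error on failure. */
    static final class Outcome {
        final Long offset;
        final Throwable error;

        Outcome(Long offset, Throwable error) {
            this.offset = offset;
            this.error = error;
        }

        boolean succeeded() {
            return error == null;
        }
    }

    /**
     * Sends every message through `send` (a stand-in for an async producer call)
     * and returns one future that completes when all of them have finished.
     * The result carries a per-message outcome rather than failing on the first error.
     */
    static <M> CompletableFuture<List<Outcome>> sendBatch(
            List<M> messages, Function<M, CompletableFuture<Long>> send) {
        List<CompletableFuture<Outcome>> pending = new ArrayList<>();
        for (M message : messages) {
            // handle() turns both success and failure into a normal Outcome value.
            pending.add(send.apply(message)
                    .handle((offset, error) -> new Outcome(offset, error)));
        }
        return CompletableFuture
                .allOf(pending.toArray(new CompletableFuture<?>[0]))
                .thenApply(ignored -> {
                    List<Outcome> outcomes = new ArrayList<>();
                    for (CompletableFuture<Outcome> f : pending) {
                        outcomes.add(f.join()); // already complete, does not block
                    }
                    return outcomes;
                });
    }
}
```

The caller then waits on one object, e.g. `sendBatch(messages, send).join()`, and inspects each `Outcome` in order. Note that this still creates one future per message under the hood; it only reduces what the caller has to hold and wait on.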

I'll try it out and see....

Jason

On Thu, Nov 20, 2014 at 7:56 PM, Jay Kreps <ja...@gmail.com> wrote:

> Internally it works as you describe, there is only one CountDownLatch per
> batch sent, each of the futures is just a wrapper around that.
>
> It is true that if you accumulate thousands of futures in a list that may
> be a fair number of objects you are retaining, and there will be some work
> involved in checking them all. If you are sure they are all going to the
> same partition you can actually wait on the last future since sends are
> ordered within a partition. So when the final send completes the prior
> sends should also have completed.
>
> Either way if you see a case where the new producer isn't as fast as the
> old producer let us know.
>
> -Jay

Re: new producer api and batched Futures....

Posted by Jay Kreps <ja...@gmail.com>.

Internally it works as you describe: there is only one CountDownLatch per
batch sent, and each of the futures is just a wrapper around that.
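
As a rough illustration of that arrangement, here is a simplified model. It is not the producer's actual source: the class names `BatchLatchSketch`, `BatchResult`, and `RecordFuture`, and the offset arithmetic, are assumptions made for the sketch. The point is that each per-record future holds only a reference to the shared batch state plus its position in the batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model, not Kafka source: one latch shared by every record in a batch.
final class BatchLatchSketch {

    /** Shared state for one batch: a single latch plus the offset of the first record. */
    static final class BatchResult {
        private final CountDownLatch latch = new CountDownLatch(1);
        private volatile long baseOffset = -1L;

        /** Called once when the (simulated) broker acknowledges the whole batch. */
        void complete(long baseOffset) {
            this.baseOffset = baseOffset;
            latch.countDown();
        }
    }

    /** Per-record future: holds only a reference to the batch and its position in it. */
    static final class RecordFuture implements Future<Long> {
        private final BatchResult batch;
        private final int relativeOffset;

        RecordFuture(BatchResult batch, int relativeOffset) {
            this.batch = batch;
            this.relativeOffset = relativeOffset;
        }

        @Override public boolean cancel(boolean mayInterrupt) { return false; }
        @Override public boolean isCancelled() { return false; }
        @Override public boolean isDone() { return batch.latch.getCount() == 0; }

        @Override public Long get() throws InterruptedException {
            batch.latch.await(); // every record in the batch waits on the same latch
            return batch.baseOffset + relativeOffset;
        }

        @Override public Long get(long timeout, TimeUnit unit)
                throws InterruptedException, TimeoutException {
            if (!batch.latch.await(timeout, unit)) {
                throw new TimeoutException("batch not acknowledged in time");
            }
            return batch.baseOffset + relativeOffset;
        }
    }

    /** Creates n record futures that all wrap the same batch. */
    static List<RecordFuture> futuresFor(BatchResult batch, int n) {
        List<RecordFuture> futures = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            futures.add(new RecordFuture(batch, i));
        }
        return futures;
    }
}
```

In this model a single `complete(...)` call releases every future for the batch at once, so the per-record objects are small and there is no per-record synchronization primitive.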

It is true that if you accumulate thousands of futures in a list, that may
be a fair number of objects you are retaining, and there will be some work
involved in checking them all. If you are sure they are all going to the
same partition, you can actually wait on just the last future, since sends
are ordered within a partition. So when the final send completes, the prior
sends should also have completed.
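
The pattern can be sketched without a broker. In the example below a single-threaded executor stands in for one partition, on the assumption that it completes tasks in submission order the way sends to a single partition are described above; `WaitOnLastSketch` and `sendAndWaitOnLast` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Stand-in for a single partition: a single-threaded executor
// completes tasks in the order they were submitted.
final class WaitOnLastSketch {

    /**
     * Submits `count` simulated sends, blocks only on the last future,
     * then reports how many of the futures are done at that point.
     */
    static int sendAndWaitOnLast(int count) {
        if (count <= 0) {
            return 0;
        }
        ExecutorService partition = Executors.newSingleThreadExecutor();
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < count; i++) {
                final int offset = i;
                futures.add(partition.submit(() -> offset)); // simulated send
            }
            futures.get(count - 1).get(); // the only blocking wait
            int done = 0;
            for (Future<Integer> f : futures) {
                if (f.isDone()) {
                    done++;
                }
            }
            return done;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            partition.shutdown();
        }
    }
}
```

Calling `WaitOnLastSketch.sendAndWaitOnLast(1000)` should report all 1000 futures as done after a single wait. One caveat with this approach: waiting on the last future only surfaces that record's outcome, so a failure of an earlier send would need to be caught some other way, for example through the send callback.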

Either way, if you see a case where the new producer isn't as fast as the
old producer, let us know.

-Jay
