Posted to users@kafka.apache.org by Chris Hogue <cs...@gmail.com> on 2013/08/02 17:26:36 UTC

EventHandler in 0.8

We're a heavy 0.7 user and are now digging into 0.8 for some new projects.
One of the features we used in 0.7 appears to be different and not clearly
supported in 0.8.

We use the EventHandler plug-point in 0.7, specifically to do custom
batching before the messages are actually sent to the broker. We were
hoping to use this feature in 0.8, albeit for a slightly different reason.
In testing 0.8 we found the broker had a pretty significant choke point
with compression. I'm sure this is a known performance consideration:
using the built-in compression forces the broker to decompress and
re-compress the messages in order to assign offsets.

We changed our test to batch and compress the messages before they went to
the producer, leaving compression.codec set to none. With this change we
saw on the order of a 3x increase in throughput on the broker, while still
retaining compressed messages across the wire and on disk.
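
For reference, the producer side of that test looked roughly like the
sketch below. This is simplified and from memory rather than our actual
code; the class name and the length-prefix framing are just illustrative,
and the consuming side has to undo the same framing.

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;
import java.util.Properties;
import java.util.zip.GZIPOutputStream;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class PreCompressedSender {

    private final Producer<byte[], byte[]> producer;

    public PreCompressedSender(String brokerList) {
        Properties props = new Properties();
        props.put("metadata.broker.list", brokerList);
        // Raw bytes in, raw bytes out; the application owns compression.
        props.put("serializer.class", "kafka.serializer.DefaultEncoder");
        // As far as the broker can tell these are plain uncompressed
        // messages, so it never has to decompress/recompress them.
        props.put("compression.codec", "none");
        this.producer = new Producer<byte[], byte[]>(new ProducerConfig(props));
    }

    // Length-prefix each record, gzip the whole batch, and ship it as a
    // single Kafka message.
    public void sendBatch(String topic, List<byte[]> records) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buf));
        for (byte[] record : records) {
            out.writeInt(record.length);
            out.write(record);
        }
        out.close();
        producer.send(new KeyedMessage<byte[], byte[]>(topic, buf.toByteArray()));
    }
}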

This was fine for the tests, but in the real application we plan to use
semantic partitioning, so pre-batching and compressing before going to the
producer is trickier. The hope was to use the producer's batching and
partition selection, just plugging into the event handler for the
compression piece. While I see the DefaultEventHandler is there and
potentially replaceable via the constructor, it appears to have taken on
more functionality and would thus be more involved to replace.

Combined with all produce requests across brokers now being handled on a
single thread, this appears to be difficult to achieve.

I can imagine why some of these changes are in place in light of the
replication feature, but wanted to throw it out there (a) in case we missed
something or (b) at least as a potential consideration in the client
re-design project that is under way.

I'd be happy to have any input or other ideas on how to accomplish this
that we may have missed. We'll look at other ways to tackle this and follow
the other threads about the new clients in the meantime.

Thanks.

-Chris

Re: EventHandler in 0.8

Posted by Chris Hogue <cs...@gmail.com>.
Hi Jun.

Yeah, I assumed this was a little-used hook, so I wasn't altogether
surprised it's not supported in 0.8. For our current problem we'll have to
dig a little further into how to approach it. Batching and compressing
before sending to the producer is straightforward enough, but the semantic
partitioning is important in this case, so we'll have to batch/collate in a
way that accounts for it. Off the top of my head I don't remember being
able to provide the standard producer with a pre-selected partition. Maybe
a custom partitioner will allow us to do that--I'll dig more for that.
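
If the partitioner hook does allow it, something along these lines might
work. This is an untested sketch: I haven't verified the exact Partitioner
interface or constructor the 0.8 producer expects, and the class name is
made up.

import kafka.producer.Partitioner;
import kafka.utils.VerifiableProperties;

// Sketch: treat the message key as the partition chosen by the
// application, so we can collate/compress per partition before handing
// batches to the producer.
public class KeyIsPartitionPartitioner implements Partitioner<Integer> {

    // The 0.8 producer appears to construct partitioners with a
    // VerifiableProperties argument (as DefaultPartitioner does).
    public KeyIsPartitionPartitioner(VerifiableProperties props) {
    }

    @Override
    public int partition(Integer key, int numPartitions) {
        // Wrap around defensively in case the key is out of range.
        return Math.abs(key % numPartitions);
    }
}

The producer would then be configured with partitioner.class pointing at
that class, and the intended partition number would be passed as the
message key.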

Also, just to put something concrete behind the two use cases where we've
wanted this (for potential future consideration), they are:

1. The custom compression described in this thread. It's a significant
throughput increase that we'll be hesitant to give up. It could probably be
solved in other ways internally, but the event handler would allow us to do
it even without the producer having it built-in.

2. Our 0.7 application uses this to pre-batch for the consuming
application. The producer obviously benefits from batching itself, and the
async producer provides this. The consumer application in this case does
little more than read from Kafka, do a bit of bookkeeping, then copy the
bytes to the wire. The consumer app wants to write to the wire in batches
as well, so having the batches created on the producer lets us do it once
there, then more or less just shuttle the bytes from the Kafka message to
the outbound network (roughly like the sketch below).
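
To illustrate, the relay side would end up looking something like this.
It's a rough sketch only, against the 0.8 high-level consumer; the
connection settings, group/topic names, and the output stream are
placeholders, and error handling and offset management are omitted.

import java.io.OutputStream;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;

public class BatchRelay {

    public static void relay(String topic, OutputStream wire) throws Exception {
        Properties props = new Properties();
        props.put("zookeeper.connect", "localhost:2181");
        props.put("group.id", "relay");

        ConsumerConnector connector =
                Consumer.createJavaConsumerConnector(new ConsumerConfig(props));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                connector.createMessageStreams(Collections.singletonMap(topic, 1));
        KafkaStream<byte[], byte[]> stream = streams.get(topic).get(0);

        // Each Kafka message is already a batch in our own framing, so the
        // relay just does its bookkeeping and copies the bytes straight out.
        for (MessageAndMetadata<byte[], byte[]> mm : stream) {
            wire.write(mm.message());
            wire.flush();
        }
    }
}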

Hope that provides a bit more info on how we are using it. We'll look more
into the partitioning/batching options we have for the 0.8 producer.

Thanks for the background.

-Chris

On Fri, Aug 2, 2013 at 9:56 AM, Jun Rao <ju...@gmail.com> wrote:

> Chris,
>
> Thanks for bringing up this part. Yes, in 0.8 we don't really support a
> custom event handler. This is because (1) the producer send logic is a bit
> more complicated, since it has to issue metadata requests whenever leaders
> change, and (2) we are not aware of many use cases for a custom event
> handler. So, we only made the serializer and partitioner customizable. We
> can revisit this if there are good use cases for a custom event handler.
> For your use case, could you just do the batching/customized compression
> outside of the producer?
>
> Thanks,
>
> Jun

Re: EventHandler in 0.8

Posted by Jun Rao <ju...@gmail.com>.
Chris,

Thanks for bringing up this part. Yes, in 0.8 we don't really support a
custom event handler. This is because (1) the producer send logic is a bit
more complicated, since it has to issue metadata requests whenever leaders
change, and (2) we are not aware of many use cases for a custom event
handler. So, we only made the serializer and partitioner customizable. We
can revisit this if there are good use cases for a custom event handler.
For your use case, could you just do the batching/customized compression
outside of the producer?

Thanks,

Jun



