You are viewing a plain text version of this content. The canonical link for it is here.
Posted to users@kafka.apache.org by Nitin Supekar <ni...@ecinity.com> on 2013/07/02 19:41:31 UTC

CallbackHandler in Kafka 0.8

Hello-

   Is CallbackHandler supported in Kafka 0.8 for async producers?

If yes, can I use it to alter the batched messages before they are pushed
to broker? For example, I may want to delete some of the messages in the
batch based on some business logic in my application?

If no, is there any alternate way? I want to do some kind of single
instancing on messages pushed in kafka in last X minutes.

thanks

Re: CallbackHandler in Kafka 0.8

Posted by Nitin Supekar <ni...@ecinity.com>.
Thanks Joel for the response. Sure, I can achieve same by using some data
structure before we push messages to Kafka.
I was hoping if I can utilize any kafka capabilities specially as we are
using async producer.

thanks


On Wed, Jul 3, 2013 at 11:18 PM, Joel Koshy <jj...@gmail.com> wrote:

> Hi Nitin,
>
>
> > We receive events from external source ( for example, facebook status
> > update events). These events are pushed to kafka queue when received
> There
> > is possibility of duplicate event ( multiple facebook status update
> events
> > for same account  in quick intervals ) coming again and gets pushed into
> > kafka queue. At consumer end, we do not want to process duplicate events
> (
> > connect to facebook and fetch the status). We would prefer not to have
> > another data structure to  single instance the events received. There is
> > time limit in which whatever events received, we want to do single
> > instancing ( at once fetch all the status update events received in five
> > minutes for single account ). In async producer, events are anyways not
> > written to broker synchronously. They are batched and then pushed after
> > predefined time interval. That's perfect for us. We just want to look at
> > the batch and delete duplicate events from it and then push it broker.
>
> However, this depends on your event rate - i.e., if it's a very high
> event rate then the batch threshold is reached and send happens before
> that time interval. Furthermore, even if event rate is low, the
> de-duplication applies to a time-window of at most
> queue.buffering.max.ms).
>
> In 0.7, batches would be taken and the call-back handler's
> beforeSendingData method would be invoked on those batches. i.e., you
> can achieve the same effect as you did in 0.7 by keeping keeping an
> additional buffer (before the actual send) of
> queue.buffering.max.messages (i.e., the batch size) and de-duping
> within that buffer.
>
> Thanks,
>
> Joel
>
> > On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jj...@gmail.com> wrote:
> >
> >> Callback handlers are no longer supported in 0.8. Can you go into why
> >> the filtering needs to be done at this stage as opposed to before
> >> actually sending to the producer?
> >>
> >> Thanks,
> >>
> >> Joel
> >>
> >> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com>
> wrote:
> >> > Hello-
> >> >
> >> >    Is CallbackHandler supported in Kafka 0.8 for async producers?
> >> >
> >> > If yes, can I use it to alter the batched messages before they are
> pushed
> >> > to broker? For example, I may want to delete some of the messages in
> the
> >> > batch based on some business logic in my application?
> >> >
> >> > If no, is there any alternate way? I want to do some kind of single
> >> > instancing on messages pushed in kafka in last X minutes.
> >> >
> >> > thanks
> >>
>
>
> On Wed, Jul 3, 2013 at 1:33 AM, Nitin Supekar <ni...@ecinity.com> wrote:
> > Hello-
> >
> > We receive events from external source ( for example, facebook status
> > update events). These events are pushed to kafka queue when received
> There
> > is possibility of duplicate event ( multiple facebook status update
> events
> > for same account  in quick intervals ) coming again and gets pushed into
> > kafka queue. At consumer end, we do not want to process duplicate events
> (
> > connect to facebook and fetch the status). We would prefer not to have
> > another data structure to  single instance the events received. There is
> > time limit in which whatever events received, we want to do single
> > instancing ( at once fetch all the status update events received in five
> > minutes for single account ). In async producer, events are anyways not
> > written to broker synchronously. They are batched and then pushed after
> > predefined time interval. That's perfect for us. We just want to look at
> > the batch and delete duplicate events from it and then push it broker.
> >
> > Possible?
> >
> > thanks
> >
> >
> > On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jj...@gmail.com> wrote:
> >
> >> Callback handlers are no longer supported in 0.8. Can you go into why
> >> the filtering needs to be done at this stage as opposed to before
> >> actually sending to the producer?
> >>
> >> Thanks,
> >>
> >> Joel
> >>
> >> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com>
> wrote:
> >> > Hello-
> >> >
> >> >    Is CallbackHandler supported in Kafka 0.8 for async producers?
> >> >
> >> > If yes, can I use it to alter the batched messages before they are
> pushed
> >> > to broker? For example, I may want to delete some of the messages in
> the
> >> > batch based on some business logic in my application?
> >> >
> >> > If no, is there any alternate way? I want to do some kind of single
> >> > instancing on messages pushed in kafka in last X minutes.
> >> >
> >> > thanks
> >>
>

Re: CallbackHandler in Kafka 0.8

Posted by Joel Koshy <jj...@gmail.com>.
Hi Nitin,


> We receive events from external source ( for example, facebook status
> update events). These events are pushed to kafka queue when received There
> is possibility of duplicate event ( multiple facebook status update events
> for same account  in quick intervals ) coming again and gets pushed into
> kafka queue. At consumer end, we do not want to process duplicate events (
> connect to facebook and fetch the status). We would prefer not to have
> another data structure to  single instance the events received. There is
> time limit in which whatever events received, we want to do single
> instancing ( at once fetch all the status update events received in five
> minutes for single account ). In async producer, events are anyways not
> written to broker synchronously. They are batched and then pushed after
> predefined time interval. That's perfect for us. We just want to look at
> the batch and delete duplicate events from it and then push it broker.

However, this depends on your event rate - i.e., if it's a very high
event rate then the batch threshold is reached and send happens before
that time interval. Furthermore, even if event rate is low, the
de-duplication applies to a time-window of at most
queue.buffering.max.ms).

In 0.7, batches would be taken and the call-back handler's
beforeSendingData method would be invoked on those batches. i.e., you
can achieve the same effect as you did in 0.7 by keeping keeping an
additional buffer (before the actual send) of
queue.buffering.max.messages (i.e., the batch size) and de-duping
within that buffer.

Thanks,

Joel

> On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jj...@gmail.com> wrote:
>
>> Callback handlers are no longer supported in 0.8. Can you go into why
>> the filtering needs to be done at this stage as opposed to before
>> actually sending to the producer?
>>
>> Thanks,
>>
>> Joel
>>
>> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> wrote:
>> > Hello-
>> >
>> >    Is CallbackHandler supported in Kafka 0.8 for async producers?
>> >
>> > If yes, can I use it to alter the batched messages before they are pushed
>> > to broker? For example, I may want to delete some of the messages in the
>> > batch based on some business logic in my application?
>> >
>> > If no, is there any alternate way? I want to do some kind of single
>> > instancing on messages pushed in kafka in last X minutes.
>> >
>> > thanks
>>


On Wed, Jul 3, 2013 at 1:33 AM, Nitin Supekar <ni...@ecinity.com> wrote:
> Hello-
>
> We receive events from external source ( for example, facebook status
> update events). These events are pushed to kafka queue when received There
> is possibility of duplicate event ( multiple facebook status update events
> for same account  in quick intervals ) coming again and gets pushed into
> kafka queue. At consumer end, we do not want to process duplicate events (
> connect to facebook and fetch the status). We would prefer not to have
> another data structure to  single instance the events received. There is
> time limit in which whatever events received, we want to do single
> instancing ( at once fetch all the status update events received in five
> minutes for single account ). In async producer, events are anyways not
> written to broker synchronously. They are batched and then pushed after
> predefined time interval. That's perfect for us. We just want to look at
> the batch and delete duplicate events from it and then push it broker.
>
> Possible?
>
> thanks
>
>
> On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jj...@gmail.com> wrote:
>
>> Callback handlers are no longer supported in 0.8. Can you go into why
>> the filtering needs to be done at this stage as opposed to before
>> actually sending to the producer?
>>
>> Thanks,
>>
>> Joel
>>
>> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> wrote:
>> > Hello-
>> >
>> >    Is CallbackHandler supported in Kafka 0.8 for async producers?
>> >
>> > If yes, can I use it to alter the batched messages before they are pushed
>> > to broker? For example, I may want to delete some of the messages in the
>> > batch based on some business logic in my application?
>> >
>> > If no, is there any alternate way? I want to do some kind of single
>> > instancing on messages pushed in kafka in last X minutes.
>> >
>> > thanks
>>

Re: CallbackHandler in Kafka 0.8

Posted by Nitin Supekar <ni...@ecinity.com>.
Hello-

We receive events from external source ( for example, facebook status
update events). These events are pushed to kafka queue when received There
is possibility of duplicate event ( multiple facebook status update events
for same account  in quick intervals ) coming again and gets pushed into
kafka queue. At consumer end, we do not want to process duplicate events (
connect to facebook and fetch the status). We would prefer not to have
another data structure to  single instance the events received. There is
time limit in which whatever events received, we want to do single
instancing ( at once fetch all the status update events received in five
minutes for single account ). In async producer, events are anyways not
written to broker synchronously. They are batched and then pushed after
predefined time interval. That's perfect for us. We just want to look at
the batch and delete duplicate events from it and then push it broker.

Possible?

thanks


On Wed, Jul 3, 2013 at 3:07 AM, Joel Koshy <jj...@gmail.com> wrote:

> Callback handlers are no longer supported in 0.8. Can you go into why
> the filtering needs to be done at this stage as opposed to before
> actually sending to the producer?
>
> Thanks,
>
> Joel
>
> On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> wrote:
> > Hello-
> >
> >    Is CallbackHandler supported in Kafka 0.8 for async producers?
> >
> > If yes, can I use it to alter the batched messages before they are pushed
> > to broker? For example, I may want to delete some of the messages in the
> > batch based on some business logic in my application?
> >
> > If no, is there any alternate way? I want to do some kind of single
> > instancing on messages pushed in kafka in last X minutes.
> >
> > thanks
>

Re: CallbackHandler in Kafka 0.8

Posted by Joel Koshy <jj...@gmail.com>.
Callback handlers are no longer supported in 0.8. Can you go into why
the filtering needs to be done at this stage as opposed to before
actually sending to the producer?

Thanks,

Joel

On Tue, Jul 2, 2013 at 10:41 AM, Nitin Supekar <ni...@ecinity.com> wrote:
> Hello-
>
>    Is CallbackHandler supported in Kafka 0.8 for async producers?
>
> If yes, can I use it to alter the batched messages before they are pushed
> to broker? For example, I may want to delete some of the messages in the
> batch based on some business logic in my application?
>
> If no, is there any alternate way? I want to do some kind of single
> instancing on messages pushed in kafka in last X minutes.
>
> thanks