Posted to users@kafka.apache.org by Bob Potter <bo...@gmail.com> on 2014/05/19 23:40:14 UTC

Consistent replication of an event stream into Kafka

Hello,

We have a use case where we want to replicate an event stream that exists
outside of Kafka into a Kafka topic (single partition). The event stream
has sequence IDs that always increase by 1, and we want to preserve this
ordering.

The difficulty is that we want the process that writes these events to fail
over automatically if it dies. While ZooKeeper can guarantee a single writer
at a given point in time, we are worried about delayed network packets, bugs,
and long GC pauses.

One solution we've thought of is to set the sequence_id as the key for the
Kafka messages and to run a proxy on each Kafka broker that refuses to write
new messages if they don't carry the next expected key. This seems to solve
any issue we would have with misbehaving networks or processes.
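
A minimal sketch of that gatekeeper, written as a client-side wrapper rather
than an on-broker proxy (class and method names here are ours, using the
Java producer API purely for illustration):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Illustrative only: forwards a message only when its key is the next
    // expected sequence ID, so a stale or duplicate writer gets refused.
    public class SequenceGatekeeper {
        private final KafkaProducer<Long, byte[]> producer;
        private long nextExpected;

        public SequenceGatekeeper(KafkaProducer<Long, byte[]> producer,
                                  long firstExpected) {
            this.producer = producer;
            this.nextExpected = firstExpected;
        }

        public synchronized boolean tryWrite(String topic, long sequenceId,
                                             byte[] payload) throws Exception {
            if (sequenceId != nextExpected) {
                return false; // out of sequence: refuse the write
            }
            // Block until the broker acks, so nextExpected only advances
            // once the message is actually in the log.
            producer.send(new ProducerRecord<>(topic, sequenceId, payload)).get();
            nextExpected++;
            return true;
        }
    }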

Is there a better solution? Should we just handle these inconsistencies in
our consumers? Are we being too paranoid?
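
If we did push this to the consumers, the check is symmetric: remember the
last sequence ID seen, drop duplicates, and surface gaps. A rough sketch,
assuming the sequence ID is the message key and using the later Java
consumer API (onGap/onEvent are placeholders for application logic):

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;

    public class SequenceFilter {
        private long lastSeen; // restore from a checkpoint on startup

        public SequenceFilter(long lastSeen) {
            this.lastSeen = lastSeen;
        }

        public void handle(ConsumerRecords<Long, byte[]> records) {
            for (ConsumerRecord<Long, byte[]> record : records) {
                long seq = record.key();
                if (seq <= lastSeen) {
                    continue; // duplicate, e.g. re-sent after a writer failover
                }
                if (seq > lastSeen + 1) {
                    onGap(lastSeen, seq); // events missing: alert or re-fetch
                }
                onEvent(record);
                lastSeen = seq;
            }
        }

        protected void onGap(long lastSeen, long next) { /* application logic */ }
        protected void onEvent(ConsumerRecord<Long, byte[]> record) { /* ... */ }
    }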

As a side note, it seems like this functionality (guaranteeing that all keys
in a partition are in sequence for a particular topic) may be a nice option
to have in Kafka proper.

Thanks,
Bob

Re: Consistent replication of an event stream into Kafka

Posted by Guozhang Wang <wa...@gmail.com>.
We plan to work on this feature this summer and make it available in the
0.9 release. Please try it out then and give us any feedback you have.
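
As it turned out, this work shipped in the 0.11 release (KIP-98) rather than
0.9; on modern Java clients, enabling it is roughly a single config switch:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;

    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092"); // illustrative address
    props.put("key.serializer",
              "org.apache.kafka.common.serialization.LongSerializer");
    props.put("value.serializer",
              "org.apache.kafka.common.serialization.ByteArraySerializer");
    // Broker deduplicates producer retries per (producer ID, partition);
    // requires acks=all and retries > 0.
    props.put("enable.idempotence", "true");

    KafkaProducer<Long, byte[]> producer = new KafkaProducer<>(props);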

Guozhang


-- 
-- Guozhang

Re: Consistent replication of an event stream into Kafka

Posted by Bob Potter <bo...@gmail.com>.
Hi Guozhang,

That looks great! I think it would solve our case.

Thanks,
Bob


-- 
Bob Potter

Re: Consistent replication of an event stream into Kafka

Posted by Guozhang Wang <wa...@gmail.com>.
Hello Bob,

What you described is similar to the idempotent producer design that we are
now discussing:

https://cwiki.apache.org/confluence/display/KAFKA/Idempotent+Producer

Do you think this new feature will solve your case?
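
The heart of that design: each producer gets an ID and numbers its messages,
and the broker remembers the last sequence number per producer (per
partition), acking duplicates without re-appending and rejecting gaps. A
simplified model of that broker-side check (not the real implementation):

    import java.util.HashMap;
    import java.util.Map;

    // Simplified model only: one entry per producer ID, for a single
    // partition. Not actual Kafka broker code.
    public class SequenceCheck {
        public enum Outcome { APPEND, DUPLICATE, OUT_OF_ORDER }

        private final Map<Long, Long> lastSeqByProducer = new HashMap<>();

        public Outcome check(long producerId, long seq) {
            Long last = lastSeqByProducer.get(producerId);
            if (last != null && seq <= last) {
                return Outcome.DUPLICATE; // retry of an appended message: ack, don't append
            }
            if (last != null && seq != last + 1) {
                return Outcome.OUT_OF_ORDER; // gap: reject so the producer fails fast
            }
            lastSeqByProducer.put(producerId, seq);
            return Outcome.APPEND;
        }
    }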

Guozhang


-- 
-- Guozhang