
Posted to dev@distributedlog.apache.org by Sijie Guo <si...@apache.org> on 2016/08/01 06:46:38 UTC

Re: Distributed Log as Kafka's backend

Khurrum,

Interesting. Thank you for your interest in DistributedLog.

Three years ago when we started the project internally at Twitter, we did
have a plan to use it as a backend for both kestrel (Twitter's in-house
queue system) and Kafka. However, we didn't go down that direction.
Instead, we built a similar self-serve pub/sub system over DistributedLog
to consolidate our kestrel and kafka. So we don't have a concrete plan to
build Kafka's interface over DistributedLog. The module under tutorials is
mostly there to give people an idea of how it can be used to build a
partition-based pub/sub system.

However, I don't have any strong preference here. If you think it would be
useful to other people, you are welcome to contribute. We'd be happy to
guide you and offer any help.

Also, it might be good if you can explain more about what you are planning
to do. Other people in the community can chime in and discuss.

Please let us know your thoughts. You are very welcome to make any
contributions.

- Sijie

On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <kh...@gmail.com>
wrote:

> Hello folks,
>
> I saw there is a 'distributedlog-kafka' module in tutorials, but it does
> not seem complete yet. I am wondering if there is a plan to fully implement
> Kafka's interface. It would be great if we could use Kafka's interface to
> access DistributedLog. I'd like to contribute if there is a plan.
>
> Thanks,
> KN
>

Re: Distributed Log as Kafka's backend

Posted by Khurrum Nasim <kh...@gmail.com>.

Thanks Leigh and Sijie.

I will move the development under a contrib project. I am also going to
talk to the Kafka folks about whether that is the best place to host this
idea.

KN


Re: Distributed Log as Kafka's backend

Posted by Leigh Stewart <ls...@twitter.com.INVALID>.

Agree with Sijie, I think this is exciting work and I didn't mean to cut
off your options. My objection was just about code organization.

A contribs project seems like a good compromise for now, until we can think
of a better place to put the code.

Sijie's right though: if we want to fully productionize this and make it
reusable, this might not be the best long-term location.

What are your thoughts, Khurrum? Does the code organization/layering
argument make sense?

Thanks!


Re: Distributed Log as Kafka's backend

Posted by Sijie Guo <si...@apache.org>.

+ Leigh

Khurrum,

Thanks for your hard work on this. The approach in general looks good to
me.

However, I kind of agree with what Leigh commented on the pull request.
Ideally, we want DL to focus on single streams themselves: durability,
consistency and performance. Different applications might use streams in
different ways to produce different data/consumption models. For example,
you can use a set of streams to build a Kafka-like partitioned pub/sub,
while other people can use a set of streams to build a queue-like
messaging system or a database.

On the other hand, it is very interesting to see a good Kafka client
integration using DL streams as partitions rather than just an incomplete
tutorial. I wouldn't want to discourage your hard work. Probably a
tradeoff here is to create a distributedlog-contribs module and move the
distributedlog-kafka module under it. The distributedlog-contribs module
would host any integration-related contributions. This would help avoid
any confusion. Any thoughts, Leigh?

Also, Khurrum, did you talk with the Kafka community? I am not sure if DL
is the right repo to host this. Does anyone else have better suggestions?

- Sijie










Re: Distributed Log as Kafka's backend

Posted by Khurrum Nasim <kh...@gmail.com>.

I sent out another pull request to improve the Kafka publisher in the
tutorial: https://github.com/apache/incubator-distributedlog/pull/16

We tried to reuse the existing Kafka configuration, key/value serializers
and partitioner as much as we can, so we don't need to rewrite our existing
services to adopt DistributedLog.

Although the pull request is still WIP, we'd like to know if we are using
DistributedLog in the right way. In particular, we are thinking of changing
the write proxy to also return either the transaction id or the sequence id
on write requests.

We appreciate your help.

- KN
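
The routing idea behind reusing Kafka's partitioner can be sketched as
follows. This is a minimal illustration in Python, not code from the pull
request: `choose_partition` and `route_to_stream` are hypothetical names,
CRC32 is only a stand-in hash (Kafka's own default partitioner uses murmur2
on the key bytes), and the stream name assumes the
`namespace/topic/partitions/N` layout proposed in this thread.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index in [0, num_partitions)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # Stand-in hash for illustration only; any stable hash works here.
    return zlib.crc32(key) % num_partitions


def route_to_stream(namespace: str, topic: str, key: bytes,
                    num_partitions: int) -> str:
    """Return the name of the DL stream that would receive this record."""
    partition = choose_partition(key, num_partitions)
    return f"{namespace}/{topic}/partitions/{partition}"


if __name__ == "__main__":
    # The namespace, topic and key below are made-up example values.
    print(route_to_stream("messaging", "clicks", b"user-42", 8))
```

Because the hash depends only on the key, records with the same key always
land in the same stream, which keeps per-key ordering as long as the number
of partitions does not change.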



On Thu, Aug 25, 2016 at 1:28 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> I sent out a pull request about the offset sequencer:
> https://github.com/apache/incubator-distributedlog/pull/15
>
> I am not sure if there are any coding guidelines to follow. I tried my
> best to follow the existing code style. If I did anything wrong, please
> help me fix it.
>
> - KN
>
>
>
>
> On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <kh...@gmail.com>
> wrote:
>
>> Hi All,
>>
>> After reading the DL code, we have a better idea of how to use
>> DistributedLog as the Kafka implementation. There are two approaches: one
>> is to use the distributedlog-core library directly in the Kafka broker,
>> while the other is to use all the DL components.
>>
>> The first approach is basically to replace the storage of the Kafka broker
>> with BookKeeper. The good part is that all the Kafka wire protocols will
>> remain unchanged. But it might take longer and also make operations more
>> complicated.
>>
>> The second approach is to implement Kafka's publisher and subscriber APIs
>> using DL. It would be much faster and more consistent operationally (we
>> would only need to operate the DL backend). However, it would only support
>> the Java client.
>>
>> We discussed this internally. We felt the second approach is good enough
>> for us and easier to achieve, so we will start with it. If anyone is
>> interested in the first approach, we'd like to participate and help too.
>>
>> Here is an outline of our changes:
>>
>> * Kafka Namespace: as I replied in the other email thread, we want to
>> lay out the streams in the following format:
>>
>> namespace/topic/partitions : storing all the partitions
>> namespace/topic/partitions/N : storing the given partition `N`
>> namespace/topic/subscriptions : storing all the subscriptions
>> namespace/topic/subscriptions/S : storing the information of
>> subscription `S`
>>
>> Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
>> are DL streams.
>>
>> * Offset Sequencer: we want to assign the `offset` as the transaction id
>> instead of the `timestamp`. We will add an `OffsetSequencer` and allow the
>> write proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>>
>> * Use separate DL streams to store the information of a subscription,
>> such as offsets and consumer load-balancing information.
>>
>> Do you see any concerns here?
>>
>>
>> - KN
>>
>> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <si...@apache.org> wrote:
>>
>>> Thanks Khurrum.
>>>
>>> At this point, we don't have any specific process to follow for big
>>> features. We were discussing one under
>>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>>>
>>> But ideally, let's use the mailing list for discussion and a Confluence
>>> page for turning the discussions into a design doc.
>>>
>>> If you already have a Confluence account (if not, please create one),
>>> please email me your account. I can then grant you permission to edit.
>>>
>>> - Sijie
>>>
>>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <kh...@gmail.com>
>>> wrote:
>>>
>>> > Sijie,
>>> >
>>> > Thank you so much for your quick reply. We are using Kafka now and we
>>> > are interested in DL features like durability and handling slow
>>> > machines.
>>> >
>>> > If it is okay with the community, we'd like to give it a try and
>>> > evaluate the solution. Is there any process that I should follow?
>>> >
>>> > KN
>>> >
>>> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org
>>> > <javascript:_e(%7B%7D,'cvml','sijie@apache.org');>> wrote:
>>> >
>>> > > Khurrum,
>>> > >
>>> > > Interesting. Thank you for your interests in DistributedLog.
>>> > >
>>> > > Three years ago when we started the project internally at Twitter,
>>> > > we did have a plan to use it as a backend for both kestrel (Twitter's
>>> > > in-house queue system) and Kafka. However, we didn't go down that
>>> > > direction. Instead, we built a similar self-serve pub/sub system over
>>> > > DistributedLog to consolidate our kestrel and kafka. So we don't have
>>> > > a concrete plan to build the kafka's interface over DistributedLog.
>>> > > The module was put under tutorials is mostly to give people an idea
>>> > > how it can be used for building a partition based pub/sub system.
>>> > >
>>> > > However, I don't have any strong preference here. If you think it
>>> > > would be useful to other people, you are welcome to contribute. We'd
>>> > > be happy to guide and offer any helps.
>>> > >
>>> > > Also, it might be good if you can explain more about what you are
>>> > > planning to do. Other people in the community can chime in and discuss.
>>> > >
>>> > > Please let us know your thoughts. You are very welcome to make any
>>> > > contributions.
>>> > >
>>> > > - Sijie
>>> > >
>>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > Hello folks,
>>> > > >
>>> > > > I saw there is a 'distributedlog-kafka' module in tutorials. But it
>>> > > > seems not complete yet. I am wondering if there is a plan to fully
>>> > > > implement the kafka's interface. It would be great if we can use
>>> > > > kafka's interface to access distributed log. I'd like to contribute
>>> > > > if there is a plan.
>>> > > >
>>> > > > Thanks,
>>> > > > KN
>>> > > >
>>> > >
>>> >
>>>
>>
>>
>

Re: Distributed Log as Kafka's backend

Posted by Khurrum Nasim <kh...@gmail.com>.

I sent out a pull request for the offset sequencer:
https://github.com/apache/incubator-distributedlog/pull/15

I am not sure whether there are any coding guidelines to follow. I tried my
best to follow the existing code style. If I did anything wrong, please help
me fix it.

- KN




On Tue, Aug 23, 2016 at 9:38 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> Hi All,
>
> After read the DL code, we have a better idea on how to use distributed
> log as the kafka implementation. There are two approaches to do that: one
> is to use distributedlog-core library directly in kafka broker, while the
> other one is to use all the DL components.
>
> The first approach is basically to replace the storage of kafka broker
> with bookkeeper. The good part is that all the kafka wire-protocols will
> remain unchanged. But it might take longer time and also make operations
> complicated.
>
> The second approach is to implement Kafka's publisher and subscriber API
> using DL. It would be much faster and more consistent on operations (we
> only need to operate DL backend only). However, it would only support java
> client.
>
> We discussed internally. We felt the second approach is good enough to us
> and it is easier to achieve. We will start with the second approach. If
> there are anyone interested in first approach, we'd like to participant and
> help too.
>
> Here is the outline about our changes:
>
> * Kafka Namespace: as I replied in the other email thread, we want to
> layout the streams in following format:
>
> namespace/topic/partitions : storing all the partitions
> namespace/topic/partitions/N : storing the given partition `N`
> namespace/topic/subscriptions : storing all the subscriptions
> namespace/topic/subscriptions/S : storing the information of subscription
> `S`
>
> both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
> are DL streams.
>
> * Offset Sequencer: we want to assign `offset` as the transaction id
> instead of `timestamp`. we will add a `OffsetSequencer` and allow write
> proxy to load `OffsetSequencer` instead of `TimeSequencer`.
>
> * Use separated DL streams to store the information of a subscription,
> such as offsets and consumer load balancing information.
>
> Do you see any concerns here?
>
>
> - KN
>
> On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <si...@apache.org> wrote:
>
>> Thanks Khurrum.
>>
>> At this point, we don't have any specific process to follow for big
>> features. We were discussing one under
>> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>>
>> But ideally, let's use mail list for discussion and use confluence page
>> for reflecting the discussions into a design doc.
>>
>> If you already have a confluence account (if not, please create one),
>> please email me your account. I can grant the permission to you, then you
>> can edit.
>>
>> - Sijie
>>
>> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <kh...@gmail.com>
>> wrote:
>>
>> > Sijie,
>> >
>> > Thank you so much for your quick reply. We are using Kafka now and we
>> > are interested in the features in DL like durability and handling slow
>> > machines.
>> >
>> > If it is okay to the community, we'd like to give a try and evaluate the
>> > solution. Is there any process that I should follow?
>> >
>> > KN
>> >
>> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
>> >
>> > > Khurrum,
>> > >
>> > > Interesting. Thank you for your interests in DistributedLog.
>> > >
>> > > Three years ago when we started the project internally at Twitter, we
>> > > did have a plan to use it as a backend for both kestrel (Twitter's
>> > > in-house queue system) and Kafka. However, we didn't go down that
>> > > direction. Instead, we built a similar self-serve pub/sub system over
>> > > DistributedLog to consolidate our kestrel and kafka. So we don't have
>> > > a concrete plan to build the kafka's interface over DistributedLog.
>> > > The module was put under tutorials is mostly to give people an idea
>> > > how it can be used for building a partition based pub/sub system.
>> > >
>> > > However, I don't have any strong preference here. If you think it
>> > > would be useful to other people, you are welcome to contribute. We'd
>> > > be happy to guide and offer any helps.
>> > >
>> > > Also, it might be good if you can explain more about what you are
>> > > planning to do. Other people in the community can chime in and discuss.
>> > >
>> > > Please let us know your thoughts. You are very welcome to make any
>> > > contributions.
>> > >
>> > > - Sijie
>> > >
>> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
>> > > wrote:
>> > >
>> > > > Hello folks,
>> > > >
>> > > > I saw there is a 'distributedlog-kafka' module in tutorials. But it
>> > > > seems not complete yet. I am wondering if there is a plan to fully
>> > > > implement the kafka's interface. It would be great if we can use
>> > > > kafka's interface to access distributed log. I'd like to contribute
>> > > > if there is a plan.
>> > > >
>> > > > Thanks,
>> > > > KN
>> > > >
>> > >
>> >
>>
>
>

Re: Distributed Log as Kafka's backend

Posted by Khurrum Nasim <kh...@gmail.com>.

Hi All,

After reading the DL code, we have a better idea of how to use DistributedLog
as the Kafka implementation. There are two approaches: one is to use the
distributedlog-core library directly in the Kafka broker, while the other
is to use all the DL components.

The first approach basically replaces the storage of the Kafka broker with
BookKeeper. The good part is that all the Kafka wire protocols remain
unchanged. But it might take longer and also complicate operations.

The second approach is to implement Kafka's publisher and subscriber API
using DL. It would be much faster to implement and more consistent
operationally (we would only need to operate the DL backend). However, it
would only support the Java client.

We discussed this internally. We felt the second approach is good enough for
us and easier to achieve, so we will start with it. If anyone is interested
in the first approach, we'd like to participate and help too.

Here is an outline of our changes:

* Kafka Namespace: as I replied in the other email thread, we want to lay
out the streams in the following format:

namespace/topic/partitions : storing all the partitions
namespace/topic/partitions/N : storing the given partition `N`
namespace/topic/subscriptions : storing all the subscriptions
namespace/topic/subscriptions/S : storing the information of subscription `S`

Both `namespace/topic/partitions/N` and `namespace/topic/subscriptions/S`
are DL streams.

* Offset Sequencer: we want to assign `offset` as the transaction id
instead of `timestamp`. We will add an `OffsetSequencer` and allow the write
proxy to load `OffsetSequencer` instead of `TimeSequencer`.

* Use separate DL streams to store the information of a subscription, such
as offsets and consumer load balancing information.
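
To make the first two items concrete, here is a rough sketch, for
illustration only. `KafkaStreamLayout`, `IdSequencer`, and the constructor
argument are names made up for this sketch and are not existing DL classes;
how the last offset is recovered when a stream is reopened is left out.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper that builds stream names following the layout above.
final class KafkaStreamLayout {
    private KafkaStreamLayout() {}

    // namespace/topic/partitions/N
    static String partitionStream(String namespace, String topic, int partition) {
        return namespace + "/" + topic + "/partitions/" + partition;
    }

    // namespace/topic/subscriptions/S
    static String subscriptionStream(String namespace, String topic, String subscription) {
        return namespace + "/" + topic + "/subscriptions/" + subscription;
    }
}

// Stand-in for the sequencer abstraction the write proxy would load.
// Declared locally (an assumption) so the sketch compiles on its own.
interface IdSequencer {
    long nextId();
}

// Hands out dense, strictly increasing offsets to use as transaction ids,
// instead of ids derived from the wall clock.
final class OffsetSequencer implements IdSequencer {
    private final AtomicLong lastOffset;

    // lastOffset is the last offset already assigned (-1 for an empty stream).
    OffsetSequencer(long lastOffset) {
        this.lastOffset = new AtomicLong(lastOffset);
    }

    @Override
    public long nextId() {
        return lastOffset.incrementAndGet();
    }
}
```

With this, a partition whose last assigned offset was 41 would resume with
`new OffsetSequencer(41)`, and the next record would get offset 42.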

Do you see any concerns here?

- KN

On Tue, Aug 9, 2016 at 1:04 PM, Sijie Guo <si...@apache.org> wrote:

> Thanks Khurrum.
>
> At this point, we don't have any specific process to follow for big
> features. We were discussing one under
> http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser
>
> But ideally, let's use mail list for discussion and use confluence page for
> reflecting the discussions into a design doc.
>
> If you already have a confluence account (if not, please create one),
> please email me your account. I can grant the permission to you, then you
> can edit.
>
> - Sijie
>
> On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <kh...@gmail.com>
> wrote:
>
> > Sijie,
> >
> > Thank you so much for your quick reply. We are using Kafka now and we are
> > interested in the features in DL like durability and handling slow
> > machines.
> >
> > If it is okay to the community, we'd like to give a try and evaluate the
> > solution. Is there any process that I should follow?
> >
> > KN
> >
> > On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
> >
> > > Khurrum,
> > >
> > > Interesting. Thank you for your interests in DistributedLog.
> > >
> > > Three years ago when we started the project internally at Twitter, we
> > > did have a plan to use it as a backend for both kestrel (Twitter's
> > > in-house queue system) and Kafka. However, we didn't go down that
> > > direction. Instead, we built a similar self-serve pub/sub system over
> > > DistributedLog to consolidate our kestrel and kafka. So we don't have a
> > > concrete plan to build the kafka's interface over DistributedLog. The
> > > module was put under tutorials is mostly to give people an idea how it
> > > can be used for building a partition based pub/sub system.
> > >
> > > However, I don't have any strong preference here. If you think it would
> > > be useful to other people, you are welcome to contribute. We'd be happy
> > > to guide and offer any helps.
> > >
> > > Also, it might be good if you can explain more about what you are
> > > planning to do. Other people in the community can chime in and discuss.
> > >
> > > Please let us know your thoughts. You are very welcome to make any
> > > contributions.
> > >
> > > - Sijie
> > >
> > > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
> > > wrote:
> > >
> > > > Hello folks,
> > > >
> > > > I saw there is a 'distributedlog-kafka' module in tutorials. But it
> > > > seems not complete yet. I am wondering if there is a plan to fully
> > > > implement the kafka's interface. It would be great if we can use
> > > > kafka's interface to access distributed log. I'd like to contribute
> > > > if there is a plan.
> > > >
> > > > Thanks,
> > > > KN
> > > >
> > >
> >
>

Re: Distributed Log as Kafka's backend

Posted by Sijie Guo <si...@apache.org>.

Thanks Khurrum.

At this point, we don't have any specific process to follow for big
features. We were discussing one under
http://mail-archives.apache.org/mod_mbox/incubator-distributedlog-dev/201607.mbox/browser

But ideally, let's use the mailing list for discussion and a Confluence page
to capture the discussions in a design doc.

If you already have a Confluence account (if not, please create one),
please email me your account name. I can grant you permission, and then you
can edit.

- Sijie

On Mon, Aug 1, 2016 at 9:01 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> Sijie,
>
> Thank you so much for your quick reply. We are using Kafka now and we are
> interested in the features in DL like durability and handling slow
> machines.
>
> If it is okay to the community, we'd like to give a try and evaluate the
> solution. Is there any process that I should follow?
>
> KN
>
> On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:
>
> > Khurrum,
> >
> > Interesting. Thank you for your interests in DistributedLog.
> >
> > Three years ago when we started the project internally at Twitter, we did
> > have a plan to use it as a backend for both kestrel (Twitter's in-house
> > queue system) and Kafka. However, we didn't go down that direction.
> > Instead, we built a similar self-serve pub/sub system over DistributedLog
> > to consolidate our kestrel and kafka. So we don't have a concrete plan to
> > build the kafka's interface over DistributedLog. The module was put under
> > tutorials is mostly to give people an idea how it can be used for
> > building a partition based pub/sub system.
> >
> > However, I don't have any strong preference here. If you think it would
> > be useful to other people, you are welcome to contribute. We'd be happy
> > to guide and offer any helps.
> >
> > Also, it might be good if you can explain more about what you are
> > planning to do. Other people in the community can chime in and discuss.
> >
> > Please let us know your thoughts. You are very welcome to make any
> > contributions.
> >
> > - Sijie
> >
> > On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <khurrumnasimm@gmail.com>
> > wrote:
> >
> > > Hello folks,
> > >
> > > I saw there is a 'distributedlog-kafka' module in tutorials. But it
> > > seems not complete yet. I am wondering if there is a plan to fully
> > > implement the kafka's interface. It would be great if we can use kafka's
> > > interface to access distributed log. I'd like to contribute if there is
> > > a plan.
> > >
> > > Thanks,
> > > KN
> > >
> >
>

Distributed Log as Kafka's backend

Posted by Khurrum Nasim <kh...@gmail.com>.

Sijie,

Thank you so much for your quick reply. We are using Kafka now and we are
interested in DL features such as durability and handling of slow machines.

If it is okay with the community, we'd like to give it a try and evaluate
the solution. Is there any process that I should follow?

KN

On Sunday, July 31, 2016, Sijie Guo <sijie@apache.org> wrote:

> Khurrum,
>
> Interesting. Thank you for your interests in DistributedLog.
>
> Three years ago when we started the project internally at Twitter, we did
> have a plan to use it as a backend for both kestrel (Twitter's in-house
> queue system) and Kafka. However, we didn't go down that direction.
> Instead, we built a similar self-serve pub/sub system over DistributedLog
> to consolidate our kestrel and kafka. So we don't have a concrete plan to
> build the kafka's interface over DistributedLog. The module was put under
> tutorials is mostly to give people an idea how it can be used for building
> a partition based pub/sub system.
>
> However, I don't have any strong preference here. If you think it would be
> useful to other people, you are welcome to contribute. We'd be happy to
> guide and offer any helps.
>
> Also, it might be good if you can explain more about what you are planning
> to do. Other people in the community can chime in and discuss.
>
> Please let us know your thoughts. You are very welcome to make any
> contributions.
>
> - Sijie
>
> On Sat, Jul 30, 2016 at 10:33 PM, Khurrum Nasim <kh...@gmail.com>
> wrote:
>
> > Hello folks,
> >
> > I saw there is a 'distributedlog-kafka' module in tutorials. But it seems
> > not complete yet. I am wondering if there is a plan to fully implement
> > the kafka's interface. It would be great if we can use kafka's interface
> > to access distributed log. I'd like to contribute if there is a plan.
> >
> > Thanks,
> > KN
> >
>