Posted to dev@distributedlog.apache.org by Khurrum Nasim <kh...@gmail.com> on 2016/11/02 09:43:35 UTC

Use DL stream to store offsets?

As part of implementing the Kafka subscriber interface, I am wondering
whether anyone uses a DL stream for storing offsets.

For example, if I have N streams (0..N-1), I need to track the read offset
for each stream and store them somewhere. I could use an external service
(such as a key/value store) to store the offsets, but that would introduce
extra dependencies. I am thinking of keeping a map of <stream, offset> and
periodically flushing it into a separate stream (let's say an __offset_
stream). With a proper truncation/checkpoint mechanism, it would be very
fast.
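
A minimal sketch of the map-and-flush idea, in Python. It assumes each
flush writes a full snapshot of the map, so every earlier record can be
truncated once the new one is appended. InMemoryLog, OffsetTracker and
their methods are hypothetical names used for illustration; InMemoryLog
stands in for a real stream and is not the DistributedLog API.

```python
import json
from typing import Dict, List, Optional, Tuple


class InMemoryLog:
    """Hypothetical append-only log; a stand-in, not the DistributedLog API."""

    def __init__(self) -> None:
        self._records: List[Tuple[int, bytes]] = []
        self._next_id = 0

    def append(self, payload: bytes) -> int:
        """Append a record and return its monotonically increasing id."""
        record_id = self._next_id
        self._records.append((record_id, payload))
        self._next_id += 1
        return record_id

    def truncate_before(self, record_id: int) -> None:
        """Drop every record whose id is lower than record_id."""
        self._records = [r for r in self._records if r[0] >= record_id]

    def read_all(self) -> List[Tuple[int, bytes]]:
        return list(self._records)


class OffsetTracker:
    """Keeps a <stream, offset> map and checkpoints it into a log."""

    def __init__(self, log: InMemoryLog) -> None:
        self._log = log
        self._offsets: Dict[str, int] = {}

    def update(self, stream: str, offset: int) -> None:
        self._offsets[stream] = offset

    def offset(self, stream: str) -> Optional[int]:
        return self._offsets.get(stream)

    def flush(self) -> int:
        """Write a full snapshot, then truncate the records it supersedes."""
        payload = json.dumps(self._offsets, sort_keys=True).encode("utf-8")
        record_id = self._log.append(payload)
        self._log.truncate_before(record_id)
        return record_id

    @classmethod
    def recover(cls, log: InMemoryLog) -> "OffsetTracker":
        """Rebuild the map from the latest snapshot in the log, if any."""
        tracker = cls(log)
        records = log.read_all()
        if records:
            _, payload = records[-1]
            tracker._offsets = json.loads(payload.decode("utf-8"))
        return tracker


log = InMemoryLog()
tracker = OffsetTracker(log)
tracker.update("stream-0", 42)
tracker.update("stream-1", 7)
tracker.flush()
tracker.update("stream-0", 50)
tracker.flush()
restored = OffsetTracker.recover(log)
```

Because every record is a complete snapshot, recovery only needs the last
record in the log. A design that writes per-stream deltas instead would
have to replay from the last checkpoint and could only truncate behind it.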

This use case is a very standard replicated state machine. I am also
wondering whether you have thought about providing a common library on
distributedlog to simplify implementing state machines.

- KN

Re: Use DL stream to store offsets?

Posted by Leigh Stewart <ls...@twitter.com.INVALID>.
Maybe sometime next quarter-- we're still in the process of standing up the
first production service to use this library.
No reason not to work on another implementation though. The requirements
will probably be a little different.
We'd be happy to help you understand how to approach error handling and
consistency if you do give it a shot.

On Wed, Nov 9, 2016 at 2:38 AM, Jay Juma <ja...@gmail.com> wrote:

> I am also interested in a framework/library that makes building state
> machines easier. When do you think you can share that? I'd like to
> contribute too.
>
> - Jay

Re: Use DL stream to store offsets?

Posted by Jay Juma <ja...@gmail.com>.
I am also interested in a framework/library that makes building state
machines easier. When do you think you can share that? I'd like to
contribute too.

- Jay

On Wed, Nov 2, 2016 at 9:19 AM, Leigh Stewart <ls...@twitter.com.invalid>
wrote:

> What's your timeline, Khurrum? Maybe we can work something out.

Re: Use DL stream to store offsets?

Posted by Khurrum Nasim <kh...@gmail.com>.
That's cool, Leigh. It would be good if you could get a basic version
out as soon as possible. I'd like to leverage an existing solution rather
than build a separate one.

- KN

On Wed, Nov 2, 2016 at 9:19 AM, Leigh Stewart <ls...@twitter.com.invalid>
wrote:

> What's your timeline, Khurrum? Maybe we can work something out.

Re: Use DL stream to store offsets?

Posted by Leigh Stewart <ls...@twitter.com.INVALID>.
What's your timeline, Khurrum? Maybe we can work something out.

On Wed, Nov 2, 2016 at 8:15 AM, Leigh Stewart <ls...@twitter.com> wrote:

> We have in fact built something like this. No plans as yet to release, but
> I think we would like to eventually.

Re: Use DL stream to store offsets?

Posted by Leigh Stewart <ls...@twitter.com.INVALID>.
We have in fact built something like this. No plans as yet to release, but
I think we would like to eventually.

On Wed, Nov 2, 2016 at 2:43 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> As part of implementing the Kafka subscriber interface, I am wondering
> whether anyone uses a DL stream for storing offsets.
>
> For example, if I have N streams (0..N-1), I need to track the read offset
> for each stream and store them somewhere. I could use an external service
> (such as a key/value store) to store the offsets, but that would introduce
> extra dependencies. I am thinking of keeping a map of <stream, offset> and
> periodically flushing it into a separate stream (let's say an __offset_
> stream). With a proper truncation/checkpoint mechanism, it would be very
> fast.
>
> This use case is a very standard replicated state machine. I am also
> wondering whether you have thought about providing a common library on
> distributedlog to simplify implementing state machines.
>
> - KN
>

Re: Use DL stream to store offsets?

Posted by Sijie Guo <si...@twitter.com.INVALID>.
On Wed, Nov 2, 2016 at 2:43 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> As part of implementing the Kafka subscriber interface, I am wondering
> whether anyone uses a DL stream for storing offsets.
>
> For example, if I have N streams (0..N-1), I need to track the read offset
> for each stream and store them somewhere. I could use an external service
> (such as a key/value store) to store the offsets, but that would introduce
> extra dependencies. I am thinking of keeping a map of <stream, offset> and
> periodically flushing it into a separate stream (let's say an __offset_
> stream). With a proper truncation/checkpoint mechanism, it would be very
> fast.
>
> This use case is a very standard replicated state machine. I am also
> wondering whether you have thought about providing a common library on
> distributedlog to simplify implementing state machines.
>

Yes, one of the goals of this project is to make building state machines
easier using DL. As Leigh said, we are working on some common libraries
for this.


>
> - KN
>