You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@distributedlog.apache.org by Sijie Guo <si...@apache.org> on 2016/09/02 04:41:23 UTC

Re: Add DistributedLog IO

Wow. This sounds interesting. Look forward to this. Let us know if you need
any helps.

- Sijie

On Wed, Aug 31, 2016 at 2:02 AM, Khurrum Nasim <kh...@gmail.com>
wrote:

> Hello beam folks,
>
> We are evaluating a new solution to unify our streaming and batching data
> pipeline, from storage, computing engine to programming model. The idea is
> basically to implement the Kappa architecture, using DistributedLog as a
> unified stream store for both streaming and batching, using Flink or Spark
> (still debating) as the process engine, and using Beam as the programming
> model.
>
> We'd like to contribute an IO connector to DistributedLog (both bounded
> source/sink and unbounded source/sink).
>
> Is there any special instructions or best practise to add a new IO
> connector? Any suggestion is very appreciated.
>
> The jira is here: https://issues.apache.org/jira/browse/BEAM-607
>
> Also, /cc the distributed log team for any helps.
>
> KN
>

Re: Add DistributedLog IO

Posted by Sijie Guo <si...@apache.org>.

Thanks Khurrum. I will try to take a look. (Although I am not an expert of
Beam.)

It would also be awesome if you can share your experience on building this
via soms blog posts, once you finished this.

Sijie

On Nov 30, 2016 4:02 AM, "Khurrum Nasim" <kh...@gmail.com> wrote:

> FYI
>
> I just sent a pull request for adding a bounded source to Beam for reading
> distributedlog streams. I am going to send out the pull request for adding
> an unbounded source and a sink after that.
>
> If you are interested in this and willing to help review it, this is the
> pull request - https://github.com/apache/incubator-beam/pull/1464
>
> - KN
>
> On Thu, Sep 1, 2016 at 9:41 PM, Sijie Guo <si...@apache.org> wrote:
>
>> Wow. This sounds interesting. Look forward to this. Let us know if you
>> need
>> any helps.
>>
>> - Sijie
>>
>> On Wed, Aug 31, 2016 at 2:02 AM, Khurrum Nasim <kh...@gmail.com>
>> wrote:
>>
>> > Hello beam folks,
>> >
>> > We are evaluating a new solution to unify our streaming and batching
>> data
>> > pipeline, from storage, computing engine to programming model. The idea
>> is
>> > basically to implement the Kappa architecture, using DistributedLog as a
>> > unified stream store for both streaming and batching, using Flink or
>> Spark
>> > (still debating) as the process engine, and using Beam as the
>> programming
>> > model.
>> >
>> > We'd like to contribute an IO connector to DistributedLog (both bounded
>> > source/sink and unbounded source/sink).
>> >
>> > Is there any special instructions or best practise to add a new IO
>> > connector? Any suggestion is very appreciated.
>> >
>> > The jira is here: https://issues.apache.org/jira/browse/BEAM-607
>> >
>> > Also, /cc the distributed log team for any helps.
>> >
>> > KN
>> >
>>
>
>

Re: Add DistributedLog IO

Posted by Sijie Guo <si...@apache.org>.

Thanks Khurrum. I will try to take a look. (Although I am not an expert of
Beam.)

It would also be awesome if you can share your experience on building this
via soms blog posts, once you finished this.

Sijie

On Nov 30, 2016 4:02 AM, "Khurrum Nasim" <kh...@gmail.com> wrote:

> FYI
>
> I just sent a pull request for adding a bounded source to Beam for reading
> distributedlog streams. I am going to send out the pull request for adding
> an unbounded source and a sink after that.
>
> If you are interested in this and willing to help review it, this is the
> pull request - https://github.com/apache/incubator-beam/pull/1464
>
> - KN
>
> On Thu, Sep 1, 2016 at 9:41 PM, Sijie Guo <si...@apache.org> wrote:
>
>> Wow. This sounds interesting. Look forward to this. Let us know if you
>> need
>> any helps.
>>
>> - Sijie
>>
>> On Wed, Aug 31, 2016 at 2:02 AM, Khurrum Nasim <kh...@gmail.com>
>> wrote:
>>
>> > Hello beam folks,
>> >
>> > We are evaluating a new solution to unify our streaming and batching
>> data
>> > pipeline, from storage, computing engine to programming model. The idea
>> is
>> > basically to implement the Kappa architecture, using DistributedLog as a
>> > unified stream store for both streaming and batching, using Flink or
>> Spark
>> > (still debating) as the process engine, and using Beam as the
>> programming
>> > model.
>> >
>> > We'd like to contribute an IO connector to DistributedLog (both bounded
>> > source/sink and unbounded source/sink).
>> >
>> > Is there any special instructions or best practise to add a new IO
>> > connector? Any suggestion is very appreciated.
>> >
>> > The jira is here: https://issues.apache.org/jira/browse/BEAM-607
>> >
>> > Also, /cc the distributed log team for any helps.
>> >
>> > KN
>> >
>>
>
>

Re: Add DistributedLog IO

Posted by Khurrum Nasim <kh...@gmail.com>.

FYI

I just sent a pull request for adding a bounded source to Beam for reading
distributedlog streams. I am going to send out the pull request for adding
an unbounded source and a sink after that.

If you are interested in this and willing to help review it, this is the
pull request - https://github.com/apache/incubator-beam/pull/1464

- KN

On Thu, Sep 1, 2016 at 9:41 PM, Sijie Guo <si...@apache.org> wrote:

> Wow. This sounds interesting. Look forward to this. Let us know if you need
> any helps.
>
> - Sijie
>
> On Wed, Aug 31, 2016 at 2:02 AM, Khurrum Nasim <kh...@gmail.com>
> wrote:
>
> > Hello beam folks,
> >
> > We are evaluating a new solution to unify our streaming and batching data
> > pipeline, from storage, computing engine to programming model. The idea
> is
> > basically to implement the Kappa architecture, using DistributedLog as a
> > unified stream store for both streaming and batching, using Flink or
> Spark
> > (still debating) as the process engine, and using Beam as the programming
> > model.
> >
> > We'd like to contribute an IO connector to DistributedLog (both bounded
> > source/sink and unbounded source/sink).
> >
> > Is there any special instructions or best practise to add a new IO
> > connector? Any suggestion is very appreciated.
> >
> > The jira is here: https://issues.apache.org/jira/browse/BEAM-607
> >
> > Also, /cc the distributed log team for any helps.
> >
> > KN
> >
>

Re: Add DistributedLog IO

Posted by Khurrum Nasim <kh...@gmail.com>.

FYI

I just sent a pull request for adding a bounded source to Beam for reading
distributedlog streams. I am going to send out the pull request for adding
an unbounded source and a sink after that.

If you are interested in this and willing to help review it, this is the
pull request - https://github.com/apache/incubator-beam/pull/1464

- KN

On Thu, Sep 1, 2016 at 9:41 PM, Sijie Guo <si...@apache.org> wrote:

> Wow. This sounds interesting. Look forward to this. Let us know if you need
> any helps.
>
> - Sijie
>
> On Wed, Aug 31, 2016 at 2:02 AM, Khurrum Nasim <kh...@gmail.com>
> wrote:
>
> > Hello beam folks,
> >
> > We are evaluating a new solution to unify our streaming and batching data
> > pipeline, from storage, computing engine to programming model. The idea
> is
> > basically to implement the Kappa architecture, using DistributedLog as a
> > unified stream store for both streaming and batching, using Flink or
> Spark
> > (still debating) as the process engine, and using Beam as the programming
> > model.
> >
> > We'd like to contribute an IO connector to DistributedLog (both bounded
> > source/sink and unbounded source/sink).
> >
> > Is there any special instructions or best practise to add a new IO
> > connector? Any suggestion is very appreciated.
> >
> > The jira is here: https://issues.apache.org/jira/browse/BEAM-607
> >
> > Also, /cc the distributed log team for any helps.
> >
> > KN
> >
>