
Posted to dev@beam.apache.org by Aljoscha Krettek <al...@apache.org> on 2016/08/03 02:22:18 UTC

Re: [PROPOSAL] Pipeline Runner API design doc

Hi,
thanks for putting this together. Now that I'm seeing them side by side, I
think the Avro schema looks a lot nicer than the JSON schema, but it's
probably alright since we don't want to change this often (as you already
said). The advantage of JSON is that the (intermediate) plans can easily be
inspected by humans.
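
As a small illustration of the human-readability point, here is a minimal
Python sketch that renders a hypothetical, heavily simplified plan as
indented JSON and parses it back. The field names (transforms, pcollections,
inputs, outputs, coder) and the IDs are assumptions made for illustration;
they are not taken from the schema in the PR.

```python
import json

# Hypothetical, heavily simplified pipeline plan. Field names and IDs are
# illustrative only; they are NOT the proposed Beam schema. Transforms refer
# to PCollections by ID rather than nesting them.
plan = {
    "transforms": {
        "read": {"kind": "Read", "outputs": ["pc1"]},
        "count": {"kind": "Count.perElement", "inputs": ["pc1"], "outputs": ["pc2"]},
    },
    "pcollections": {
        "pc1": {"coder": "StringUtf8Coder"},
        "pc2": {"coder": "KvCoder(StringUtf8Coder, VarLongCoder)"},
    },
}


def dump_plan(p: dict) -> str:
    """Render a plan as indented, key-sorted JSON for a human to read."""
    return json.dumps(p, indent=2, sort_keys=True)


def load_plan(text: str) -> dict:
    """Parse a plan back from its JSON text."""
    return json.loads(text)
```

Because the keys are sorted and indented, the printed plan is stable and easy
to diff between pipeline versions; an Avro binary encoding would need a
decoding step before a person could read it.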

I think at this stage there is not much left to discuss on the plan
representation. To me it seems pretty straightforward what has to be in
there, and that is already more or less in. The only real thing missing is
triggers, but there hasn't been a discussion yet about how that is going to
work out, correct?

Cheers,
Aljoscha

On Thu, 14 Jul 2016 at 21:34 Kenneth Knowles <kl...@google.com.invalid> wrote:

> Hi everyone,
>
> I wanted to circle back on this thread with another invitation to a
> discussion. Work on the high-level refactorings to align the Java SDK with
> the primitives of the proposed model is pretty far along, as is moving out
> the stuff that we don't want in the user-facing SDK.
>
> Since our runners are all Java-based, and we tend to discuss the model in
> Java first, I think part of the proposal that may have received less
> attention was the concrete Avro schema towards the bottom of the doc. Since
> our serialization tech discussion seemed to favor JSON on the front end, I
> just spent a few minutes porting the Avro schema to a JSON schema and doing
> some project setup to demonstrate where & how it would fit into the
> project structure. I'd done the same for Avro previously, so we can see
> how they compare.
>
> I put the code in a PR, for discussion only at this point, at
> https://github.com/apache/incubator-beam/pull/662. I'd love it if you took a
> look at the notes on the PR and briefly at the schema; I'll continue to
> evolve it according to current & future feedback.
>
> Kenn
>
> On Wed, Mar 23, 2016 at 2:17 PM, Kenneth Knowles <kl...@google.com> wrote:
>
> > Hi everyone,
> >
> > Incorporating the feedback from the 1-pager I circulated a week ago, I
> > have put together a concrete design document for the new API(s).
> >
> > https://docs.google.com/document/d/1bao-5B6uBuf-kwH1meenAuXXS0c9cBQ1B2J59I3FiyI/edit?usp=sharing
> >
> > I appreciate any and all feedback on the design.
> >
> > Kenn
> >
>

Re: [PROPOSAL] Pipeline Runner API design doc

Posted by Kenneth Knowles <kl...@google.com.INVALID>.

Hi,

Yes, there are a few things still "TODO", including aggregators and
triggers. Triggers can either be represented as an inline syntax tree or be
flattened and referenced via "pointers", like the transforms and
PCollections. With coders, we've hit issues with nesting and repetition that
lead us to keep them flattened. The flattened form is essentially unreadable
without un-flattening, so I would keep things un-flattened if we weren't
worried about those issues.

Kenn
