Posted to dev@beam.apache.org by Eugene Kirpichov <ki...@google.com.INVALID> on 2017/05/01 05:22:49 UTC

[PROPOSAL] Running Splittable DoFn via Source API

Hey all,

TL;DR: Development of the SDF ecosystem (transitioning existing connectors to
SDF, building libraries, battle-testing the API) is currently blocked on
having SDF supported at full parity with the Source API in all Beam runners,
which will take a long time.

But we can unblock all this work and start doing it very soon, by running a
special case of SDF on top of the Source API.

I think this is very exciting. Please comment on the following short
proposal!

https://s.apache.org/sdf-via-source

After getting some consensus on the doc and in this thread, I will start
filing a network of JIRAs and follow up with next steps.
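To make the adapter idea a little more concrete, here is a toy model in Python. It is a hypothetical sketch, not Beam's API and not necessarily the design in the linked doc. It assumes a bounded SDF whose restriction is an offset range. It maps the Source API's initial splitting onto splitting that restriction, and it maps reading a source onto running the SDF's process method with a tracker that claims offsets. The names OffsetRange, OffsetRangeTracker, SdfAsBoundedSource and upper_lines are illustrative only. The first two echo Beam concepts, but these classes are simplified stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterator, List


@dataclass(frozen=True)
class OffsetRange:
    """Toy restriction (hypothetical): the half-open interval [start, stop)."""
    start: int
    stop: int

    def split(self, num_parts: int) -> List["OffsetRange"]:
        # Cut the range into up to num_parts roughly equal, contiguous pieces.
        size = self.stop - self.start
        num_parts = max(1, min(num_parts, size))
        bounds = [self.start + size * i // num_parts for i in range(num_parts + 1)]
        return [OffsetRange(a, b) for a, b in zip(bounds, bounds[1:])]


class OffsetRangeTracker:
    """Toy tracker (hypothetical): the SDF claims each offset before emitting for it."""

    def __init__(self, restriction: OffsetRange):
        self.restriction = restriction
        self.last_claimed = restriction.start - 1

    def try_claim(self, offset: int) -> bool:
        if offset <= self.last_claimed:
            raise ValueError("offsets must be claimed in increasing order")
        if offset >= self.restriction.stop:
            return False  # outside the restriction: the SDF must stop here
        self.last_claimed = offset
        return True


# Toy SDF "process" signature (hypothetical): (element, tracker) -> outputs.
ProcessFn = Callable[[Any, OffsetRangeTracker], Iterator[Any]]


class SdfAsBoundedSource:
    """Toy adapter (hypothetical): one (element, restriction) pair as a bounded source."""

    def __init__(self, process: ProcessFn, element: Any, restriction: OffsetRange):
        self.process = process
        self.element = element
        self.restriction = restriction

    def split(self, num_parts: int) -> List["SdfAsBoundedSource"]:
        # Source-style initial splitting = splitting the SDF's restriction.
        return [SdfAsBoundedSource(self.process, self.element, r)
                for r in self.restriction.split(num_parts)]

    def read(self) -> Iterator[Any]:
        # Source-style reading = running the SDF over this restriction.
        yield from self.process(self.element, OffsetRangeTracker(self.restriction))


def upper_lines(lines: List[str], tracker: OffsetRangeTracker) -> Iterator[str]:
    # Example SDF body: claim line indices in order and emit each line upper-cased.
    i = tracker.restriction.start
    while tracker.try_claim(i):
        yield lines[i].upper()
        i += 1


lines = ["a", "b", "c", "d", "e"]
source = SdfAsBoundedSource(upper_lines, lines, OffsetRange(0, len(lines)))
parts = source.split(2)  # restrictions [0, 2) and [2, 5)
print([out for part in parts for out in part.read()])  # ['A', 'B', 'C', 'D', 'E']
```

The sketch deliberately leaves out dynamic splitting during processing, checkpointing and unbounded restrictions; it only shows how initial splitting and reading could line up between the two APIs.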

Re: [PROPOSAL] Running Splittable DoFn via Source API

Posted by Aljoscha Krettek <al...@apache.org>.
+1

I’m a bit hesitant, though, because stage 3 (in the new plan) could become the current stage 1: now, stage 1 is “waiting for Runners to support SDF” while stage 2 is “implement sources as SDF”. We are blocked by Runner support in stage 1, while in the new scheme we would be blocked on Runner support only in stage 3. Also, we introduce the additional burden of writing intermediate SDF implementations, i.e., in the old scheme we go source API -> SDF, while in the new scheme we go source API -> “SDF but not really, because Runners don’t support it” -> SDF for real.

—
Aljoscha
> On 1. May 2017, at 07:22, Eugene Kirpichov <ki...@google.com.INVALID> wrote:


Re: [PROPOSAL] Running Splittable DoFn via Source API

Posted by Aljoscha Krettek <al...@apache.org>.
Yes, that additional piece of work was basically my concern. It’s a very mild concern, though, and I’m in favour of implementing SDF as a source.

Best,
Aljoscha

> On 11. May 2017, at 01:00, Eugene Kirpichov <ki...@google.com.INVALID> wrote:


Re: [PROPOSAL] Running Splittable DoFn via Source API

Posted by Eugene Kirpichov <ki...@google.com.INVALID>.
Hi,

Aljoscha - can you clarify your concern?

Basically, previously the plan was:
1. Wait for runners to fully support SDF (hard)
2. Implement existing sources as SDF
3. Implement new APIs

and now it's:
0. Implement the adapter (easy)
1. Implement existing sources as SDF
2. Wait for runners to have basic support for SDF (medium-hard)
3. Implement new APIs
4. As individual runners start fully supporting SDF, these sources and APIs
perform better.

The only piece of work that exists in the new plan but not in the old plan
is step 0, implementing the adapter. Other than that, the new plan is a
reordering of the old plan: the final state is the same, but it gets
delivered more incrementally, and major pieces of it get delivered much earlier.

JB - I believe this should answer your question too?

On Tue, May 2, 2017 at 12:18 PM Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:


Re: [PROPOSAL] Running Splittable DoFn via Source API

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1

I just have a little question: are we blocked from moving forward on
support in the runners, or is it just a question of focus?

I think we could focus on this after the first stable release.

Thoughts?

Regards
JB

On 05/01/2017 07:22 AM, Eugene Kirpichov wrote:

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com