
Posted to dev@beam.apache.org by Ankur Goenka <go...@google.com> on 2018/05/12 02:56:25 UTC

Fwd: Launching a Portable Pipeline

Hi,

Recent efforts on portability have introduced JobService and ArtifactService
to the Beam stack along with the SDK. This has opened up a few questions
around how we start a pipeline in a portable setup (with JobService).
I am trying to document our approach to launching a portable pipeline and
make binding decisions based on the discussion.
Please review the document
<https://docs.google.com/document/d/1iwjmgYytHbTsG3Wbdukkra2Lf_OgOZ1zoxWu90Y10lI/edit?usp=sharing>
and provide your feedback.

Thanks,
Ankur
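To make the launch flow under discussion concrete, here is a minimal, in-memory Python sketch of how an SDK might start a pipeline through a job service: prepare the job, stage its artifacts, run it, and keep the returned job id as the handle. It is a toy model under stated assumptions: `FakeJobService`, `prepare`, `stage_artifact`, `run` and `launch` are hypothetical names that only loosely follow the prepare/run split of Beam's Job API, not the generated gRPC stubs.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeJobService:
    """In-memory stand-in for a runner's JobService endpoint (hypothetical)."""

    _ids: itertools.count = field(default_factory=itertools.count)
    _prepared: dict = field(default_factory=dict)
    _jobs: dict = field(default_factory=dict)

    def prepare(self, pipeline: dict) -> str:
        # Accept the pipeline and return a preparation id. A real service
        # would also point the client at an artifact staging endpoint here.
        prep_id = f"prep-{next(self._ids)}"
        self._prepared[prep_id] = {"pipeline": pipeline, "artifacts": []}
        return prep_id

    def stage_artifact(self, prep_id: str, name: str) -> None:
        # Placeholder for uploading a file through an ArtifactService.
        self._prepared[prep_id]["artifacts"].append(name)

    def run(self, prep_id: str) -> str:
        # Turn the prepared submission into a job and return its id (handle).
        self._prepared.pop(prep_id)
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = "RUNNING"
        return job_id

    def get_state(self, job_id: str) -> str:
        return self._jobs[job_id]


def launch(service: FakeJobService, pipeline: dict, artifacts: list) -> str:
    """SDK-side launch sequence: prepare, stage artifacts, then run."""
    prep_id = service.prepare(pipeline)
    for name in artifacts:
        service.stage_artifact(prep_id, name)
    return service.run(prep_id)
```

For example, `launch(FakeJobService(), {"transforms": ["Read", "Write"]}, ["wordcount.jar"])` returns a job id that the caller would then use for state queries or cancellation.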

Re: Launching a Portable Pipeline

Posted by Thomas Weise <th...@apache.org>.

+1

IMO that should be the approach in general: as much code as possible
reusable across runners, and a default job service implementation that can
be customized per runner if necessary. It will be necessary to build at
least per-runner artifacts due to their dependencies (like the profiles we
have for examples/quickstart), but at least at a high level there shouldn't
be a reason why the same job service implementation cannot forward to
multiple runners.
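A minimal Python sketch of that idea, assuming a runner backend can be reduced to a callable that takes a pipeline and returns the runner's own job id: one shared job service with pluggable per-runner backends. `DefaultJobService`, `register` and `RunnerBackend` are hypothetical names for illustration, not Beam classes.

```python
from typing import Callable, Dict

# Hypothetical: a runner backend is reduced to a function that receives the
# portable pipeline and returns the runner's own job id. A real backend would
# translate the pipeline and submit it to Flink, Dataflow, etc.
RunnerBackend = Callable[[dict], str]


class DefaultJobService:
    """One shared job service implementation that forwards to many runners."""

    def __init__(self) -> None:
        self._backends: Dict[str, RunnerBackend] = {}

    def register(self, runner: str, backend: RunnerBackend) -> None:
        self._backends[runner] = backend

    def run(self, runner: str, pipeline: dict) -> str:
        backend = self._backends.get(runner)
        if backend is None:
            raise ValueError(f"no backend registered for runner {runner!r}")
        # Shared, runner-independent checks live here once for all runners.
        if not pipeline.get("transforms"):
            raise ValueError("pipeline has no transforms")
        # Only the final submission step is runner-specific.
        return f"{runner}:{backend(pipeline)}"
```

The design choice is composition: the per-runner artifacts mentioned above would only need to supply their backend, while validation and bookkeeping stay in one place.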

Thanks,
Thomas




On Wed, May 23, 2018 at 3:14 PM, Reuven Lax <re...@google.com> wrote:

>
>
> On Wed, May 23, 2018 at 3:09 PM Ankur Goenka <go...@google.com> wrote:
>
>> 1. Why is JobService runner-specific? Couldn't at least a good part of it
>> be reused, given that the runner-specific parts are mostly in the
>> translation? Or am I missing other reasons?
>>
>> Yes, absolutely. A good chunk of it can be reused. We are reusing a few
>> components from the ULR in the Flink runner. Calling JobService
>> runner-specific gives each runner the freedom to have a highly customized
>> JobService if needed.
>>
>
> So you're suggesting that we should publish common JobService components
> and recommend that runners use them, but that runners are free to build
> something completely custom if they prefer?

Re: Launching a Portable Pipeline

Posted by Ankur Goenka <go...@google.com>.

Yes, JobService can be implemented by a runner and made available through
an endpoint.
The component reuse is mostly about code reuse.

On Wed, May 23, 2018 at 3:14 PM Reuven Lax <re...@google.com> wrote:

>
>
> On Wed, May 23, 2018 at 3:09 PM Ankur Goenka <go...@google.com> wrote:
>
>> 1. Why is JobService runner-specific? Couldn't at least a good part of it
>> be reused, given that the runner-specific parts are mostly in the
>> translation? Or am I missing other reasons?
>>
>> Yes, absolutely. A good chunk of it can be reused. We are reusing a few
>> components from the ULR in the Flink runner. Calling JobService
>> runner-specific gives each runner the freedom to have a highly customized
>> JobService if needed.
>>
>
> So you're suggesting that we should publish common JobService components
> and recommend that runners use them, but that runners are free to build
> something completely custom if they prefer?

Re: Launching a Portable Pipeline

Posted by Reuven Lax <re...@google.com>.

On Wed, May 23, 2018 at 3:09 PM Ankur Goenka <go...@google.com> wrote:

> 1. Why is JobService runner-specific? Couldn't at least a good part of it
> be reused, given that the runner-specific parts are mostly in the
> translation? Or am I missing other reasons?
>
> Yes, absolutely. A good chunk of it can be reused. We are reusing a few
> components from the ULR in the Flink runner. Calling JobService
> runner-specific gives each runner the freedom to have a highly customized
> JobService if needed.
>

So you're suggesting that we should publish common JobService components
and recommend that runners use them, but that runners are free to build
something completely custom if they prefer?

>
> 2. What about authentication and authorisation for production runners?
> A service that can be used to submit/cancel pipelines is the first thing
> I can think of abusing.
>
> Authentication and authorization are still an unsolved problem. To the
> best of my knowledge, they are runner-specific, and any required
> information should be part of the gRPC headers.

Re: Launching a Portable Pipeline

Posted by Ankur Goenka <go...@google.com>.

1. Why is JobService runner-specific? Couldn't at least a good part of it
be reused, given that the runner-specific parts are mostly in the
translation? Or am I missing other reasons?

Yes, absolutely. A good chunk of it can be reused. We are reusing a few
components from the ULR in the Flink runner. Calling JobService
runner-specific gives each runner the freedom to have a highly customized
JobService if needed.
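The split described here, shared code plus runner-specific hooks, could look like the following Python sketch. The base class owns the bookkeeping (job ids, state, cancel), and a runner overrides only translation and submission. All names (`CommonJobService`, `ToyFlinkJobService`, `translate`, `submit`) are hypothetical, and the "translation" is a placeholder string mapping, not real Flink translation.

```python
from abc import ABC, abstractmethod


class CommonJobService(ABC):
    """Hypothetical shared JobService core that every runner could reuse."""

    def __init__(self) -> None:
        self._states = {}

    def run(self, job_name: str, pipeline: dict) -> str:
        # Shared bookkeeping: allocate an id and track the job's state.
        job_id = f"{job_name}-{len(self._states)}"
        self._states[job_id] = "STARTING"
        self.submit(job_id, self.translate(pipeline))
        self._states[job_id] = "RUNNING"
        return job_id

    def get_state(self, job_id: str) -> str:
        return self._states[job_id]

    def cancel(self, job_id: str) -> None:
        self._states[job_id] = "CANCELLED"

    @abstractmethod
    def translate(self, pipeline: dict) -> object:
        """Runner-specific: turn the portable pipeline into a native job."""

    @abstractmethod
    def submit(self, job_id: str, native_job: object) -> None:
        """Runner-specific: hand the native job to the execution engine."""


class ToyFlinkJobService(CommonJobService):
    """Overrides only the runner-specific hooks (placeholder logic)."""

    def __init__(self) -> None:
        super().__init__()
        self.submitted = []

    def translate(self, pipeline: dict) -> list:
        return [f"flink-op:{t}" for t in pipeline["transforms"]]

    def submit(self, job_id: str, native_job: object) -> None:
        self.submitted.append((job_id, native_job))
```

A runner that wants something completely custom can skip the base class and implement the service from scratch; the base class is an offer, not a requirement.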

2. What about authentication and authorisation for production runners?
A service that can be used to submit/cancel pipelines is the first thing
I can think of abusing.

Authentication and authorization are still an unsolved problem. To the
best of my knowledge, they are runner-specific, and any required
information should be part of the gRPC headers.
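As an illustration of carrying credentials in gRPC headers, here is a minimal, library-free Python sketch. It assumes the request metadata arrives as (key, value) pairs, roughly how gRPC's Python `ServicerContext.invocation_metadata()` exposes it, and that a runner accepts a bearer token in an `authorization` header. The token table, `authenticate` and `cancel_job` are hypothetical stand-ins for whatever identity system a runner actually uses.

```python
import hmac
from typing import Iterable, Optional, Tuple

Metadata = Iterable[Tuple[str, str]]

# Hypothetical token table; a real runner would consult its own identity system.
ALLOWED_TOKENS = {"s3cr3t-token": "alice"}


def authenticate(metadata: Metadata) -> Optional[str]:
    """Return the caller's principal if a valid bearer token is present."""
    for key, value in metadata:
        if key.lower() != "authorization" or not value.startswith("Bearer "):
            continue
        presented = value[len("Bearer "):].encode()
        for token, principal in ALLOWED_TOKENS.items():
            # Constant-time comparison avoids leaking token contents via timing.
            if hmac.compare_digest(presented, token.encode()):
                return principal
    return None


def cancel_job(metadata: Metadata, job_id: str, owners: dict) -> str:
    """Authorize a cancel request: only the job's owner may cancel it."""
    principal = authenticate(metadata)
    if principal is None:
        raise PermissionError("unauthenticated")
    if owners.get(job_id) != principal:
        raise PermissionError(f"{principal} may not cancel {job_id}")
    return f"{job_id} cancelled by {principal}"
```

The sketch keeps authentication (who is calling) separate from authorization (may this caller cancel this job), since the two questions were raised together above but are usually answered by different checks.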

On Wed, May 23, 2018 at 2:48 PM Ismaël Mejía <ie...@gmail.com> wrote:

> Interesting document, two questions:
>
> 1. Why is JobService runner-specific? Couldn't at least a good part of it
> be reused, given that the runner-specific parts are mostly in the
> translation? Or am I missing other reasons?
>
> 2. What about authentication and authorisation for production runners?
> A service that can be used to submit/cancel pipelines is the first thing
> I can think of abusing.

Re: Launching a Portable Pipeline

Posted by Ismaël Mejía <ie...@gmail.com>.

Interesting document, two questions:

1. Why is JobService runner-specific? Couldn't at least a good part of it
be reused, given that the runner-specific parts are mostly in the
translation? Or am I missing other reasons?

2. What about authentication and authorisation for production runners?
A service that can be used to submit/cancel pipelines is the first thing
I can think of abusing.
On Tue, May 22, 2018 at 9:40 PM Ankur Goenka <go...@google.com> wrote:

> Thank you guys for the input.
>
> Here is the summary.
>
> Responsibility of Beam on Job Management
>
> Beam provides a common interface for basic job management operations,
> called JobService. The supported operations can vary between runners.
>
> What is JobService?
>
> JobService is a runner-specific component which implements Beam's
> JobService interface defined here.
>
> What is the life cycle of a JobService?
>
> There are 3 scenarios:
>
> 1. With the ULR, JobService is short lived and runs as long as the ULR
> runs (JobService lifespan ~= job lifespan).
>
> 2. With production runners (Flink, Dataflow, etc.), JobService can be
> either short lived or long lived. The choice is up to the runner.
>
> 3. With production runners (Flink, Dataflow, etc.) without a long-running
> JobService, the SDK will spin up a local JobService.
>
> JobService state management
>
> The choice of state management is up to the JobService implementation.
> The basic requirement is that the JobService should be able to perform
> all the operations with the job handle it returns.
>
> At the very least, the handle can be the job handle of the underlying
> runner job, and the JobService will simply proxy actions to the runner
> using that handle.
>
> A persistent JobService is free to provide a simple string as the job
> handle. In this case, the handle can only be used with the same
> JobService.
>
> A stateless, non-persistent JobService can provide an opaque blob
> containing all the relevant information about the job. In this case, the
> handle can be used with any instance of JobService running the same code.
>
> JobService code distribution and invocation when JobService is short lived
>
> We will provide an easy-to-run solution using Docker. Docker will help
> with both executable distribution and providing a platform-independent
> binary.
>
> We will also provide an easy setup script, with a supporting document,
> for users who do not want to use Docker on their local machine.
>
> Should the Flink JobService start a local cluster for testing?
>
> The Flink JobService will be able to submit to a remote Flink cluster if
> a master URL is provided; otherwise it will execute the pipeline in an
> in-process Flink invocation in the same JVM.
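The endpoint choices in this summary (the third life-cycle scenario, and the Flink local-versus-remote question) reduce to two small decisions, sketched below in Python. Assumptions: the configured job endpoint and the Flink master URL are plain optional strings, `start_local` is a hypothetical callback that launches a local JobService and returns its address, and all function names are illustrative rather than Beam APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Endpoint:
    address: str
    started_locally: bool


def resolve_job_endpoint(configured: Optional[str],
                         start_local: Callable[[], str]) -> Endpoint:
    """SDK side: use a long-running JobService if one is configured,
    otherwise spin up a short-lived local one."""
    if configured:
        return Endpoint(configured, started_locally=False)
    return Endpoint(start_local(), started_locally=True)


def flink_execution_mode(master_url: Optional[str]) -> str:
    """Flink JobService side: submit to a remote cluster if a master URL is
    given, otherwise run Flink in-process in the same JVM."""
    if master_url:
        return f"remote:{master_url}"
    return "in-process"
```

For example, `resolve_job_endpoint(None, lambda: "localhost:8099")` reports a locally started service at that (made-up) address, while `flink_execution_mode(None)` selects in-process execution.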

Re: Launching a Portable Pipeline

Posted by Ankur Goenka <go...@google.com>.

Thank you guys for the input.

Here is the summary:
<https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit#heading=h.lky5ef6wxo9x>

Responsibility of Beam on Job Management

Beam provides a common interface for basic job management operations,
called JobService. The supported operations can vary between runners.

What is JobService?

JobService is a runner-specific component which implements Beam's
JobService interface defined here
<https://github.com/apache/beam/blob/master/model/job-management/src/main/proto/beam_job_api.proto>.

What is the life cycle of a JobService?

There are 3 scenarios:
1. With the ULR, JobService is short lived and runs as long as the ULR runs
   (JobService lifespan ~= job lifespan).
2. With production runners (Flink, Dataflow, etc.), JobService can be either
   short lived or long lived. The choice is up to the runner.
3. With production runners (Flink, Dataflow, etc.) without a long-running
   JobService, the SDK will spin up a local JobService.

JobService state management

The choice of state management is up to the JobService implementation. The
basic requirement is that the JobService should be able to perform all the
operations with the job handle it returns. At the very least, the handle
can be the job handle of the underlying runner job, and the JobService will
simply proxy actions to the runner using that handle.

A persistent JobService is free to provide a simple string as the job
handle. In this case, the handle can only be used with the same JobService.

A stateless, non-persistent JobService can provide an opaque blob containing
all the relevant information about the job. In this case, the handle can be
used with any instance of JobService running the same code.

JobService code distribution and invocation when JobService is short lived

We will provide an easy-to-run solution using Docker. Docker will help with
both executable distribution and providing a platform-independent binary.
We will also provide an easy setup script, with a supporting document, for
users who do not want to use Docker on their local machine.

Should the Flink JobService start a local cluster for testing?

The Flink JobService will be able to submit to a remote Flink cluster if a
master URL is provided; otherwise it will execute the pipeline in an
in-process Flink invocation in the same JVM.
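The two job-handle styles described in the summary can be sketched in Python as follows. This is an illustration under assumptions, not how any runner actually encodes handles: the persistent service maps a random string to the runner job id in its own table, while the stateless service packs the needed fields into a base64-encoded JSON blob that any instance running the same code can decode. Class and field names are hypothetical.

```python
import base64
import json
import uuid


class PersistentJobService:
    """Keeps its own table, so a short string handle is enough; the handle
    is only meaningful to this service (or one sharing its store)."""

    def __init__(self) -> None:
        self._table = {}

    def submit(self, runner_job_id: str) -> str:
        handle = uuid.uuid4().hex
        self._table[handle] = runner_job_id
        return handle

    def runner_job_for(self, handle: str) -> str:
        return self._table[handle]


class StatelessJobService:
    """Keeps nothing; the handle is an opaque blob carrying everything needed,
    so any instance running the same code can act on it."""

    def submit(self, runner_job_id: str, cluster: str) -> str:
        payload = {"runner_job_id": runner_job_id, "cluster": cluster}
        return base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()

    def runner_job_for(self, handle: str) -> str:
        payload = json.loads(base64.urlsafe_b64decode(handle.encode()))
        return payload["runner_job_id"]
```

A handle from one `PersistentJobService` is unknown to a fresh instance, whereas a blob from one `StatelessJobService` decodes fine in another. A real stateless service would presumably also need to sign or encrypt the blob, since clients could otherwise tamper with it.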

On Tue, May 22, 2018 at 12:37 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Thanks Ankur, I think there's consensus, so it's probably ready to share :)

Re: Launching a Portable Pipeline

Posted by Eugene Kirpichov <ki...@google.com>.

Thanks Ankur, I think there's consensus, so it's probably ready to share :)

On Fri, May 18, 2018 at 3:00 PM Ankur Goenka <go...@google.com> wrote:

> Thanks for all the input.
> I have summarized the discussions at the bottom of the document ( here
> <https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit#heading=h.lky5ef6wxo9x>
> ).
> Please feel free to provide comments.
> Once we agree, I will publish the conclusion on the mailing list.
>
> On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov <ki...@google.com>
> wrote:
>
>> Thanks Ankur, this document clarifies a few points and raises some very
>> important questions. I encourage everybody with a stake in Portability to
>> take a look and chime in.
>>
>> +Aljoscha Krettek <al...@data-artisans.com> +Thomas Weise
>> <th...@apache.org> +Henning Rohde <he...@google.com>
>>
>> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <go...@google.com> wrote:
>>
>>> Updated link
>>> <https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit> to
>>> the document as the previous link was not working for some people.
>>>
>>>
>>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <go...@google.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Recent effort on portability has introduced JobService and
>>>> ArtifactService to the Beam stack along with the SDK. This has opened up a
>>>> few questions around how we start a pipeline in a portable setup (with
>>>> JobService).
>>>> I am trying to document our approach to launching a portable pipeline
>>>> and make binding decisions based on the discussion.
>>>> Please review the document and provide your feedback.
>>>>
>>>> Thanks,
>>>> Ankur
>>>>
>>>

Re: Launching a Portable Pipeline

Posted by Ankur Goenka <go...@google.com>.

Thanks for all the input.
I have summarized the discussions at the bottom of the document ( here
<https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit#heading=h.lky5ef6wxo9x>
).
Please feel free to provide comments.
Once we agree, I will publish the conclusion on the mailing list.

On Mon, May 14, 2018 at 1:51 PM Eugene Kirpichov <ki...@google.com>
wrote:

> Thanks Ankur, this document clarifies a few points and raises some very
> important questions. I encourage everybody with a stake in Portability to
> take a look and chime in.
>
> +Aljoscha Krettek <al...@data-artisans.com> +Thomas Weise
> <th...@apache.org> +Henning Rohde <he...@google.com>
>
> On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <go...@google.com> wrote:
>
>> Updated link
>> <https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit> to
>> the document as the previous link was not working for some people.
>>
>>
>> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <go...@google.com> wrote:
>>
>>> Hi,
>>>
>>> Recent effort on portability has introduced JobService and
>>> ArtifactService to the Beam stack along with the SDK. This has opened up a
>>> few questions around how we start a pipeline in a portable setup (with
>>> JobService).
>>> I am trying to document our approach to launching a portable pipeline
>>> and make binding decisions based on the discussion.
>>> Please review the document and provide your feedback.
>>>
>>> Thanks,
>>> Ankur
>>>
>>

Re: Launching a Portable Pipeline

Posted by Eugene Kirpichov <ki...@google.com>.

Thanks Ankur, this document clarifies a few points and raises some very
important questions. I encourage everybody with a stake in Portability to
take a look and chime in.

+Aljoscha Krettek <al...@data-artisans.com> +Thomas Weise
<th...@apache.org> +Henning Rohde <he...@google.com>

On Mon, May 14, 2018 at 12:34 PM Ankur Goenka <go...@google.com> wrote:

> Updated link
> <https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit> to
> the document as the previous link was not working for some people.
>
>
> On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <go...@google.com> wrote:
>
>> Hi,
>>
>> Recent effort on portability has introduced JobService and
>> ArtifactService to the Beam stack along with the SDK. This has opened up a
>> few questions around how we start a pipeline in a portable setup (with
>> JobService).
>> I am trying to document our approach to launching a portable pipeline and
>> make binding decisions based on the discussion.
>> Please review the document
>> <https://docs.google.com/document/d/1iwjmgYytHbTsG3Wbdukkra2Lf_OgOZ1zoxWu90Y10lI/edit?usp=sharing> and
>> provide your feedback.
>>
>> Thanks,
>> Ankur
>>
>

Re: Launching a Portable Pipeline

Posted by Ankur Goenka <go...@google.com>.

Updated link
<https://docs.google.com/document/d/1xOaEEJrMmiSHprd-WiYABegfT129qqF-idUBINjxz8s/edit> to
the document, as the previous link was not working for some people.

On Fri, May 11, 2018 at 7:56 PM Ankur Goenka <go...@google.com> wrote:

> Hi,
>
> Recent effort on portability has introduced JobService and ArtifactService
> to the Beam stack along with the SDK. This has opened up a few questions
> around how we start a pipeline in a portable setup (with JobService).
> I am trying to document our approach to launching a portable pipeline and
> make binding decisions based on the discussion.
> Please review the document
> <https://docs.google.com/document/d/1iwjmgYytHbTsG3Wbdukkra2Lf_OgOZ1zoxWu90Y10lI/edit?usp=sharing> and
> provide your feedback.
>
> Thanks,
> Ankur
>