Posted to dev@beam.apache.org by Maximilian Michels <mx...@apache.org> on 2018/08/20 15:31:46 UTC

Bootstrapping Beam's Job Server

Hi everyone,

I wanted to get your opinion on the Job-Server startup [1] which is part
of the portability story.

I've created a docker container to bring up Beam's Job Server, which is
the entry point for pipeline execution. Generally, this works fine when
the backend (Flink in this case) runs externally and the Job Server
connects to it.

For tests or pipeline development we may want the backend to run
embedded (inside the Job Server), which is problematic because
portability requires spinning up the SDK harness in a Docker container
as well. That would then have to happen at runtime, inside the Job
Server's Docker container.

Since Docker inside Docker is not desirable I'm thinking about other
options:

Option 1) Instead of a Docker container, we start a bundled Job-Server
binary (or jar) when we run the pipeline. The bundle also contains an
embedded variant of the backend. For Flink, this is basically the output
of `:beam-runners-flink_2.11-job-server:shadowJar` but it is started
during pipeline execution.
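
To illustrate, a rough sketch of what Option 1 could look like from the
command line (the jar path below is an assumption, and any host/port
flags are deliberately left out since they are not defined yet):

    # Build the bundled Job Server with the embedded Flink backend
    # (Gradle task name as mentioned above).
    ./gradlew :beam-runners-flink_2.11-job-server:shadowJar

    # Hypothetical: start the fat jar when the pipeline is run; the exact
    # output path of the shadow jar is a placeholder.
    java -jar runners/flink/job-server/build/libs/beam-runners-flink_2.11-job-server-*.jar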

Option 2) In addition to the Job Server, we let the SDK spin up another
Docker container with the backend. This may be the approach most
applicable to all
types of backends since not all backends offer an embedded execution mode.
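
As a rough sketch only (the Flink image, tag and wiring below are
assumptions for illustration), the SDK could do something along these
lines before submitting the pipeline:

    # Hypothetical: bring up a local Flink cluster next to the Job Server.
    docker run -d --name flink-jobmanager -p 8081:8081 \
        -e JOB_MANAGER_RPC_ADDRESS=flink-jobmanager flink:1.5 jobmanager
    docker run -d --name flink-taskmanager --link flink-jobmanager \
        -e JOB_MANAGER_RPC_ADDRESS=flink-jobmanager flink:1.5 taskmanager

    # The Job Server is then configured to talk to this "external" cluster
    # just like in the externally-running scenario above.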


Keep in mind that this is only a problem for local/test execution, but
it is an important aspect of Beam's usability.

What do you think? I'm leaning towards option 2. Maybe you have other
options in mind.

Cheers,
Max

[1] https://issues.apache.org/jira/browse/BEAM-4130

Re: Bootstrapping Beam's Job Server

Posted by Henning Rohde <he...@google.com>.
>> Option 3) would be to map in the docker binary and socket to allow
>> the containerized Flink job server to start "sibling" containers on
>> the host.
>
>Do you mean packaging Docker inside the Job Server container and
>mounting /var/run/docker.sock from the host inside the container? That
>looks like a bit of a hack but for testing it could be fine.

Basically, yes, although I would also map in the docker binary itself to
ensure compatibility with the host. It's not something I would suggest for
production use -- just for local jobs. It would allow the Go SDK to just
work OOB, for example.
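
For reference, a minimal sketch of that mapping (the image name is just
a placeholder):

    # Mount the host's Docker socket and binary so that containers started
    # from inside the Job Server end up as siblings on the host.
    docker run \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v "$(which docker)":/usr/local/bin/docker \
        beam-flink-job-server:latest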

The process-based scenario can be a configuration feature of each
SDK/runner. I see that as a useful complement to dockerized SDKs, although
the onus is then on the runner/user to ensure the environment is adequate
for the SDK(s) used in the job. The main appeal of docker is precisely to
not have that requirement, but for some deployments it is reasonable.

Henning


On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mx...@apache.org> wrote:

> Thanks for your suggestions. Please see below.
>
> > Option 3) would be to map in the docker binary and socket to allow
> > the containerized Flink job server to start "sibling" containers on
> > the host.
>
> Do you mean packaging Docker inside the Job Server container and
> mounting /var/run/docker.sock from the host inside the container? That
> looks like a bit of a hack but for testing it could be fine.
>
> > notably, if the runner supports auto-scaling or similar non-trivial
> > configurations, that would be difficult to manage from the SDK side.
>
> You're right, it would be unfortunate if the SDK would have to deal with
> spinning up SDK harness/backend containers. For non-trivial
> configurations it would probably require an extended protocol.
>
> > Option 4) We are also thinking about adding process based SDKHarness.
> > This will avoid docker in docker scenario.
>
> Actually, I had started implementing a process-based SDK harness but
> figured it might be impractical because it doubles the execution path
> for UDF code and potentially doesn't work with custom dependencies.
>
> > Process based SDKHarness also has other applications and might be
> > desirable in some of the production use cases.
>
> True. Some users might want something more lightweight.
>

Re: Bootstrapping Beam's Job Server

Posted by Thomas Weise <th...@apache.org>.
On Thu, Aug 23, 2018 at 6:47 AM Maximilian Michels <mx...@apache.org> wrote:

>  > Going down this path may start to get fairly involved, with an almost
>  > endless list of features that could be requested. Instead, I would
>  > suggest we keep process-based execution very simple, and specify bash
>  > script (that sets up the environment and whatever else one may want to
>  > do) as the command line invocation.
>
> Fair point. At the least, we will have to transfer the shell script to
> the nodes. Anything else is up to the script.
>

That would be another artifact. But this can also be part of host
provisioning (i.e. this worker execution model does not perform artifact
staging).


>  > I would also think it'd be really valuable to provide a "callback"
>  > environment, where an RPC call is made to trigger worker creation
>  > (deletion?), passing the requisite parameters (e.g. the fn api
>  > endpoints).
>
> Aren't you making up more features now? :) Couldn't this be also handled
> by the shell script?
>
> On 23.08.18 14:13, Robert Bradshaw wrote:
> > On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels <mx...@apache.org>
> wrote:
> >>
> >> Big +1. Process-based execution should be simple to reason about for
> >> users.
> >
> > +1. In fact, this is exactly what the Python local job server does,
> > with running Docker simply being a particular command line that's
> > passed down here.
> >
> >
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service_main.py
> >
> >> The implementation should not be too involved. The user has to
> >> ensure the environment is suitable for process-based execution.
> >>
> >> There are some minor features that we should support:
> >>
> >> - Activating a virtual environment for Python / Adding pre-installed
> >> libraries to the classpath
> >>
> >> - Staging libraries, similarly to the boot code for Docker
> >
> > Going down this path may start to get fairly involved, with an almost
> > endless list of features that could be requested. Instead, I would
> > suggest we keep process-based execution very simple, and specify bash
> > script (that sets up the environment and whatever else one may want to
> > do) as the command line invocation. We could even provide a couple of
> > these. (The arguments to pass should be configurable).
> >
> > I would also think it'd be really valuable to provide a "callback"
> > environment, where an RPC call is made to trigger worker creation
> > (deletion?), passing the requisite parameters (e.g. the fn api
> > endpoints). This could be useful both in a distributed system (where
> > it may be desirable for an external entity to actually start up the
> > workers) or for debugging/testing (where one could call into the same
> > process that submitted the job, which would execute workers on
> > separate threads with an already set up environment).
> >
> >> On 22.08.18 07:49, Henning Rohde wrote:
> >>> Agree with Luke. Perhaps something simple, prescriptive yet flexible,
> >>> such as custom command line (defined in the environment proto) rooted
> at
> >>> the base of the provided artifacts and either passed the same arguments
> >>> or defined in the container contract or made available through
> >>> substitution. That way, all the restrictions/assumptions of the
> >>> execution environment become implicit and runner/deployment dependent.
> >>>
> >>>
> >>> On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lcwik@google.com
> >>> <ma...@google.com>> wrote:
> >>>
> >>>      I believe supporting a simple Process environment makes sense. It
> >>>      would be best if we didn't make the Process route solve all the
> >>>      problems that Docker solves for us. In my opinion we should limit
> >>>      the Process route to assume that the execution environment:
> >>>      * has all dependencies and libraries installed
> >>>      * is of a compatible machine architecture
> >>>      * doesn't require special networking rules to be setup
> >>>
> >>>      Any other suggestions for reasonable limits on a Process
> environment?
> >>>
> >>>      On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <iemejia@gmail.com
> >>>      <ma...@gmail.com>> wrote:
> >>>
> >>>          It is also worth to mention that apart of the
> >>>          testing/development use
> >>>          case there is also the case of supporting people running in
> Hadoop
> >>>          distributions. There are two extra reasons to want a process
> based
> >>>          version: (1) Some Hadoop distributions run in machines with
> >>>          really old
> >>>          kernels where docker support is limited or nonexistent (yes,
> some of
> >>>          those run on kernel 2.6!) and (2) Ops people may be reticent
> to the
> >>>          additional operational overhead of enabling docker in their
> >>>          clusters.
> >>>          On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels
> >>>          <mxm@apache.org <ma...@apache.org>> wrote:
> >>>           >
> >>>           > Thanks Henning and Thomas. It looks like
> >>>           >
> >>>           > a) we want to keep the Docker Job Server Docker container
> and
> >>>          rely on
> >>>           > spinning up "sibling" SDK harness containers via the Docker
> >>>          socket. This
> >>>           > should require little changes to the Runner code.
> >>>           >
> >>>           > b) have the InProcess SDK harness as an alternative way to
> >>>          running user
> >>>           > code. This can be done independently of a).
> >>>           >
> >>>           > Thomas, let's sync today on the InProcess SDK harness. I've
> >>>          created a
> >>>           > JIRA issue:
> https://issues.apache.org/jira/browse/BEAM-5187
> >>>           >
> >>>           > Cheers,
> >>>           > Max
> >>>           >
> >>>           > On 21.08.18 00:35, Thomas Weise wrote:
> >>>           > > The original objective was to make test/development
> easier
> >>>          (which I
> >>>           > > think is super important for user experience with
> portable
> >>>          runner).
> >>>           > >
> >>>           > >  From first hand experience I can confirm that dealing
> with
> >>>          Flink
> >>>           > > clusters and Docker containers for local setup is a
> >>>          significant hurdle
> >>>           > > for Python developers.
> >>>           > >
> >>>           > > To simplify using Flink in embedded mode, the (direct)
> >>>          process based SDK
> >>>           > > harness would be a good option, especially when it can be
> >>>          linked to the
> >>>           > > same virtualenv that developers have already setup,
> >>>          eliminating extra
> >>>           > > packaging/deployment steps.
> >>>           > >
> >>>           > > Max, I would be interested to sync up on what your
> thoughts are
> >>>           > > regarding that option since you mention you also started
> to
> >>>          work on it
> >>>           > > (see previous discussion [1], not sure if there is a JIRA
> >>>          for it yet).
> >>>           > > Internally we are planning to use a direct SDK harness
> >>>          process instead
> >>>           > > of Docker containers. For our specific needs it will work
> >>>          equally well
> >>>           > > for development and production, including future plans to
> >>>          deploy Flink
> >>>           > > TMs via Kubernetes.
> >>>           > >
> >>>           > > Thanks,
> >>>           > > Thomas
> >>>           > >
> >>>           > > [1]
> >>>           > >
> >>>
> https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> >>>           > >
> >>>           > >
> >>>           > >
> >>>           > >
> >>>           > >
> >>>           > >
> >>>           > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels
> >>>          <mxm@apache.org <ma...@apache.org>
> >>>           > > <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
> >>>           > >
> >>>           > >     Thanks for your suggestions. Please see below.
> >>>           > >
> >>>           > >      > Option 3) would be to map in the docker binary and
> >>>          socket to allow
> >>>           > >      > the containerized Flink job server to start
> >>>          "sibling" containers on
> >>>           > >      > the host.
> >>>           > >
> >>>           > >     Do you mean packaging Docker inside the Job Server
> >>>          container and
> >>>           > >     mounting /var/run/docker.sock from the host inside
> the
> >>>          container? That
> >>>           > >     looks like a bit of a hack but for testing it could
> be
> >>>          fine.
> >>>           > >
> >>>           > >      > notably, if the runner supports auto-scaling or
> >>>          similar non-trivial
> >>>           > >      > configurations, that would be difficult to manage
> >>>          from the SDK side.
> >>>           > >
> >>>           > >     You're right, it would be unfortunate if the SDK
> would
> >>>          have to deal with
> >>>           > >     spinning up SDK harness/backend containers. For
> non-trivial
> >>>           > >     configurations it would probably require an extended
> >>>          protocol.
> >>>           > >
> >>>           > >      > Option 4) We are also thinking about adding
> process
> >>>          based SDKHarness.
> >>>           > >      > This will avoid docker in docker scenario.
> >>>           > >
> >>>           > >     Actually, I had started implementing a process-based
> >>>          SDK harness but
> >>>           > >     figured it might be impractical because it doubles
> the
> >>>          execution path
> >>>           > >     for UDF code and potentially doesn't work with custom
> >>>          dependencies.
> >>>           > >
> >>>           > >      > Process based SDKHarness also has other
> applications
> >>>          and might be
> >>>           > >      > desirable in some of the production use cases.
> >>>           > >
> >>>           > >     True. Some users might want something more
> lightweight.
> >>>           > >
> >>>           >
> >>>           > --
> >>>           > Max
> >>>
> >>
> >> --
> >> Max
>
> --
> Max
>

Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
Understood, so that's a generalized abstraction for creating RPC-based
services that manage SDK harnesses (what we discussed as "external" in
the other thread). I would prefer this to be REST-based, since that
makes interfacing with other systems easier. So probably a shell script would
already suffice.
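
As a throwaway sketch, such a script could be as small as this (the
virtualenv path is a placeholder, and whether the endpoints arrive as
arguments or environment variables is left open here):

    #!/bin/bash
    # Hypothetical boot script for a process-based environment: prepare the
    # environment, then hand over to the Python SDK harness entry point.
    source /path/to/venv/bin/activate
    exec python -m apache_beam.runners.worker.sdk_worker_main "$@"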

On 27.08.18 11:23, Robert Bradshaw wrote:
> I mean that rather than a command line (or docker image) a URL is
> given that's a GRPC (or REST or ...) endpoint that's invoked to pass
> what would have been passed by command line arguments (e.g. the FnAPI
> control plane and logging endpoints).
> 
> This could be implemented as a script that goes and makes the call and
> exits, but I think this would be common enough it'd be worth building
> in, and also useful enough for testing that it should be very
> lightweight.

-- 
Max

Re: Bootstrapping Beam's Job Server

Posted by Robert Bradshaw <ro...@google.com>.
I mean that rather than a command line (or docker image) a URL is
given that's a GRPC (or REST or ...) endpoint that's invoked to pass
what would have been passed by command line arguments (e.g. the FnAPI
control plane and logging endpoints).

This could be implemented as a script that goes and makes the call and
exits, but I think this would be common enough it'd be worth building
in, and also useful enough for testing that it should be very
lightweight.
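
To make that concrete with a throwaway sketch (host, path and field
names below are made up purely for illustration):

    # Hypothetical REST "callback": instead of starting a process, the runner
    # calls an external worker manager and hands over the Fn API endpoints.
    curl -X POST http://worker-manager.example:8080/workers \
         -H 'Content-Type: application/json' \
         -d '{"control_endpoint": "runner-host:8098", "logging_endpoint": "runner-host:8099"}'
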
On Mon, Aug 27, 2018 at 10:51 AM Maximilian Michels <mx...@apache.org> wrote:
>
> Robert, just to be clear about the "callback" proposal. Do you mean that
> the process startup script listens for an RPC from the Runner to bring
> up SDK harnesses as needed?
>
> I agree this would be helpful to know the required parameter, e.g. you
> mentioned the Fn Api network configuration.
>

Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
Robert, just to be clear about the "callback" proposal: do you mean
that the process startup script listens for an RPC from the Runner to
bring up SDK harnesses as needed?

I agree it would be helpful to know the required parameters, e.g. you
mentioned the Fn API network configuration.


-- 
Max

Re: Bootstrapping Beam's Job Server

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Aug 23, 2018 at 3:47 PM Maximilian Michels <mx...@apache.org> wrote:
>
>  > Going down this path may start to get fairly involved, with an almost
>  > endless list of features that could be requested. Instead, I would
>  > suggest we keep process-based execution very simple, and specify bash
>  > script (that sets up the environment and whatever else one may want to
>  > do) as the command line invocation.
>
> Fair point. At the least, we will have to transfer the shell script to
> the nodes. Anything else is up to the script.
>
>  > I would also think it'd be really valuable to provide a "callback"
>  > environment, where an RPC call is made to trigger worker creation
>  > (deletion?), passing the requisite parameters (e.g. the fn api
>  > endpoints).
>
> Aren't you making up more features now? :) Couldn't this be also handled
> by the shell script?

Good point :). I still think it'd be nice to make this option more
explicit, as it doesn't even require starting up (or managing) a
subprocess.


Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
 > Going down this path may start to get fairly involved, with an almost
 > endless list of features that could be requested. Instead, I would
 > suggest we keep process-based execution very simple, and specify bash
 > script (that sets up the environment and whatever else one may want to
 > do) as the command line invocation.

Fair point. At the least, we will have to transfer the shell script to 
the nodes. Anything else is up to the script.
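
For illustration, such a script could be as simple as the following
sketch (the virtualenv path, the worker entry point and the argument
handling are placeholders/assumptions, not a proposed contract):

    #!/bin/bash
    # Hypothetical boot script staged to the worker nodes; everything
    # below is illustrative and ultimately up to the user.
    set -euo pipefail

    # Activate a pre-existing virtualenv (path is an assumption).
    source /path/to/venv/bin/activate

    # Start the Python SDK harness, forwarding whatever arguments the
    # runner passes (module name and argument handling are assumptions).
    exec python -m apache_beam.runners.worker.sdk_worker_main "$@"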

 > I would also think it'd be really valuable to provide a "callback"
 > environment, where an RPC call is made to trigger worker creation
 > (deletion?), passing the requisite parameters (e.g. the fn api
 > endpoints).

Aren't you making up more features now? :) Couldn't this also be handled 
by the shell script?

On 23.08.18 14:13, Robert Bradshaw wrote:
> On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels <mx...@apache.org> wrote:
>>
>> Big +1. Process-based execution should be simple to reason about for
>> users.
> 
> +1. In fact, this is exactly what the Python local job server does,
> with running Docker simply being a particular command line that's
> passed down here.
> 
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service_main.py
> 
>> The implementation should not be too involved. The user has to
>> ensure the environment is suitable for process-based execution.
>>
>> There are some minor features that we should support:
>>
>> - Activating a virtual environment for Python / Adding pre-installed
>> libraries to the classpath
>>
>> - Staging libraries, similarly to the boot code for Docker
> 
> Going down this path may start to get fairly involved, with an almost
> endless list of features that could be requested. Instead, I would
> suggest we keep process-based execution very simple, and specify bash
> script (that sets up the environment and whatever else one may want to
> do) as the command line invocation. We could even provide a couple of
> these. (The arguments to pass should be configurable).
> 
> I would also think it'd be really valuable to provide a "callback"
> environment, where an RPC call is made to trigger worker creation
> (deletion?), passing the requisite parameters (e.g. the fn api
> endpoints). This could be useful both in a distributed system (where
> it may be desirable for an external entity to actually start up the
> workers) or for debugging/testing (where one could call into the same
> process that submitted the job, which would execute workers on
> separate threads with an already set up environment).
> 
>> On 22.08.18 07:49, Henning Rohde wrote:
>>> Agree with Luke. Perhaps something simple, prescriptive yet flexible,
>>> such as custom command line (defined in the environment proto) rooted at
>>> the base of the provided artifacts and either passed the same arguments
>>> or defined in the container contract or made available through
>>> substitution. That way, all the restrictions/assumptions of the
>>> execution environment become implicit and runner/deployment dependent.
>>>
>>>
>>> On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lcwik@google.com
>>> <ma...@google.com>> wrote:
>>>
>>>      I believe supporting a simple Process environment makes sense. It
>>>      would be best if we didn't make the Process route solve all the
>>>      problems that Docker solves for us. In my opinion we should limit
>>>      the Process route to assume that the execution environment:
>>>      * has all dependencies and libraries installed
>>>      * is of a compatible machine architecture
>>>      * doesn't require special networking rules to be setup
>>>
>>>      Any other suggestions for reasonable limits on a Process environment?
>>>
>>>      On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <iemejia@gmail.com
>>>      <ma...@gmail.com>> wrote:
>>>
>>>          It is also worth to mention that apart of the
>>>          testing/development use
>>>          case there is also the case of supporting people running in Hadoop
>>>          distributions. There are two extra reasons to want a process based
>>>          version: (1) Some Hadoop distributions run in machines with
>>>          really old
>>>          kernels where docker support is limited or nonexistent (yes, some of
>>>          those run on kernel 2.6!) and (2) Ops people may be reticent to the
>>>          additional operational overhead of enabling docker in their
>>>          clusters.
>>>          On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels
>>>          <mxm@apache.org <ma...@apache.org>> wrote:
>>>           >
>>>           > Thanks Henning and Thomas. It looks like
>>>           >
>>>           > a) we want to keep the Docker Job Server Docker container and
>>>          rely on
>>>           > spinning up "sibling" SDK harness containers via the Docker
>>>          socket. This
>>>           > should require little changes to the Runner code.
>>>           >
>>>           > b) have the InProcess SDK harness as an alternative way to
>>>          running user
>>>           > code. This can be done independently of a).
>>>           >
>>>           > Thomas, let's sync today on the InProcess SDK harness. I've
>>>          created a
>>>           > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
>>>           >
>>>           > Cheers,
>>>           > Max
>>>           >
>>>           > On 21.08.18 00:35, Thomas Weise wrote:
>>>           > > The original objective was to make test/development easier
>>>          (which I
>>>           > > think is super important for user experience with portable
>>>          runner).
>>>           > >
>>>           > >  From first hand experience I can confirm that dealing with
>>>          Flink
>>>           > > clusters and Docker containers for local setup is a
>>>          significant hurdle
>>>           > > for Python developers.
>>>           > >
>>>           > > To simplify using Flink in embedded mode, the (direct)
>>>          process based SDK
>>>           > > harness would be a good option, especially when it can be
>>>          linked to the
>>>           > > same virtualenv that developers have already setup,
>>>          eliminating extra
>>>           > > packaging/deployment steps.
>>>           > >
>>>           > > Max, I would be interested to sync up on what your thoughts are
>>>           > > regarding that option since you mention you also started to
>>>          work on it
>>>           > > (see previous discussion [1], not sure if there is a JIRA
>>>          for it yet).
>>>           > > Internally we are planning to use a direct SDK harness
>>>          process instead
>>>           > > of Docker containers. For our specific needs it will works
>>>          equally well
>>>           > > for development and production, including future plans to
>>>          deploy Flink
>>>           > > TMs via Kubernetes.
>>>           > >
>>>           > > Thanks,
>>>           > > Thomas
>>>           > >
>>>           > > [1]
>>>           > >
>>>          https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
>>>           > >
>>>           > >
>>>           > >
>>>           > >
>>>           > >
>>>           > >
>>>           > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels
>>>          <mxm@apache.org <ma...@apache.org>
>>>           > > <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
>>>           > >
>>>           > >     Thanks for your suggestions. Please see below.
>>>           > >
>>>           > >      > Option 3) would be to map in the docker binary and
>>>          socket to allow
>>>           > >      > the containerized Flink job server to start
>>>          "sibling" containers on
>>>           > >      > the host.
>>>           > >
>>>           > >     Do you mean packaging Docker inside the Job Server
>>>          container and
>>>           > >     mounting /var/run/docker.sock from the host inside the
>>>          container? That
>>>           > >     looks like a bit of a hack but for testing it could be
>>>          fine.
>>>           > >
>>>           > >      > notably, if the runner supports auto-scaling or
>>>          similar non-trivial
>>>           > >      > configurations, that would be difficult to manage
>>>          from the SDK side.
>>>           > >
>>>           > >     You're right, it would be unfortunate if the SDK would
>>>          have to deal with
>>>           > >     spinning up SDK harness/backend containers. For non-trivial
>>>           > >     configurations it would probably require an extended
>>>          protocol.
>>>           > >
>>>           > >      > Option 4) We are also thinking about adding process
>>>          based SDKHarness.
>>>           > >      > This will avoid docker in docker scenario.
>>>           > >
>>>           > >     Actually, I had started implementing a process-based
>>>          SDK harness but
>>>           > >     figured it might be impractical because it doubles the
>>>          execution path
>>>           > >     for UDF code and potentially doesn't work with custom
>>>          dependencies.
>>>           > >
>>>           > >      > Process based SDKHarness also has other applications
>>>          and might be
>>>           > >      > desirable in some of the production use cases.
>>>           > >
>>>           > >     True. Some users might want something more lightweight.
>>>           > >
>>>           >
>>>           > --
>>>           > Max
>>>
>>
>> --
>> Max

-- 
Max

Re: Bootstrapping Beam's Job Server

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Aug 23, 2018 at 1:54 PM Maximilian Michels <mx...@apache.org> wrote:
>
> Big +1. Process-based execution should be simple to reason about for
> users.

+1. In fact, this is exactly what the Python local job server does,
with running Docker simply being a particular command line that's
passed down here.

https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/local_job_service_main.py
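
To make that concrete: the "environment" boils down to a worker command
line, and Docker is just one possible value for it. Both lines below are
only illustrative (image name, flags and endpoints are placeholders):

    # Process-based: run the harness directly in the local environment
    # (module name is an assumption).
    sh -c "python -m apache_beam.runners.worker.sdk_worker_main"

    # Docker-based: the same abstraction, the command just happens to be
    # `docker run` (image and flags are placeholders).
    docker run --rm <sdk-harness-image> --id=<worker-id> \
        --control_endpoint=<control-endpoint>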

> The implementation should not be too involved. The user has to
> ensure the environment is suitable for process-based execution.
>
> There are some minor features that we should support:
>
> - Activating a virtual environment for Python / Adding pre-installed
> libraries to the classpath
>
> - Staging libraries, similarly to the boot code for Docker

Going down this path may start to get fairly involved, with an almost
endless list of features that could be requested. Instead, I would
suggest we keep process-based execution very simple, and specify a bash
script (that sets up the environment and whatever else one may want to
do) as the command-line invocation. We could even provide a couple of
these. (The arguments to pass should be configurable.)

I would also think it'd be really valuable to provide a "callback"
environment, where an RPC call is made to trigger worker creation
(deletion?), passing the requisite parameters (e.g. the fn api
endpoints). This could be useful both in a distributed system (where
it may be desirable for an external entity to actually start up the
workers) or for debugging/testing (where one could call into the same
process that submitted the job, which would execute workers on
separate threads with an already set up environment).

> On 22.08.18 07:49, Henning Rohde wrote:
> > Agree with Luke. Perhaps something simple, prescriptive yet flexible,
> > such as custom command line (defined in the environment proto) rooted at
> > the base of the provided artifacts and either passed the same arguments
> > or defined in the container contract or made available through
> > substitution. That way, all the restrictions/assumptions of the
> > execution environment become implicit and runner/deployment dependent.
> >
> >
> > On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lcwik@google.com
> > <ma...@google.com>> wrote:
> >
> >     I believe supporting a simple Process environment makes sense. It
> >     would be best if we didn't make the Process route solve all the
> >     problems that Docker solves for us. In my opinion we should limit
> >     the Process route to assume that the execution environment:
> >     * has all dependencies and libraries installed
> >     * is of a compatible machine architecture
> >     * doesn't require special networking rules to be setup
> >
> >     Any other suggestions for reasonable limits on a Process environment?
> >
> >     On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <iemejia@gmail.com
> >     <ma...@gmail.com>> wrote:
> >
> >         It is also worth to mention that apart of the
> >         testing/development use
> >         case there is also the case of supporting people running in Hadoop
> >         distributions. There are two extra reasons to want a process based
> >         version: (1) Some Hadoop distributions run in machines with
> >         really old
> >         kernels where docker support is limited or nonexistent (yes, some of
> >         those run on kernel 2.6!) and (2) Ops people may be reticent to the
> >         additional operational overhead of enabling docker in their
> >         clusters.
> >         On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels
> >         <mxm@apache.org <ma...@apache.org>> wrote:
> >          >
> >          > Thanks Henning and Thomas. It looks like
> >          >
> >          > a) we want to keep the Docker Job Server Docker container and
> >         rely on
> >          > spinning up "sibling" SDK harness containers via the Docker
> >         socket. This
> >          > should require little changes to the Runner code.
> >          >
> >          > b) have the InProcess SDK harness as an alternative way to
> >         running user
> >          > code. This can be done independently of a).
> >          >
> >          > Thomas, let's sync today on the InProcess SDK harness. I've
> >         created a
> >          > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
> >          >
> >          > Cheers,
> >          > Max
> >          >
> >          > On 21.08.18 00:35, Thomas Weise wrote:
> >          > > The original objective was to make test/development easier
> >         (which I
> >          > > think is super important for user experience with portable
> >         runner).
> >          > >
> >          > >  From first hand experience I can confirm that dealing with
> >         Flink
> >          > > clusters and Docker containers for local setup is a
> >         significant hurdle
> >          > > for Python developers.
> >          > >
> >          > > To simplify using Flink in embedded mode, the (direct)
> >         process based SDK
> >          > > harness would be a good option, especially when it can be
> >         linked to the
> >          > > same virtualenv that developers have already setup,
> >         eliminating extra
> >          > > packaging/deployment steps.
> >          > >
> >          > > Max, I would be interested to sync up on what your thoughts are
> >          > > regarding that option since you mention you also started to
> >         work on it
> >          > > (see previous discussion [1], not sure if there is a JIRA
> >         for it yet).
> >          > > Internally we are planning to use a direct SDK harness
> >         process instead
> >          > > of Docker containers. For our specific needs it will works
> >         equally well
> >          > > for development and production, including future plans to
> >         deploy Flink
> >          > > TMs via Kubernetes.
> >          > >
> >          > > Thanks,
> >          > > Thomas
> >          > >
> >          > > [1]
> >          > >
> >         https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> >          > >
> >          > >
> >          > >
> >          > >
> >          > >
> >          > >
> >          > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels
> >         <mxm@apache.org <ma...@apache.org>
> >          > > <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
> >          > >
> >          > >     Thanks for your suggestions. Please see below.
> >          > >
> >          > >      > Option 3) would be to map in the docker binary and
> >         socket to allow
> >          > >      > the containerized Flink job server to start
> >         "sibling" containers on
> >          > >      > the host.
> >          > >
> >          > >     Do you mean packaging Docker inside the Job Server
> >         container and
> >          > >     mounting /var/run/docker.sock from the host inside the
> >         container? That
> >          > >     looks like a bit of a hack but for testing it could be
> >         fine.
> >          > >
> >          > >      > notably, if the runner supports auto-scaling or
> >         similar non-trivial
> >          > >      > configurations, that would be difficult to manage
> >         from the SDK side.
> >          > >
> >          > >     You're right, it would be unfortunate if the SDK would
> >         have to deal with
> >          > >     spinning up SDK harness/backend containers. For non-trivial
> >          > >     configurations it would probably require an extended
> >         protocol.
> >          > >
> >          > >      > Option 4) We are also thinking about adding process
> >         based SDKHarness.
> >          > >      > This will avoid docker in docker scenario.
> >          > >
> >          > >     Actually, I had started implementing a process-based
> >         SDK harness but
> >          > >     figured it might be impractical because it doubles the
> >         execution path
> >          > >     for UDF code and potentially doesn't work with custom
> >         dependencies.
> >          > >
> >          > >      > Process based SDKHarness also has other applications
> >         and might be
> >          > >      > desirable in some of the production use cases.
> >          > >
> >          > >     True. Some users might want something more lightweight.
> >          > >
> >          >
> >          > --
> >          > Max
> >
>
> --
> Max

Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
Big +1. Process-based execution should be simple to reason about for 
users. The implementation should not be too involved. The user has to 
ensure the environment is suitable for process-based execution.

There are some minor features that we should support:

- Activating a virtual environment for Python / Adding pre-installed 
libraries to the classpath

- Staging libraries, similarly to the boot code for Docker
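
As a rough sketch of what the staging part could look like in such a
boot script (directory layout and file names below are assumptions,
analogous to what the Docker boot code does):

    # Activate the developer's existing virtualenv (path is an assumption).
    source "$VENV_DIR/bin/activate"

    # Install staged dependencies from the artifact directory, if any.
    if [ -f "$ARTIFACT_DIR/requirements.txt" ]; then
      pip install -r "$ARTIFACT_DIR/requirements.txt" \
          --find-links "$ARTIFACT_DIR"
    fi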


On 22.08.18 07:49, Henning Rohde wrote:
> Agree with Luke. Perhaps something simple, prescriptive yet flexible, 
> such as custom command line (defined in the environment proto) rooted at 
> the base of the provided artifacts and either passed the same arguments 
> or defined in the container contract or made available through 
> substitution. That way, all the restrictions/assumptions of the 
> execution environment become implicit and runner/deployment dependent.
> 
> 
> On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lcwik@google.com 
> <ma...@google.com>> wrote:
> 
>     I believe supporting a simple Process environment makes sense. It
>     would be best if we didn't make the Process route solve all the
>     problems that Docker solves for us. In my opinion we should limit
>     the Process route to assume that the execution environment:
>     * has all dependencies and libraries installed
>     * is of a compatible machine architecture
>     * doesn't require special networking rules to be setup
> 
>     Any other suggestions for reasonable limits on a Process environment?
> 
>     On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <iemejia@gmail.com
>     <ma...@gmail.com>> wrote:
> 
>         It is also worth to mention that apart of the
>         testing/development use
>         case there is also the case of supporting people running in Hadoop
>         distributions. There are two extra reasons to want a process based
>         version: (1) Some Hadoop distributions run in machines with
>         really old
>         kernels where docker support is limited or nonexistent (yes, some of
>         those run on kernel 2.6!) and (2) Ops people may be reticent to the
>         additional operational overhead of enabling docker in their
>         clusters.
>         On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels
>         <mxm@apache.org <ma...@apache.org>> wrote:
>          >
>          > Thanks Henning and Thomas. It looks like
>          >
>          > a) we want to keep the Docker Job Server Docker container and
>         rely on
>          > spinning up "sibling" SDK harness containers via the Docker
>         socket. This
>          > should require little changes to the Runner code.
>          >
>          > b) have the InProcess SDK harness as an alternative way to
>         running user
>          > code. This can be done independently of a).
>          >
>          > Thomas, let's sync today on the InProcess SDK harness. I've
>         created a
>          > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
>          >
>          > Cheers,
>          > Max
>          >
>          > On 21.08.18 00:35, Thomas Weise wrote:
>          > > The original objective was to make test/development easier
>         (which I
>          > > think is super important for user experience with portable
>         runner).
>          > >
>          > >  From first hand experience I can confirm that dealing with
>         Flink
>          > > clusters and Docker containers for local setup is a
>         significant hurdle
>          > > for Python developers.
>          > >
>          > > To simplify using Flink in embedded mode, the (direct)
>         process based SDK
>          > > harness would be a good option, especially when it can be
>         linked to the
>          > > same virtualenv that developers have already setup,
>         eliminating extra
>          > > packaging/deployment steps.
>          > >
>          > > Max, I would be interested to sync up on what your thoughts are
>          > > regarding that option since you mention you also started to
>         work on it
>          > > (see previous discussion [1], not sure if there is a JIRA
>         for it yet).
>          > > Internally we are planning to use a direct SDK harness
>         process instead
>          > > of Docker containers. For our specific needs it will works
>         equally well
>          > > for development and production, including future plans to
>         deploy Flink
>          > > TMs via Kubernetes.
>          > >
>          > > Thanks,
>          > > Thomas
>          > >
>          > > [1]
>          > >
>         https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
>          > >
>          > >
>          > >
>          > >
>          > >
>          > >
>          > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels
>         <mxm@apache.org <ma...@apache.org>
>          > > <mailto:mxm@apache.org <ma...@apache.org>>> wrote:
>          > >
>          > >     Thanks for your suggestions. Please see below.
>          > >
>          > >      > Option 3) would be to map in the docker binary and
>         socket to allow
>          > >      > the containerized Flink job server to start
>         "sibling" containers on
>          > >      > the host.
>          > >
>          > >     Do you mean packaging Docker inside the Job Server
>         container and
>          > >     mounting /var/run/docker.sock from the host inside the
>         container? That
>          > >     looks like a bit of a hack but for testing it could be
>         fine.
>          > >
>          > >      > notably, if the runner supports auto-scaling or
>         similar non-trivial
>          > >      > configurations, that would be difficult to manage
>         from the SDK side.
>          > >
>          > >     You're right, it would be unfortunate if the SDK would
>         have to deal with
>          > >     spinning up SDK harness/backend containers. For non-trivial
>          > >     configurations it would probably require an extended
>         protocol.
>          > >
>          > >      > Option 4) We are also thinking about adding process
>         based SDKHarness.
>          > >      > This will avoid docker in docker scenario.
>          > >
>          > >     Actually, I had started implementing a process-based
>         SDK harness but
>          > >     figured it might be impractical because it doubles the
>         execution path
>          > >     for UDF code and potentially doesn't work with custom
>         dependencies.
>          > >
>          > >      > Process based SDKHarness also has other applications
>         and might be
>          > >      > desirable in some of the production use cases.
>          > >
>          > >     True. Some users might want something more lightweight.
>          > >
>          >
>          > --
>          > Max
> 

-- 
Max

Re: Bootstrapping Beam's Job Server

Posted by Henning Rohde <he...@google.com>.
Agree with Luke. Perhaps something simple, prescriptive yet flexible, such
as a custom command line (defined in the environment proto) rooted at the
base of the provided artifacts, and either passed the same arguments, or
defined in the container contract, or made available through substitution.
That way, all the restrictions/assumptions of the execution environment
become implicit and runner/deployment dependent.
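
For example, the environment could carry nothing more than a command
template rooted at the artifact directory, with the runner filling in a
few well-known variables (the variable names below are made up purely
for illustration):

    ./run_worker.sh --id=$WORKER_ID \
        --logging_endpoint=$LOGGING_ENDPOINT \
        --control_endpoint=$CONTROL_ENDPOINT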


On Tue, Aug 21, 2018 at 2:12 PM Lukasz Cwik <lc...@google.com> wrote:

> I believe supporting a simple Process environment makes sense. It would be
> best if we didn't make the Process route solve all the problems that Docker
> solves for us. In my opinion we should limit the Process route to assume
> that the execution environment:
> * has all dependencies and libraries installed
> * is of a compatible machine architecture
> * doesn't require special networking rules to be setup
>
> Any other suggestions for reasonable limits on a Process environment?
>
> On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> It is also worth to mention that apart of the testing/development use
>> case there is also the case of supporting people running in Hadoop
>> distributions. There are two extra reasons to want a process based
>> version: (1) Some Hadoop distributions run in machines with really old
>> kernels where docker support is limited or nonexistent (yes, some of
>> those run on kernel 2.6!) and (2) Ops people may be reticent to the
>> additional operational overhead of enabling docker in their clusters.
>> On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <mx...@apache.org>
>> wrote:
>> >
>> > Thanks Henning and Thomas. It looks like
>> >
>> > a) we want to keep the Docker Job Server Docker container and rely on
>> > spinning up "sibling" SDK harness containers via the Docker socket. This
>> > should require little changes to the Runner code.
>> >
>> > b) have the InProcess SDK harness as an alternative way to running user
>> > code. This can be done independently of a).
>> >
>> > Thomas, let's sync today on the InProcess SDK harness. I've created a
>> > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
>> >
>> > Cheers,
>> > Max
>> >
>> > On 21.08.18 00:35, Thomas Weise wrote:
>> > > The original objective was to make test/development easier (which I
>> > > think is super important for user experience with portable runner).
>> > >
>> > >  From first hand experience I can confirm that dealing with Flink
>> > > clusters and Docker containers for local setup is a significant hurdle
>> > > for Python developers.
>> > >
>> > > To simplify using Flink in embedded mode, the (direct) process based
>> SDK
>> > > harness would be a good option, especially when it can be linked to
>> the
>> > > same virtualenv that developers have already setup, eliminating extra
>> > > packaging/deployment steps.
>> > >
>> > > Max, I would be interested to sync up on what your thoughts are
>> > > regarding that option since you mention you also started to work on it
>> > > (see previous discussion [1], not sure if there is a JIRA for it yet).
>> > > Internally we are planning to use a direct SDK harness process instead
>> > > of Docker containers. For our specific needs it will works equally
>> well
>> > > for development and production, including future plans to deploy Flink
>> > > TMs via Kubernetes.
>> > >
>> > > Thanks,
>> > > Thomas
>> > >
>> > > [1]
>> > >
>> https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mxm@apache.org
>> > > <ma...@apache.org>> wrote:
>> > >
>> > >     Thanks for your suggestions. Please see below.
>> > >
>> > >      > Option 3) would be to map in the docker binary and socket to
>> allow
>> > >      > the containerized Flink job server to start "sibling"
>> containers on
>> > >      > the host.
>> > >
>> > >     Do you mean packaging Docker inside the Job Server container and
>> > >     mounting /var/run/docker.sock from the host inside the container?
>> That
>> > >     looks like a bit of a hack but for testing it could be fine.
>> > >
>> > >      > notably, if the runner supports auto-scaling or similar
>> non-trivial
>> > >      > configurations, that would be difficult to manage from the SDK
>> side.
>> > >
>> > >     You're right, it would be unfortunate if the SDK would have to
>> deal with
>> > >     spinning up SDK harness/backend containers. For non-trivial
>> > >     configurations it would probably require an extended protocol.
>> > >
>> > >      > Option 4) We are also thinking about adding process based
>> SDKHarness.
>> > >      > This will avoid docker in docker scenario.
>> > >
>> > >     Actually, I had started implementing a process-based SDK harness
>> but
>> > >     figured it might be impractical because it doubles the execution
>> path
>> > >     for UDF code and potentially doesn't work with custom
>> dependencies.
>> > >
>> > >      > Process based SDKHarness also has other applications and might
>> be
>> > >      > desirable in some of the production use cases.
>> > >
>> > >     True. Some users might want something more lightweight.
>> > >
>> >
>> > --
>> > Max
>>
>

Re: Bootstrapping Beam's Job Server

Posted by Lukasz Cwik <lc...@google.com>.
I believe supporting a simple Process environment makes sense. It would be
best if we didn't make the Process route solve all the problems that Docker
solves for us. In my opinion we should limit the Process route to assume
that the execution environment:
* has all dependencies and libraries installed
* is of a compatible machine architecture
* doesn't require special networking rules to be set up

Any other suggestions for reasonable limits on a Process environment?

On Tue, Aug 21, 2018 at 2:53 AM Ismaël Mejía <ie...@gmail.com> wrote:

> It is also worth to mention that apart of the testing/development use
> case there is also the case of supporting people running in Hadoop
> distributions. There are two extra reasons to want a process based
> version: (1) Some Hadoop distributions run in machines with really old
> kernels where docker support is limited or nonexistent (yes, some of
> those run on kernel 2.6!) and (2) Ops people may be reticent to the
> additional operational overhead of enabling docker in their clusters.
> On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <mx...@apache.org>
> wrote:
> >
> > Thanks Henning and Thomas. It looks like
> >
> > a) we want to keep the Docker Job Server Docker container and rely on
> > spinning up "sibling" SDK harness containers via the Docker socket. This
> > should require little changes to the Runner code.
> >
> > b) have the InProcess SDK harness as an alternative way to running user
> > code. This can be done independently of a).
> >
> > Thomas, let's sync today on the InProcess SDK harness. I've created a
> > JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
> >
> > Cheers,
> > Max
> >
> > On 21.08.18 00:35, Thomas Weise wrote:
> > > The original objective was to make test/development easier (which I
> > > think is super important for user experience with portable runner).
> > >
> > >  From first hand experience I can confirm that dealing with Flink
> > > clusters and Docker containers for local setup is a significant hurdle
> > > for Python developers.
> > >
> > > To simplify using Flink in embedded mode, the (direct) process based
> SDK
> > > harness would be a good option, especially when it can be linked to the
> > > same virtualenv that developers have already setup, eliminating extra
> > > packaging/deployment steps.
> > >
> > > Max, I would be interested to sync up on what your thoughts are
> > > regarding that option since you mention you also started to work on it
> > > (see previous discussion [1], not sure if there is a JIRA for it yet).
> > > Internally we are planning to use a direct SDK harness process instead
> > > of Docker containers. For our specific needs it will works equally well
> > > for development and production, including future plans to deploy Flink
> > > TMs via Kubernetes.
> > >
> > > Thanks,
> > > Thomas
> > >
> > > [1]
> > >
> https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> > >
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mxm@apache.org
> > > <ma...@apache.org>> wrote:
> > >
> > >     Thanks for your suggestions. Please see below.
> > >
> > >      > Option 3) would be to map in the docker binary and socket to
> allow
> > >      > the containerized Flink job server to start "sibling"
> containers on
> > >      > the host.
> > >
> > >     Do you mean packaging Docker inside the Job Server container and
> > >     mounting /var/run/docker.sock from the host inside the container?
> That
> > >     looks like a bit of a hack but for testing it could be fine.
> > >
> > >      > notably, if the runner supports auto-scaling or similar
> non-trivial
> > >      > configurations, that would be difficult to manage from the SDK
> side.
> > >
> > >     You're right, it would be unfortunate if the SDK would have to
> deal with
> > >     spinning up SDK harness/backend containers. For non-trivial
> > >     configurations it would probably require an extended protocol.
> > >
> > >      > Option 4) We are also thinking about adding process based
> SDKHarness.
> > >      > This will avoid docker in docker scenario.
> > >
> > >     Actually, I had started implementing a process-based SDK harness
> but
> > >     figured it might be impractical because it doubles the execution
> path
> > >     for UDF code and potentially doesn't work with custom dependencies.
> > >
> > >      > Process based SDKHarness also has other applications and might
> be
> > >      > desirable in some of the production use cases.
> > >
> > >     True. Some users might want something more lightweight.
> > >
> >
> > --
> > Max
>

Re: Bootstrapping Beam's Job Server

Posted by Ismaël Mejía <ie...@gmail.com>.
It is also worth mentioning that, apart from the testing/development use
case, there is also the case of supporting people running on Hadoop
distributions. There are two extra reasons to want a process-based
version: (1) some Hadoop distributions run on machines with really old
kernels where Docker support is limited or nonexistent (yes, some of
those run on kernel 2.6!), and (2) Ops people may be reluctant to take on
the additional operational overhead of enabling Docker in their clusters.
On Tue, Aug 21, 2018 at 11:50 AM Maximilian Michels <mx...@apache.org> wrote:
>
> Thanks Henning and Thomas. It looks like
>
> a) we want to keep the Docker Job Server Docker container and rely on
> spinning up "sibling" SDK harness containers via the Docker socket. This
> should require little changes to the Runner code.
>
> b) have the InProcess SDK harness as an alternative way to running user
> code. This can be done independently of a).
>
> Thomas, let's sync today on the InProcess SDK harness. I've created a
> JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187
>
> Cheers,
> Max
>
> On 21.08.18 00:35, Thomas Weise wrote:
> > The original objective was to make test/development easier (which I
> > think is super important for user experience with portable runner).
> >
> >  From first hand experience I can confirm that dealing with Flink
> > clusters and Docker containers for local setup is a significant hurdle
> > for Python developers.
> >
> > To simplify using Flink in embedded mode, the (direct) process based SDK
> > harness would be a good option, especially when it can be linked to the
> > same virtualenv that developers have already setup, eliminating extra
> > packaging/deployment steps.
> >
> > Max, I would be interested to sync up on what your thoughts are
> > regarding that option since you mention you also started to work on it
> > (see previous discussion [1], not sure if there is a JIRA for it yet).
> > Internally we are planning to use a direct SDK harness process instead
> > of Docker containers. For our specific needs it will works equally well
> > for development and production, including future plans to deploy Flink
> > TMs via Kubernetes.
> >
> > Thanks,
> > Thomas
> >
> > [1]
> > https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> >
> >
> >
> >
> >
> >
> > On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mxm@apache.org
> > <ma...@apache.org>> wrote:
> >
> >     Thanks for your suggestions. Please see below.
> >
> >      > Option 3) would be to map in the docker binary and socket to allow
> >      > the containerized Flink job server to start "sibling" containers on
> >      > the host.
> >
> >     Do you mean packaging Docker inside the Job Server container and
> >     mounting /var/run/docker.sock from the host inside the container? That
> >     looks like a bit of a hack but for testing it could be fine.
> >
> >      > notably, if the runner supports auto-scaling or similar non-trivial
> >      > configurations, that would be difficult to manage from the SDK side.
> >
> >     You're right, it would be unfortunate if the SDK would have to deal with
> >     spinning up SDK harness/backend containers. For non-trivial
> >     configurations it would probably require an extended protocol.
> >
> >      > Option 4) We are also thinking about adding process based SDKHarness.
> >      > This will avoid docker in docker scenario.
> >
> >     Actually, I had started implementing a process-based SDK harness but
> >     figured it might be impractical because it doubles the execution path
> >     for UDF code and potentially doesn't work with custom dependencies.
> >
> >      > Process based SDKHarness also has other applications and might be
> >      > desirable in some of the production use cases.
> >
> >     True. Some users might want something more lightweight.
> >
>
> --
> Max

Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
Thanks Henning and Thomas. It looks like

a) we want to keep the Job Server Docker container and rely on
spinning up "sibling" SDK harness containers via the Docker socket. This
should require few changes to the Runner code.

b) have the InProcess SDK harness as an alternative way of running user
code. This can be done independently of a).

Thomas, let's sync today on the InProcess SDK harness. I've created a
JIRA issue: https://issues.apache.org/jira/browse/BEAM-5187

Cheers,
Max

On 21.08.18 00:35, Thomas Weise wrote:
> The original objective was to make test/development easier (which I 
> think is super important for user experience with portable runner).
> 
>  From first hand experience I can confirm that dealing with Flink 
> clusters and Docker containers for local setup is a significant hurdle 
> for Python developers.
> 
> To simplify using Flink in embedded mode, the (direct) process based SDK 
> harness would be a good option, especially when it can be linked to the 
> same virtualenv that developers have already setup, eliminating extra 
> packaging/deployment steps.
> 
> Max, I would be interested to sync up on what your thoughts are 
> regarding that option since you mention you also started to work on it 
> (see previous discussion [1], not sure if there is a JIRA for it yet). 
> Internally we are planning to use a direct SDK harness process instead 
> of Docker containers. For our specific needs it will works equally well 
> for development and production, including future plans to deploy Flink 
> TMs via Kubernetes.
> 
> Thanks,
> Thomas
> 
> [1] 
> https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E
> 
> 
> 
> 
> 
> 
> On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mxm@apache.org 
> <ma...@apache.org>> wrote:
> 
>     Thanks for your suggestions. Please see below.
> 
>      > Option 3) would be to map in the docker binary and socket to allow
>      > the containerized Flink job server to start "sibling" containers on
>      > the host.
> 
>     Do you mean packaging Docker inside the Job Server container and
>     mounting /var/run/docker.sock from the host inside the container? That
>     looks like a bit of a hack but for testing it could be fine.
> 
>      > notably, if the runner supports auto-scaling or similar non-trivial
>      > configurations, that would be difficult to manage from the SDK side.
> 
>     You're right, it would be unfortunate if the SDK would have to deal with
>     spinning up SDK harness/backend containers. For non-trivial
>     configurations it would probably require an extended protocol.
> 
>      > Option 4) We are also thinking about adding process based SDKHarness.
>      > This will avoid docker in docker scenario.
> 
>     Actually, I had started implementing a process-based SDK harness but
>     figured it might be impractical because it doubles the execution path
>     for UDF code and potentially doesn't work with custom dependencies.
> 
>      > Process based SDKHarness also has other applications and might be
>      > desirable in some of the production use cases.
> 
>     True. Some users might want something more lightweight.
> 

-- 
Max

Re: Bootstrapping Beam's Job Server

Posted by Thomas Weise <th...@apache.org>.
The original objective was to make test/development easier (which I think
is super important for user experience with the portable runner).

From first-hand experience I can confirm that dealing with Flink clusters
and Docker containers for local setup is a significant hurdle for Python
developers.

To simplify using Flink in embedded mode, the (direct) process-based SDK
harness would be a good option, especially when it can be linked to the
same virtualenv that developers have already set up, eliminating extra
packaging/deployment steps.

Max, I would be interested to sync up on what your thoughts are regarding
that option since you mention you also started to work on it (see previous
discussion [1], not sure if there is a JIRA for it yet). Internally we are
planning to use a direct SDK harness process instead of Docker containers.
For our specific needs it will work equally well for development and
production, including future plans to deploy Flink TMs via Kubernetes.

Thanks,
Thomas

[1]
https://lists.apache.org/thread.html/d8b81e9f74f77d74c8b883cda80fa48efdcaf6ac2ad313c4fe68795a@%3Cdev.beam.apache.org%3E







On Mon, Aug 20, 2018 at 3:00 PM Maximilian Michels <mx...@apache.org> wrote:

> Thanks for your suggestions. Please see below.
>
> > Option 3) would be to map in the docker binary and socket to allow
> > the containerized Flink job server to start "sibling" containers on
> > the host.
>
> Do you mean packaging Docker inside the Job Server container and
> mounting /var/run/docker.sock from the host inside the container? That
> looks like a bit of a hack but for testing it could be fine.
>
> > notably, if the runner supports auto-scaling or similar non-trivial
> > configurations, that would be difficult to manage from the SDK side.
>
> You're right, it would be unfortunate if the SDK would have to deal with
> spinning up SDK harness/backend containers. For non-trivial
> configurations it would probably require an extended protocol.
>
> > Option 4) We are also thinking about adding process based SDKHarness.
> > This will avoid docker in docker scenario.
>
> Actually, I had started implementing a process-based SDK harness but
> figured it might be impractical because it doubles the execution path
> for UDF code and potentially doesn't work with custom dependencies.
>
> > Process based SDKHarness also has other applications and might be
> > desirable in some of the production use cases.
>
> True. Some users might want something more lightweight.
>

Re: Bootstrapping Beam's Job Server

Posted by Maximilian Michels <mx...@apache.org>.
Thanks for your suggestions. Please see below.

> Option 3) would be to map in the docker binary and socket to allow
> the containerized Flink job server to start "sibling" containers on
> the host.

Do you mean packaging Docker inside the Job Server container and
mounting /var/run/docker.sock from the host inside the container? That
looks like a bit of a hack but for testing it could be fine.

> notably, if the runner supports auto-scaling or similar non-trivial
> configurations, that would be difficult to manage from the SDK side.

You're right, it would be unfortunate if the SDK would have to deal with
spinning up SDK harness/backend containers. For non-trivial
configurations it would probably require an extended protocol.

> Option 4) We are also thinking about adding process based SDKHarness.
> This will avoid docker in docker scenario.

Actually, I had started implementing a process-based SDK harness but
figured it might be impractical because it doubles the execution path
for UDF code and potentially doesn't work with custom dependencies.

> Process based SDKHarness also has other applications and might be
> desirable in some of the production use cases.

True. Some users might want something more lightweight.

Re: Bootstrapping Beam's Job Server

Posted by Ankur Goenka <go...@google.com>.
Option 4) We are also thinking about adding a process-based SDKHarness. This
will avoid the docker-in-docker scenario.
A process-based SDKHarness also has other applications and might be desirable
in some of the production use cases.

On Mon, Aug 20, 2018 at 11:49 AM Henning Rohde <he...@google.com> wrote:

> Option 3) would be to map in the docker binary and socket to allow the
> containerized Flink job server to start "sibling" containers on the host.
> That both avoids docker-in-docker (which is indeed undesirable) as well as
> extra requirements for each SDK to spin up containers -- notably, if the
> runner supports auto-scaling or similar non-trivial configurations, that
> would be difficult to manage from the SDK side.
>
> Henning
>
> On Mon, Aug 20, 2018 at 8:31 AM Maximilian Michels <mx...@apache.org> wrote:
>
>> Hi everyone,
>>
>> I wanted to get your opinion on the Job-Server startup [1] which is part
>> of the portability story.
>>
>> I've created a docker container to bring up Beam's Job Server, which is
>> the entry point for pipeline execution. Generally, this works fine when
>> the backend (Flink in this case) runs externally and the Job Server
>> connects to it.
>>
>> For tests or pipeline development we may want the backend to run
>> embedded (inside the Job Server) which is rather problematic because the
>> portability requires to spin up the SDK harness in a Docker container as
>> well. This would happen at runtime inside the Docker container.
>>
>> Since Docker inside Docker is not desirable I'm thinking about other
>> options:
>>
>> Option 1) Instead of a Docker container, we start a bundled Job-Server
>> binary (or jar) when we run the pipeline. The bundle also contains an
>> embedded variant of the backend. For Flink, this is basically the output
>> of `:beam-runners-flink_2.11-job-server:shadowJar` but it is started
>> during pipeline execution.
>>
>> Option 2) In addition to the Job Server, we let the SDK spin up another
>> Docker container with the backend. This is may be most applicable to all
>> types of backends since not all backends offer an embedded execution mode.
>>
>>
>> Keep in mind that this is only a problem for local/test execution but it
>> is an important aspect of Beam's usability.
>>
>> What do you think? I'm leaning towards option 2. Maybe you have other
>> options in mind.
>>
>> Cheers,
>> Max
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-4130
>>
>

Re: Bootstrapping Beam's Job Server

Posted by Henning Rohde <he...@google.com>.
Option 3) would be to map in the docker binary and socket to allow the
containerized Flink job server to start "sibling" containers on the host.
That both avoids docker-in-docker (which is indeed undesirable) as well as
extra requirements for each SDK to spin up containers -- notably, if the
runner supports auto-scaling or similar non-trivial configurations, that
would be difficult to manage from the SDK side.
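
For local testing, that could look roughly like the following (the image
name is a placeholder; the point is mounting the host's docker binary and
socket into the job server container so it can start sibling containers):

    docker run \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v $(which docker):/usr/bin/docker \
        <flink-job-server-image>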

Henning

On Mon, Aug 20, 2018 at 8:31 AM Maximilian Michels <mx...@apache.org> wrote:

> Hi everyone,
>
> I wanted to get your opinion on the Job-Server startup [1] which is part
> of the portability story.
>
> I've created a docker container to bring up Beam's Job Server, which is
> the entry point for pipeline execution. Generally, this works fine when
> the backend (Flink in this case) runs externally and the Job Server
> connects to it.
>
> For tests or pipeline development we may want the backend to run
> embedded (inside the Job Server) which is rather problematic because the
> portability requires to spin up the SDK harness in a Docker container as
> well. This would happen at runtime inside the Docker container.
>
> Since Docker inside Docker is not desirable I'm thinking about other
> options:
>
> Option 1) Instead of a Docker container, we start a bundled Job-Server
> binary (or jar) when we run the pipeline. The bundle also contains an
> embedded variant of the backend. For Flink, this is basically the output
> of `:beam-runners-flink_2.11-job-server:shadowJar` but it is started
> during pipeline execution.
>
> Option 2) In addition to the Job Server, we let the SDK spin up another
> Docker container with the backend. This is may be most applicable to all
> types of backends since not all backends offer an embedded execution mode.
>
>
> Keep in mind that this is only a problem for local/test execution but it
> is an important aspect of Beam's usability.
>
> What do you think? I'm leaning towards option 2. Maybe you have other
> options in mind.
>
> Cheers,
> Max
>
> [1] https://issues.apache.org/jira/browse/BEAM-4130
>