You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Kyle Weaver <kc...@google.com> on 2020/04/29 18:31:44 UTC

Rethinking Python's PortableRunner default job server

Hi all,

Currently, when running a pipeline that has the options
runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
Dockerized Flink job server [1]. This is problematic because the
PortableRunner can be used by any portable runner. So for example, a Spark
runner user was recently baffled when their job ran successfully but
printed a bunch of Flink log messages.

There are not too many uses of this default behavior to my knowledge, at
least within Beam itself. The only example I could find was in the
portableWordCount tests, which is mostly the same as
portableWordCountFlinkRunner tests [2]. The default behavior is entirely
superseded by the FlinkRunner class, which provides better encapsulation.

I also noticed that DockerizedJobServer is only used by [3]. In
FlinkRunner, we pull the job server from Maven if necessary and call Java
directly. In general, I think there are already quite enough knobs in the
portability framework, so we should remove it unless there is reason to
prefer running the job server with Docker instead of calling Java directly.

There are a couple options:

A) Remove the default behavior and require job_endpoint to always be set
when using PortableRunner. This would be a breaking change.
B) Keep the current behavior, but warn when the user sets
runner=PortableRunner without job_endpoint. This is easy to miss, but it's
better than nothing.

What do you think?

[1]
https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
[2]
https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
[3]
https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163

Re: Rethinking Python's PortableRunner default job server

Posted by Ismaël Mejía <ie...@gmail.com>.
Exact and it is not the same because there is an extra layer, because
the PortableRunner does not deal with the same issues that the other
runners e.g. translation and execution in the target system, it feels
more proxy than the 'translating runners' in the open source case.

On Thu, Apr 30, 2020 at 9:53 PM Kyle Weaver <kc...@google.com> wrote:
>
> > all runners (with perhaps the exception of the direct runner) are proxies for actual runners
>
> Agreed. The main difference is that this fact is more obvious for Dataflow users, since it is "Cloud" Dataflow after all. The relationship of Beam to its OSS runners is much less clear to new users (for example, folks are often confused about the difference between Beam's Flink job server images and Flink's own Docker images).
>
> > though we could argue that the direct runner would be a reasonable default
>
> Why set runner=PortableRuner then, when direct runner is the default? Besides, the direct runner has its own murky status with regard to portability, and its own defaults and branching paths, so I'd rather leave that out of the equation.
>
> On Thu, Apr 30, 2020 at 3:23 PM Robert Bradshaw <ro...@google.com> wrote:
>>
>> In a sense, all runners (with perhaps the exception of the direct runner) are proxies for actual runners. In that sense, I think it makes just as much sense to say "I want the portable runner with job endpoint X" as to say "I want the flink runner with master Y." Saying "I want the Portable Runner" without specifying an endpoint should, however, be undefined (though we could argue that the direct runner would be a reasonable default).
>>
>> On Thu, Apr 30, 2020 at 11:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>
>>> Thomas has a point on the PortableRunner name, I was super confused
>>> because of the `PortableRunner` not being a runner, I don't know if
>>> too late but maybe it is still worth to give it a better name.
>>>
>>> On Thu, Apr 30, 2020 at 8:41 PM Thomas Weise <th...@apache.org> wrote:
>>> >
>>> > +1 for removing the default runner. It has always been the Beam user expectation that a runner needs to be selected.
>>> >
>>> > "PortableRunner" isn't a runner (despite its name) - it's a proxy to a runner that the user specifies via job_endpoint.
>>> >
>>> > Thanks for cleaning this up!
>>> >
>>> > On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <kc...@google.com> wrote:
>>> >>
>>> >> I'll bite :) Thanks for the feedback everyone!
>>> >>
>>> >> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>> >>>
>>> >>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>>> >>>
>>> >>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>> >>>>
>>> >>>> +1 for A there are zero reasons to have a default runner set by
>>> >>>> default, being explicit is better as Robert suggests and it resolves
>>> >>>> the confusion that the user reported.
>>> >>>>
>>> >>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com> wrote:
>>> >>>> >
>>> >>>> > +1, I was actually thinking about this just the other day. PortableRunner should require job_endpoint to be set, and we can have a nice error message directing the explicit use of FlinkRunner for the old behavior.
>>> >>>> >
>>> >>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com> wrote:
>>> >>>> >>
>>> >>>> >> > Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>> >>>> >>
>>> >>>> >> Definitely.
>>> >>>> >>
>>> >>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com> wrote:
>>> >>>> >>>
>>> >>>> >>> Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>> >>>> >>>
>>> >>>> >>> Brian
>>> >>>> >>>
>>> >>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:
>>> >>>> >>>>
>>> >>>> >>>> Hi all,
>>> >>>> >>>>
>>> >>>> >>>> Currently, when running a pipeline that has the options runner=PortableRunner and job_endpoint unset, the Python SDK spins up a Dockerized Flink job server [1]. This is problematic because the PortableRunner can be used by any portable runner. So for example, a Spark runner user was recently baffled when their job ran successfully but printed a bunch of Flink log messages.
>>> >>>> >>>>
>>> >>>> >>>> There are not too many uses of this default behavior to my knowledge, at least within Beam itself. The only example I could find was in the portableWordCount tests, which is mostly the same as portableWordCountFlinkRunner tests [2]. The default behavior is entirely superseded by the FlinkRunner class, which provides better encapsulation.
>>> >>>> >>>>
>>> >>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In FlinkRunner, we pull the job server from Maven if necessary and call Java directly. In general, I think there are already quite enough knobs in the portability framework, so we should remove it unless there is reason to prefer running the job server with Docker instead of calling Java directly.
>>> >>>> >>>>
>>> >>>> >>>> There are a couple options:
>>> >>>> >>>>
>>> >>>> >>>> A) Remove the default behavior and require job_endpoint to always be set when using PortableRunner. This would be a breaking change.
>>> >>>> >>>> B) Keep the current behavior, but warn when the user sets runner=PortableRunner without job_endpoint. This is easy to miss, but it's better than nothing.
>>> >>>> >>>>
>>> >>>> >>>> What do you think?
>>> >>>> >>>>
>>> >>>> >>>> [1] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>> >>>> >>>> [2] https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>> >>>> >>>> [3] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163

Re: Rethinking Python's PortableRunner default job server

Posted by Kyle Weaver <kc...@google.com>.
> all runners (with perhaps the exception of the direct runner) are proxies
for actual runners

Agreed. The main difference is that this fact is more obvious for Dataflow
users, since it is "Cloud" Dataflow after all. The relationship of Beam to
its OSS runners is much less clear to new users (for example, folks are
often confused about the difference between Beam's Flink job server images
and Flink's own Docker images).

> though we could argue that the direct runner would be a reasonable default

Why set runner=PortableRuner then, when direct runner is the default?
Besides, the direct runner has its own murky status with regard to
portability, and its own defaults and branching paths, so I'd rather leave
that out of the equation.

On Thu, Apr 30, 2020 at 3:23 PM Robert Bradshaw <ro...@google.com> wrote:

> In a sense, all runners (with perhaps the exception of the direct runner)
> are proxies for actual runners. In that sense, I think it makes just as
> much sense to say "I want the portable runner with job endpoint X" as to
> say "I want the flink runner with master Y." Saying "I want the Portable
> Runner" without specifying an endpoint should, however, be undefined
> (though we could argue that the direct runner would be a reasonable
> default).
>
> On Thu, Apr 30, 2020 at 11:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> Thomas has a point on the PortableRunner name, I was super confused
>> because of the `PortableRunner` not being a runner, I don't know if
>> too late but maybe it is still worth to give it a better name.
>>
>> On Thu, Apr 30, 2020 at 8:41 PM Thomas Weise <th...@apache.org> wrote:
>> >
>> > +1 for removing the default runner. It has always been the Beam user
>> expectation that a runner needs to be selected.
>> >
>> > "PortableRunner" isn't a runner (despite its name) - it's a proxy to a
>> runner that the user specifies via job_endpoint.
>> >
>> > Thanks for cleaning this up!
>> >
>> > On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <kc...@google.com>
>> wrote:
>> >>
>> >> I'll bite :) Thanks for the feedback everyone!
>> >>
>> >> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >>>
>> >>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>> >>>
>> >>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com>
>> wrote:
>> >>>>
>> >>>> +1 for A there are zero reasons to have a default runner set by
>> >>>> default, being explicit is better as Robert suggests and it resolves
>> >>>> the confusion that the user reported.
>> >>>>
>> >>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <
>> robertwb@google.com> wrote:
>> >>>> >
>> >>>> > +1, I was actually thinking about this just the other day.
>> PortableRunner should require job_endpoint to be set, and we can have a
>> nice error message directing the explicit use of FlinkRunner for the old
>> behavior.
>> >>>> >
>> >>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com>
>> wrote:
>> >>>> >>
>> >>>> >> > Could the error message suggest switching to FlinkRunner
>> (and/or other runners that start a job server for you)? Then it seems like
>> the breakage would only be a minor annoyance.
>> >>>> >>
>> >>>> >> Definitely.
>> >>>> >>
>> >>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <
>> bhulette@google.com> wrote:
>> >>>> >>>
>> >>>> >>> Could the error message suggest switching to FlinkRunner (and/or
>> other runners that start a job server for you)? Then it seems like the
>> breakage would only be a minor annoyance.
>> >>>> >>>
>> >>>> >>> Brian
>> >>>> >>>
>> >>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <
>> kcweaver@google.com> wrote:
>> >>>> >>>>
>> >>>> >>>> Hi all,
>> >>>> >>>>
>> >>>> >>>> Currently, when running a pipeline that has the options
>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>> Dockerized Flink job server [1]. This is problematic because the
>> PortableRunner can be used by any portable runner. So for example, a Spark
>> runner user was recently baffled when their job ran successfully but
>> printed a bunch of Flink log messages.
>> >>>> >>>>
>> >>>> >>>> There are not too many uses of this default behavior to my
>> knowledge, at least within Beam itself. The only example I could find was
>> in the portableWordCount tests, which is mostly the same as
>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>> superseded by the FlinkRunner class, which provides better encapsulation.
>> >>>> >>>>
>> >>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In
>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>> directly. In general, I think there are already quite enough knobs in the
>> portability framework, so we should remove it unless there is reason to
>> prefer running the job server with Docker instead of calling Java directly.
>> >>>> >>>>
>> >>>> >>>> There are a couple options:
>> >>>> >>>>
>> >>>> >>>> A) Remove the default behavior and require job_endpoint to
>> always be set when using PortableRunner. This would be a breaking change.
>> >>>> >>>> B) Keep the current behavior, but warn when the user sets
>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>> better than nothing.
>> >>>> >>>>
>> >>>> >>>> What do you think?
>> >>>> >>>>
>> >>>> >>>> [1]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>> >>>> >>>> [2]
>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>> >>>> >>>> [3]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>
>

Re: Rethinking Python's PortableRunner default job server

Posted by Robert Bradshaw <ro...@google.com>.
In a sense, all runners (with perhaps the exception of the direct runner)
are proxies for actual runners. In that sense, I think it makes just as
much sense to say "I want the portable runner with job endpoint X" as to
say "I want the flink runner with master Y." Saying "I want the Portable
Runner" without specifying an endpoint should, however, be undefined
(though we could argue that the direct runner would be a reasonable
default).

On Thu, Apr 30, 2020 at 11:49 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Thomas has a point on the PortableRunner name, I was super confused
> because of the `PortableRunner` not being a runner, I don't know if
> too late but maybe it is still worth to give it a better name.
>
> On Thu, Apr 30, 2020 at 8:41 PM Thomas Weise <th...@apache.org> wrote:
> >
> > +1 for removing the default runner. It has always been the Beam user
> expectation that a runner needs to be selected.
> >
> > "PortableRunner" isn't a runner (despite its name) - it's a proxy to a
> runner that the user specifies via job_endpoint.
> >
> > Thanks for cleaning this up!
> >
> > On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <kc...@google.com>
> wrote:
> >>
> >> I'll bite :) Thanks for the feedback everyone!
> >>
> >> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>
> >>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
> >>>
> >>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com>
> wrote:
> >>>>
> >>>> +1 for A there are zero reasons to have a default runner set by
> >>>> default, being explicit is better as Robert suggests and it resolves
> >>>> the confusion that the user reported.
> >>>>
> >>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >>>> >
> >>>> > +1, I was actually thinking about this just the other day.
> PortableRunner should require job_endpoint to be set, and we can have a
> nice error message directing the explicit use of FlinkRunner for the old
> behavior.
> >>>> >
> >>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com>
> wrote:
> >>>> >>
> >>>> >> > Could the error message suggest switching to FlinkRunner (and/or
> other runners that start a job server for you)? Then it seems like the
> breakage would only be a minor annoyance.
> >>>> >>
> >>>> >> Definitely.
> >>>> >>
> >>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com>
> wrote:
> >>>> >>>
> >>>> >>> Could the error message suggest switching to FlinkRunner (and/or
> other runners that start a job server for you)? Then it seems like the
> breakage would only be a minor annoyance.
> >>>> >>>
> >>>> >>> Brian
> >>>> >>>
> >>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com>
> wrote:
> >>>> >>>>
> >>>> >>>> Hi all,
> >>>> >>>>
> >>>> >>>> Currently, when running a pipeline that has the options
> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
> Dockerized Flink job server [1]. This is problematic because the
> PortableRunner can be used by any portable runner. So for example, a Spark
> runner user was recently baffled when their job ran successfully but
> printed a bunch of Flink log messages.
> >>>> >>>>
> >>>> >>>> There are not too many uses of this default behavior to my
> knowledge, at least within Beam itself. The only example I could find was
> in the portableWordCount tests, which is mostly the same as
> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
> superseded by the FlinkRunner class, which provides better encapsulation.
> >>>> >>>>
> >>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In
> FlinkRunner, we pull the job server from Maven if necessary and call Java
> directly. In general, I think there are already quite enough knobs in the
> portability framework, so we should remove it unless there is reason to
> prefer running the job server with Docker instead of calling Java directly.
> >>>> >>>>
> >>>> >>>> There are a couple options:
> >>>> >>>>
> >>>> >>>> A) Remove the default behavior and require job_endpoint to
> always be set when using PortableRunner. This would be a breaking change.
> >>>> >>>> B) Keep the current behavior, but warn when the user sets
> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
> better than nothing.
> >>>> >>>>
> >>>> >>>> What do you think?
> >>>> >>>>
> >>>> >>>> [1]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
> >>>> >>>> [2]
> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
> >>>> >>>> [3]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>

Re: Rethinking Python's PortableRunner default job server

Posted by Ismaël Mejía <ie...@gmail.com>.
Thomas has a point on the PortableRunner name, I was super confused
because of the `PortableRunner` not being a runner, I don't know if
too late but maybe it is still worth to give it a better name.

On Thu, Apr 30, 2020 at 8:41 PM Thomas Weise <th...@apache.org> wrote:
>
> +1 for removing the default runner. It has always been the Beam user expectation that a runner needs to be selected.
>
> "PortableRunner" isn't a runner (despite its name) - it's a proxy to a runner that the user specifies via job_endpoint.
>
> Thanks for cleaning this up!
>
> On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <kc...@google.com> wrote:
>>
>> I'll bite :) Thanks for the feedback everyone!
>>
>> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com> wrote:
>>>
>>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>>>
>>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>>>
>>>> +1 for A there are zero reasons to have a default runner set by
>>>> default, being explicit is better as Robert suggests and it resolves
>>>> the confusion that the user reported.
>>>>
>>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com> wrote:
>>>> >
>>>> > +1, I was actually thinking about this just the other day. PortableRunner should require job_endpoint to be set, and we can have a nice error message directing the explicit use of FlinkRunner for the old behavior.
>>>> >
>>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com> wrote:
>>>> >>
>>>> >> > Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>>> >>
>>>> >> Definitely.
>>>> >>
>>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com> wrote:
>>>> >>>
>>>> >>> Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>>> >>>
>>>> >>> Brian
>>>> >>>
>>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:
>>>> >>>>
>>>> >>>> Hi all,
>>>> >>>>
>>>> >>>> Currently, when running a pipeline that has the options runner=PortableRunner and job_endpoint unset, the Python SDK spins up a Dockerized Flink job server [1]. This is problematic because the PortableRunner can be used by any portable runner. So for example, a Spark runner user was recently baffled when their job ran successfully but printed a bunch of Flink log messages.
>>>> >>>>
>>>> >>>> There are not too many uses of this default behavior to my knowledge, at least within Beam itself. The only example I could find was in the portableWordCount tests, which is mostly the same as portableWordCountFlinkRunner tests [2]. The default behavior is entirely superseded by the FlinkRunner class, which provides better encapsulation.
>>>> >>>>
>>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In FlinkRunner, we pull the job server from Maven if necessary and call Java directly. In general, I think there are already quite enough knobs in the portability framework, so we should remove it unless there is reason to prefer running the job server with Docker instead of calling Java directly.
>>>> >>>>
>>>> >>>> There are a couple options:
>>>> >>>>
>>>> >>>> A) Remove the default behavior and require job_endpoint to always be set when using PortableRunner. This would be a breaking change.
>>>> >>>> B) Keep the current behavior, but warn when the user sets runner=PortableRunner without job_endpoint. This is easy to miss, but it's better than nothing.
>>>> >>>>
>>>> >>>> What do you think?
>>>> >>>>
>>>> >>>> [1] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>>> >>>> [2] https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>>> >>>> [3] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163

Re: Rethinking Python's PortableRunner default job server

Posted by Thomas Weise <th...@apache.org>.
+1 for removing the default runner. It has always been the Beam user
expectation that a runner needs to be selected.

"PortableRunner" isn't a runner (despite its name) - it's a proxy to a
runner that the user specifies via job_endpoint.

Thanks for cleaning this up!

On Thu, Apr 30, 2020 at 10:11 AM Kyle Weaver <kc...@google.com> wrote:

> I'll bite :) Thanks for the feedback everyone!
>
> On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>>
>> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>>
>>> +1 for A there are zero reasons to have a default runner set by
>>> default, being explicit is better as Robert suggests and it resolves
>>> the confusion that the user reported.
>>>
>>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>> >
>>> > +1, I was actually thinking about this just the other day.
>>> PortableRunner should require job_endpoint to be set, and we can have a
>>> nice error message directing the explicit use of FlinkRunner for the old
>>> behavior.
>>> >
>>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com>
>>> wrote:
>>> >>
>>> >> > Could the error message suggest switching to FlinkRunner (and/or
>>> other runners that start a job server for you)? Then it seems like the
>>> breakage would only be a minor annoyance.
>>> >>
>>> >> Definitely.
>>> >>
>>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com>
>>> wrote:
>>> >>>
>>> >>> Could the error message suggest switching to FlinkRunner (and/or
>>> other runners that start a job server for you)? Then it seems like the
>>> breakage would only be a minor annoyance.
>>> >>>
>>> >>> Brian
>>> >>>
>>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com>
>>> wrote:
>>> >>>>
>>> >>>> Hi all,
>>> >>>>
>>> >>>> Currently, when running a pipeline that has the options
>>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>>> Dockerized Flink job server [1]. This is problematic because the
>>> PortableRunner can be used by any portable runner. So for example, a Spark
>>> runner user was recently baffled when their job ran successfully but
>>> printed a bunch of Flink log messages.
>>> >>>>
>>> >>>> There are not too many uses of this default behavior to my
>>> knowledge, at least within Beam itself. The only example I could find was
>>> in the portableWordCount tests, which is mostly the same as
>>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>>> superseded by the FlinkRunner class, which provides better encapsulation.
>>> >>>>
>>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In
>>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>>> directly. In general, I think there are already quite enough knobs in the
>>> portability framework, so we should remove it unless there is reason to
>>> prefer running the job server with Docker instead of calling Java directly.
>>> >>>>
>>> >>>> There are a couple options:
>>> >>>>
>>> >>>> A) Remove the default behavior and require job_endpoint to always
>>> be set when using PortableRunner. This would be a breaking change.
>>> >>>> B) Keep the current behavior, but warn when the user sets
>>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>>> better than nothing.
>>> >>>>
>>> >>>> What do you think?
>>> >>>>
>>> >>>> [1]
>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>> >>>> [2]
>>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>> >>>> [3]
>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>>
>>

Re: Rethinking Python's PortableRunner default job server

Posted by Kyle Weaver <kc...@google.com>.
I'll bite :) Thanks for the feedback everyone!

On Thu, Apr 30, 2020 at 1:01 PM Robert Bradshaw <ro...@google.com> wrote:

> I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?
>
> On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com> wrote:
>
>> +1 for A there are zero reasons to have a default runner set by
>> default, being explicit is better as Robert suggests and it resolves
>> the confusion that the user reported.
>>
>> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>> >
>> > +1, I was actually thinking about this just the other day.
>> PortableRunner should require job_endpoint to be set, and we can have a
>> nice error message directing the explicit use of FlinkRunner for the old
>> behavior.
>> >
>> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com>
>> wrote:
>> >>
>> >> > Could the error message suggest switching to FlinkRunner (and/or
>> other runners that start a job server for you)? Then it seems like the
>> breakage would only be a minor annoyance.
>> >>
>> >> Definitely.
>> >>
>> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com>
>> wrote:
>> >>>
>> >>> Could the error message suggest switching to FlinkRunner (and/or
>> other runners that start a job server for you)? Then it seems like the
>> breakage would only be a minor annoyance.
>> >>>
>> >>> Brian
>> >>>
>> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com>
>> wrote:
>> >>>>
>> >>>> Hi all,
>> >>>>
>> >>>> Currently, when running a pipeline that has the options
>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>> Dockerized Flink job server [1]. This is problematic because the
>> PortableRunner can be used by any portable runner. So for example, a Spark
>> runner user was recently baffled when their job ran successfully but
>> printed a bunch of Flink log messages.
>> >>>>
>> >>>> There are not too many uses of this default behavior to my
>> knowledge, at least within Beam itself. The only example I could find was
>> in the portableWordCount tests, which is mostly the same as
>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>> superseded by the FlinkRunner class, which provides better encapsulation.
>> >>>>
>> >>>> I also noticed that DockerizedJobServer is only used by [3]. In
>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>> directly. In general, I think there are already quite enough knobs in the
>> portability framework, so we should remove it unless there is reason to
>> prefer running the job server with Docker instead of calling Java directly.
>> >>>>
>> >>>> There are a couple options:
>> >>>>
>> >>>> A) Remove the default behavior and require job_endpoint to always be
>> set when using PortableRunner. This would be a breaking change.
>> >>>> B) Keep the current behavior, but warn when the user sets
>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>> better than nothing.
>> >>>>
>> >>>> What do you think?
>> >>>>
>> >>>> [1]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>> >>>> [2]
>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>> >>>> [3]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>
>

Re: Rethinking Python's PortableRunner default job server

Posted by Robert Bradshaw <ro...@google.com>.
I filed https://issues.apache.org/jira/browse/BEAM-9860. Any takers?

On Thu, Apr 30, 2020 at 5:49 AM Ismaël Mejía <ie...@gmail.com> wrote:

> +1 for A there are zero reasons to have a default runner set by
> default, being explicit is better as Robert suggests and it resolves
> the confusion that the user reported.
>
> On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com>
> wrote:
> >
> > +1, I was actually thinking about this just the other day.
> PortableRunner should require job_endpoint to be set, and we can have a
> nice error message directing the explicit use of FlinkRunner for the old
> behavior.
> >
> > On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com>
> wrote:
> >>
> >> > Could the error message suggest switching to FlinkRunner (and/or
> other runners that start a job server for you)? Then it seems like the
> breakage would only be a minor annoyance.
> >>
> >> Definitely.
> >>
> >> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com>
> wrote:
> >>>
> >>> Could the error message suggest switching to FlinkRunner (and/or other
> runners that start a job server for you)? Then it seems like the breakage
> would only be a minor annoyance.
> >>>
> >>> Brian
> >>>
> >>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com>
> wrote:
> >>>>
> >>>> Hi all,
> >>>>
> >>>> Currently, when running a pipeline that has the options
> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
> Dockerized Flink job server [1]. This is problematic because the
> PortableRunner can be used by any portable runner. So for example, a Spark
> runner user was recently baffled when their job ran successfully but
> printed a bunch of Flink log messages.
> >>>>
> >>>> There are not too many uses of this default behavior to my knowledge,
> at least within Beam itself. The only example I could find was in the
> portableWordCount tests, which is mostly the same as
> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
> superseded by the FlinkRunner class, which provides better encapsulation.
> >>>>
> >>>> I also noticed that DockerizedJobServer is only used by [3]. In
> FlinkRunner, we pull the job server from Maven if necessary and call Java
> directly. In general, I think there are already quite enough knobs in the
> portability framework, so we should remove it unless there is reason to
> prefer running the job server with Docker instead of calling Java directly.
> >>>>
> >>>> There are a couple options:
> >>>>
> >>>> A) Remove the default behavior and require job_endpoint to always be
> set when using PortableRunner. This would be a breaking change.
> >>>> B) Keep the current behavior, but warn when the user sets
> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
> better than nothing.
> >>>>
> >>>> What do you think?
> >>>>
> >>>> [1]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
> >>>> [2]
> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
> >>>> [3]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>

Re: Rethinking Python's PortableRunner default job server

Posted by Ismaël Mejía <ie...@gmail.com>.
+1 for A there are zero reasons to have a default runner set by
default, being explicit is better as Robert suggests and it resolves
the confusion that the user reported.

On Wed, Apr 29, 2020 at 10:05 PM Robert Bradshaw <ro...@google.com> wrote:
>
> +1, I was actually thinking about this just the other day. PortableRunner should require job_endpoint to be set, and we can have a nice error message directing the explicit use of FlinkRunner for the old behavior.
>
> On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com> wrote:
>>
>> > Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>
>> Definitely.
>>
>> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com> wrote:
>>>
>>> Could the error message suggest switching to FlinkRunner (and/or other runners that start a job server for you)? Then it seems like the breakage would only be a minor annoyance.
>>>
>>> Brian
>>>
>>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> Currently, when running a pipeline that has the options runner=PortableRunner and job_endpoint unset, the Python SDK spins up a Dockerized Flink job server [1]. This is problematic because the PortableRunner can be used by any portable runner. So for example, a Spark runner user was recently baffled when their job ran successfully but printed a bunch of Flink log messages.
>>>>
>>>> There are not too many uses of this default behavior to my knowledge, at least within Beam itself. The only example I could find was in the portableWordCount tests, which is mostly the same as portableWordCountFlinkRunner tests [2]. The default behavior is entirely superseded by the FlinkRunner class, which provides better encapsulation.
>>>>
>>>> I also noticed that DockerizedJobServer is only used by [3]. In FlinkRunner, we pull the job server from Maven if necessary and call Java directly. In general, I think there are already quite enough knobs in the portability framework, so we should remove it unless there is reason to prefer running the job server with Docker instead of calling Java directly.
>>>>
>>>> There are a couple options:
>>>>
>>>> A) Remove the default behavior and require job_endpoint to always be set when using PortableRunner. This would be a breaking change.
>>>> B) Keep the current behavior, but warn when the user sets runner=PortableRunner without job_endpoint. This is easy to miss, but it's better than nothing.
>>>>
>>>> What do you think?
>>>>
>>>> [1] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>>> [2] https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>>> [3] https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163

Re: Rethinking Python's PortableRunner default job server

Posted by Robert Bradshaw <ro...@google.com>.
+1, I was actually thinking about this just the other day. PortableRunner
should require job_endpoint to be set, and we can have a nice error message
directing the explicit use of FlinkRunner for the old behavior.

On Wed, Apr 29, 2020 at 11:50 AM Kyle Weaver <kc...@google.com> wrote:

> > Could the error message suggest switching to FlinkRunner (and/or other
> runners that start a job server for you)? Then it seems like the breakage
> would only be a minor annoyance.
>
> Definitely.
>
> On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com> wrote:
>
>> Could the error message suggest switching to FlinkRunner (and/or other
>> runners that start a job server for you)? Then it seems like the breakage
>> would only be a minor annoyance.
>>
>> Brian
>>
>> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:
>>
>>> Hi all,
>>>
>>> Currently, when running a pipeline that has the options
>>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>>> Dockerized Flink job server [1]. This is problematic because the
>>> PortableRunner can be used by any portable runner. So for example, a Spark
>>> runner user was recently baffled when their job ran successfully but
>>> printed a bunch of Flink log messages.
>>>
>>> There are not too many uses of this default behavior to my knowledge, at
>>> least within Beam itself. The only example I could find was in the
>>> portableWordCount tests, which is mostly the same as
>>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>>> superseded by the FlinkRunner class, which provides better encapsulation.
>>>
>>> I also noticed that DockerizedJobServer is only used by [3]. In
>>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>>> directly. In general, I think there are already quite enough knobs in the
>>> portability framework, so we should remove it unless there is reason to
>>> prefer running the job server with Docker instead of calling Java directly.
>>>
>>> There are a couple options:
>>>
>>> A) Remove the default behavior and require job_endpoint to always be set
>>> when using PortableRunner. This would be a breaking change.
>>> B) Keep the current behavior, but warn when the user sets
>>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>>> better than nothing.
>>>
>>> What do you think?
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>>> [2]
>>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>>> [3]
>>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>>
>>

Re: Rethinking Python's PortableRunner default job server

Posted by Kyle Weaver <kc...@google.com>.
> Could the error message suggest switching to FlinkRunner (and/or other
runners that start a job server for you)? Then it seems like the breakage
would only be a minor annoyance.

Definitely.

On Wed, Apr 29, 2020 at 2:49 PM Brian Hulette <bh...@google.com> wrote:

> Could the error message suggest switching to FlinkRunner (and/or other
> runners that start a job server for you)? Then it seems like the breakage
> would only be a minor annoyance.
>
> Brian
>
> On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:
>
>> Hi all,
>>
>> Currently, when running a pipeline that has the options
>> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
>> Dockerized Flink job server [1]. This is problematic because the
>> PortableRunner can be used by any portable runner. So for example, a Spark
>> runner user was recently baffled when their job ran successfully but
>> printed a bunch of Flink log messages.
>>
>> There are not too many uses of this default behavior to my knowledge, at
>> least within Beam itself. The only example I could find was in the
>> portableWordCount tests, which is mostly the same as
>> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
>> superseded by the FlinkRunner class, which provides better encapsulation.
>>
>> I also noticed that DockerizedJobServer is only used by [3]. In
>> FlinkRunner, we pull the job server from Maven if necessary and call Java
>> directly. In general, I think there are already quite enough knobs in the
>> portability framework, so we should remove it unless there is reason to
>> prefer running the job server with Docker instead of calling Java directly.
>>
>> There are a couple options:
>>
>> A) Remove the default behavior and require job_endpoint to always be set
>> when using PortableRunner. This would be a breaking change.
>> B) Keep the current behavior, but warn when the user sets
>> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
>> better than nothing.
>>
>> What do you think?
>>
>> [1]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
>> [2]
>> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
>> [3]
>> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>>
>

Re: Rethinking Python's PortableRunner default job server

Posted by Brian Hulette <bh...@google.com>.
Could the error message suggest switching to FlinkRunner (and/or other
runners that start a job server for you)? Then it seems like the breakage
would only be a minor annoyance.

Brian

On Wed, Apr 29, 2020 at 11:32 AM Kyle Weaver <kc...@google.com> wrote:

> Hi all,
>
> Currently, when running a pipeline that has the options
> runner=PortableRunner and job_endpoint unset, the Python SDK spins up a
> Dockerized Flink job server [1]. This is problematic because the
> PortableRunner can be used by any portable runner. So for example, a Spark
> runner user was recently baffled when their job ran successfully but
> printed a bunch of Flink log messages.
>
> There are not too many uses of this default behavior to my knowledge, at
> least within Beam itself. The only example I could find was in the
> portableWordCount tests, which is mostly the same as
> portableWordCountFlinkRunner tests [2]. The default behavior is entirely
> superseded by the FlinkRunner class, which provides better encapsulation.
>
> I also noticed that DockerizedJobServer is only used by [3]. In
> FlinkRunner, we pull the job server from Maven if necessary and call Java
> directly. In general, I think there are already quite enough knobs in the
> portability framework, so we should remove it unless there is reason to
> prefer running the job server with Docker instead of calling Java directly.
>
> There are a couple options:
>
> A) Remove the default behavior and require job_endpoint to always be set
> when using PortableRunner. This would be a breaking change.
> B) Keep the current behavior, but warn when the user sets
> runner=PortableRunner without job_endpoint. This is easy to miss, but it's
> better than nothing.
>
> What do you think?
>
> [1]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L184
> [2]
> https://github.com/apache/beam/blob/b3596b89dbc002c686bdaa7853074e757a81b6fb/buildSrc/src/main/groovy/org/apache/beam/gradle/BeamModulePlugin.groovy#L1983-L2048
> [3]
> https://github.com/apache/beam/blob/33c73739cec8bc6a7c8319efa41eda7a2540bce1/sdks/python/apache_beam/runners/portability/job_server.py#L163
>