You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Reza Rokni <re...@google.com> on 2019/05/03 02:59:52 UTC

Better naming for runner specific options

Hi,

Was reading this SO question:

https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has

And noticed that in

https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

The option is called --worker_machine_type.

I wonder if runner specific options should have the runner in the prefix?
Something like --dataflow_worker_machine_type?

Cheers
Reza

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.

Re: Better naming for runner specific options

Posted by Maximilian Michels <mx...@apache.org>.
+1

On 22.05.19 04:28, Reza Rokni wrote:
> Hi,
> 
> Coming back to this, is the general consensus that this can be addressed 
> via https://issues.apache.org/jira/browse/BEAM-6531 in Beam 3.0?
> 
> Cheers
> Reza
> 
> On Tue, 7 May 2019 at 23:15, Valentyn Tymofieiev <valentyn@google.com 
> <ma...@google.com>> wrote:
> 
>     I think using RunnerOptions was an idea at some point, but in
>     Python, we ended up parsing options from the runner api without
>     populating RunnerOptions, and  RunnerOptions was eventually removed [1].
> 
>     If we decide to rename options, a path forward may be to have
>     runners recognize both old and new names until Beam 3.0, but update
>     codebase, examples and documentation to use new names.
> 
>     [1]
>     https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815
> 
>     *From:*Ahmet Altay <altay@google.com <ma...@google.com>>
>     *Date:*Mon, May 6, 2019, 6:01 PM
>     *To:*dev
> 
>         There is RunnerOptions already. Its options are populated by
>         querying the job service. Any portable runner is able to provide
>         a list of options that is runner specific through that mechanism.
> 
>         *From: *Reza Rokni <rez@google.com <ma...@google.com>>
>         *Date: *Mon, May 6, 2019 at 2:57 PM
>         *To: * <dev@beam.apache.org <ma...@beam.apache.org>>
> 
>             So the options here would be moved to runner options?
>             https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
> 
>             In Java they are in DataflowPipelineWorkerPoolOptions and of
>             course we have FlinkPipelineOptions etc...
> 
>             *From: *Chamikara Jayalath <chamikara@google.com
>             <ma...@google.com>>
>             *Date: *Tue, 7 May 2019 at 05:29
>             *To: *dev
> 
> 
>                 On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik
>                 <lcwik@google.com <ma...@google.com>> wrote:
> 
>                     There were also discussions[1] in the past about
>                     scoping PipelineOptions to specific PTransforms.
>                     Would scoping PipelineOptions to PTransforms make
>                     this a more general solution?
> 
>                     1:
>                     https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
> 
> 
>                 Is this just for pipeline construction time or also for
>                 runtime ? Trying to scope options for transforms at
>                 runtime might complicate things in the presence of
>                 optimizations such as fusion.
> 
> 
>                     On Mon, May 6, 2019 at 12:02 PM Ankur Goenka
>                     <goenka@google.com <ma...@google.com>> wrote:
> 
>                         Having namespaces for option makes sense.
>                         I think, along with a help command to print all
>                         the options given the runner name will be useful.
>                         As for the scope of name spacing, I think that
>                         assigning a logical name space gives more
>                         flexibility around how and where we declare
>                         options. It also make future refactoring possible.
> 
> 
>                         On Mon, May 6, 2019 at 7:50 AM Maximilian
>                         Michels <mxm@apache.org <ma...@apache.org>>
>                         wrote:
> 
>                             Good points. As already mentioned there is
>                             no namespacing between the
>                             different pipeline option classes. In
>                             particular, there is no separate
>                             namespace for system and user options which
>                             is most concerning.
> 
>                             I'm in favor of an optional namespace using
>                             the class name of the
>                             defining pipeline option class. That way we
>                             would at least be able to
>                             resolve duplicate option names. For example,
>                             if there were was "optionX"
>                             in class A and B, we could use "A#optionX"
>                             to refer to it from class A.
> 
> 
>                 I think this solves the original problem. Runner
>                 specific options will have unique names that includes
>                 the runner (in options class). I guess to be complete we
>                 also have to include the package (module for Python) ?
>                 If an option is globally unique, users should be able to
>                 specify it without qualifying (at least for backwards
>                 compatibility).
> 
> 
>                             -Max
> 
>                             On 04.05.19 02:23, Reza Rokni wrote:
>                              > Great point Lukasz, worker machine could
>                             be relevant to multiple runners.
>                              >
>                              > Perhaps for parameters that could have
>                             multiple runner relevance, the
>                              > doc could be rephrased to reflect its
>                             potential multiple uses. For
>                              > example change the help information to
>                             start with a generic reference "
>                              > worker type on the runner" followed by
>                             runner specific behavior expected
>                              > for RunnerA, RunnerB etc...
>                              >
>                              > But I do worry that without prefix even
>                             generic options could cause
>                              > confusion. For example if the use of
>                             --network is substantially
>                              > different between runnerA vs runnerB then
>                             the user will only have this
>                              > information by reading the help. It will
>                             also mean that a pipeline which
>                              > is expected to work both on-premise on
>                             RunnerA and in the cloud on
>                              > RunnerB could fail because the format of
>                             the options to pass to
>                              > --network are different.
>                              >
>                              > Cheers
>                              >
>                              > Reza
>                              >
>                              > *From: *Kenneth Knowles <kenn@apache.org
>                             <ma...@apache.org>
>                             <mailto:kenn@apache.org
>                             <ma...@apache.org>>>
>                              > *Date: *Sat, 4 May 2019 at 03:54
>                              > *To: *dev
>                              >
>                              >     Even though they are in classes named
>                             for specific runners, they are
>                              >     not namespaced. All PipelineOptions
>                             exist in a global namespace so
>                              >     they need to be careful to be very
>                             precise.
>                              >
>                              >     It is a good point that even though
>                             they may be multiple uses for
>                              >     "machine type" they are probably not
>                             going to both happen at the
>                              >     same time.
>                              >
>                              >     If it becomes an issue, another thing
>                             we could do would be to add
>                              >     namespacing support so options have
>                             less spooky action, or at least
>                              >     have a way to resolve it when it
>                             happens on accident.
>                              >
>                              >     Kenn
>                              >
>                              >     On Fri, May 3, 2019 at 10:43 AM
>                             Chamikara Jayalath
>                              >     <chamikara@google.com
>                             <ma...@google.com>
>                             <mailto:chamikara@google.com
>                             <ma...@google.com>>> wrote:
>                              >
>                              >         Also, we do have runner specific
>                             options classes where truly
>                              >         runner specific options can go.
>                              >
>                              >
>                             https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>                              >
>                             https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>                              >
>                              >         On Fri, May 3, 2019 at 9:50 AM
>                             Ahmet Altay <altay@google.com
>                             <ma...@google.com>
>                              >         <mailto:altay@google.com
>                             <ma...@google.com>>> wrote:
>                              >
>                              >             I agree, that is a good point.
>                              >
>                              >             *From: *Lukasz Cwik
>                             <lcwik@google.com <ma...@google.com>
>                             <mailto:lcwik@google.com
>                             <ma...@google.com>>>
>                              >             *Date: *Fri, May 3, 2019 at
>                             9:37 AM
>                              >             *To: *dev
>                              >
>                              >                 The concept of a machine
>                             type isn't necessarily limited
>                              >                 to Dataflow. If it made
>                             sense for a runner, they could
>                              >                 use AWS/Azure machine
>                             types as well.
>                              >
>                              >                 On Fri, May 3, 2019 at
>                             9:32 AM Ahmet Altay
>                              >                 <altay@google.com
>                             <ma...@google.com>
>                             <mailto:altay@google.com
>                             <ma...@google.com>>> wrote:
>                              >
>                              >                     This idea was
>                             discussed in a PR a few months ago,
>                              >                     and JIRA was filed as
>                             a follow up [1]. IMO, it makes
>                              >                     sense to use a
>                             namespace prefix. The primary issue
>                              >                     here is that, such a
>                             change will very likely be a
>                              >                     backward incompatible
>                             change and would be hard to do
>                              >                     before the next major
>                             version.
>                              >
>                              >                     [1]
>                             https://issues.apache.org/jira/browse/BEAM-6531
>                              >
>                              >                     *From: *Reza Rokni
>                             <rez@google.com <ma...@google.com>
>                              >                   
>                               <mailto:rez@google.com
>                             <ma...@google.com>>>
>                              >                     *Date: *Thu, May 2,
>                             2019 at 8:00 PM
>                              >                     *To: *
>                             <dev@beam.apache.org
>                             <ma...@beam.apache.org>
>                              >                   
>                               <mailto:dev@beam.apache.org
>                             <ma...@beam.apache.org>>>
>                              >
>                              >                         Hi,
>                              >
>                              >                         Was reading this
>                             SO question:
>                              >
>                              >
>                             https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>                              >
>                              >                         And noticed that in
>                              >
>                              >
>                             https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>                              >
>                              >                         The option is
>                             called --worker_machine_type.
>                              >
>                              >                         I wonder if
>                             runner specific options should have
>                              >                         the runner in the
>                             prefix? Something like
>                              >                       
>                               --dataflow_worker_machine_type?
>                              >
>                              >                         Cheers
>                              >                         Reza
>                              >
>                              >                         --
>                              >
>                              >                         This email may be
>                             confidential and privileged.
>                              >                         If you received
>                             this communication by mistake,
>                              >                         please don't
>                             forward it to anyone else, please
>                              >                         erase all copies
>                             and attachments, and please let
>                              >                         me know that it
>                             has gone to the wrong person.
>                              >
>                              >                         The above terms
>                             reflect a potential business
>                              >                         arrangement, are
>                             provided solely as a basis for
>                              >                         further
>                             discussion, and are not intended to be
>                              >                         and do not
>                             constitute a legally binding
>                              >                         obligation. No
>                             legally binding obligations will
>                              >                         be created,
>                             implied, or inferred until an
>                              >                         agreement in
>                             final form is executed in writing
>                              >                         by all parties
>                             involved.
>                              >
>                              >
>                              >
>                              > --
>                              >
>                              > This email may be confidential and
>                             privileged. If you received this
>                              > communication by mistake, please don't
>                             forward it to anyone else, please
>                              > erase all copies and attachments, and
>                             please let me know that it has
>                              > gone to the wrong person.
>                              >
>                              > The above terms reflect a potential
>                             business arrangement, are provided
>                              > solely as a basis for further discussion,
>                             and are not intended to be and
>                              > do not constitute a legally binding
>                             obligation. No legally binding
>                              > obligations will be created, implied, or
>                             inferred until an agreement in
>                              > final form is executed in writing by all
>                             parties involved.
>                              >
> 
> 
> 
>             -- 
> 
>             This email may be confidential and privileged. If you
>             received this communication by mistake, please don't forward
>             it to anyone else, please erase all copies and attachments,
>             and please let me know that it has gone to the wrong person.
> 
>             The above terms reflect a potential business arrangement,
>             are provided solely as a basis for further discussion, and
>             are not intended to be and do not constitute a legally
>             binding obligation. No legally binding obligations will be
>             created, implied, or inferred until an agreement in final
>             form is executed in writing by all parties involved.
> 
> 
> 
> -- 
> 
> This email may be confidential and privileged. If you received this 
> communication by mistake, please don't forward it to anyone else, please 
> erase all copies and attachments, and please let me know that it has 
> gone to the wrong person.
> 
> The above terms reflect a potential business arrangement, are provided 
> solely as a basis for further discussion, and are not intended to be and 
> do not constitute a legally binding obligation. No legally binding 
> obligations will be created, implied, or inferred until an agreement in 
> final form is executed in writing by all parties involved.
> 

Re: Better naming for runner specific options

Posted by Reza Rokni <re...@google.com>.
Hi,

Coming back to this, is the general consensus that this can be addressed
via https://issues.apache.org/jira/browse/BEAM-6531 in Beam 3.0?

Cheers
Reza

On Tue, 7 May 2019 at 23:15, Valentyn Tymofieiev <va...@google.com>
wrote:

> I think using RunnerOptions was an idea at some point, but in Python, we
> ended up parsing options from the runner api without populating
> RunnerOptions, and  RunnerOptions was eventually removed [1].
>
> If we decide to rename options, a path forward may be to have runners
> recognize both old and new names until Beam 3.0, but update codebase,
> examples and documentation to use new names.
>
> [1]
> https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815
>
> *From:*Ahmet Altay <al...@google.com>
> *Date:*Mon, May 6, 2019, 6:01 PM
> *To:*dev
>
> There is RunnerOptions already. Its options are populated by querying the
>> job service. Any portable runner is able to provide a list of options that
>> is runner specific through that mechanism.
>>
>> *From: *Reza Rokni <re...@google.com>
>> *Date: *Mon, May 6, 2019 at 2:57 PM
>> *To: * <de...@beam.apache.org>
>>
>> So the options here would be moved to runner options?
>>>
>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>
>>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>>> have FlinkPipelineOptions etc...
>>>
>>> *From: *Chamikara Jayalath <ch...@google.com>
>>> *Date: *Tue, 7 May 2019 at 05:29
>>> *To: *dev
>>>
>>>
>>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
>>>>
>>>>> There were also discussions[1] in the past about scoping
>>>>> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
>>>>> PTransforms make this a more general solution?
>>>>>
>>>>> 1:
>>>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>>>
>>>>
>>>> Is this just for pipeline construction time or also for runtime ?
>>>> Trying to scope options for transforms at runtime might complicate things
>>>> in the presence of optimizations such as fusion.
>>>>
>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Having namespaces for option makes sense.
>>>>>> I think, along with a help command to print all the options given the
>>>>>> runner name will be useful.
>>>>>> As for the scope of name spacing, I think that assigning a logical
>>>>>> name space gives more flexibility around how and where we declare options.
>>>>>> It also make future refactoring possible.
>>>>>>
>>>>>>
>>>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org>
>>>>>> wrote:
>>>>>>
>>>>>>> Good points. As already mentioned there is no namespacing between
>>>>>>> the
>>>>>>> different pipeline option classes. In particular, there is no
>>>>>>> separate
>>>>>>> namespace for system and user options which is most concerning.
>>>>>>>
>>>>>>> I'm in favor of an optional namespace using the class name of the
>>>>>>> defining pipeline option class. That way we would at least be able
>>>>>>> to
>>>>>>> resolve duplicate option names. For example, if there were was
>>>>>>> "optionX"
>>>>>>> in class A and B, we could use "A#optionX" to refer to it from class
>>>>>>> A.
>>>>>>>
>>>>>>
>>>> I think this solves the original problem. Runner specific options will
>>>> have unique names that includes the runner (in options class). I guess to
>>>> be complete we also have to include the package (module for Python) ?
>>>> If an option is globally unique, users should be able to specify it
>>>> without qualifying (at least for backwards compatibility).
>>>>
>>>>
>>>>>
>>>>>>> -Max
>>>>>>>
>>>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>>>> runners.
>>>>>>> >
>>>>>>> > Perhaps for parameters that could have multiple runner relevance,
>>>>>>> the
>>>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>>>> > example change the help information to start with a generic
>>>>>>> reference "
>>>>>>> > worker type on the runner" followed by runner specific behavior
>>>>>>> expected
>>>>>>> > for RunnerA, RunnerB etc...
>>>>>>> >
>>>>>>> > But I do worry that without prefix even generic options could
>>>>>>> cause
>>>>>>> > confusion. For example if the use of --network is substantially
>>>>>>> > different between runnerA vs runnerB then the user will only have
>>>>>>> this
>>>>>>> > information by reading the help. It will also mean that a pipeline
>>>>>>> which
>>>>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>>>>> > RunnerB could fail because the format of the options to pass to
>>>>>>> > --network are different.
>>>>>>> >
>>>>>>> > Cheers
>>>>>>> >
>>>>>>> > Reza
>>>>>>> >
>>>>>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>>>> > *To: *dev
>>>>>>> >
>>>>>>> >     Even though they are in classes named for specific runners,
>>>>>>> they are
>>>>>>> >     not namespaced. All PipelineOptions exist in a global
>>>>>>> namespace so
>>>>>>> >     they need to be careful to be very precise.
>>>>>>> >
>>>>>>> >     It is a good point that even though they may be multiple uses
>>>>>>> for
>>>>>>> >     "machine type" they are probably not going to both happen at
>>>>>>> the
>>>>>>> >     same time.
>>>>>>> >
>>>>>>> >     If it becomes an issue, another thing we could do would be to
>>>>>>> add
>>>>>>> >     namespacing support so options have less spooky action, or at
>>>>>>> least
>>>>>>> >     have a way to resolve it when it happens on accident.
>>>>>>> >
>>>>>>> >     Kenn
>>>>>>> >
>>>>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>>>>>> >
>>>>>>> >         Also, we do have runner specific options classes where
>>>>>>> truly
>>>>>>> >         runner specific options can go.
>>>>>>> >
>>>>>>> >
>>>>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>>>> >
>>>>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>>>> >
>>>>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <
>>>>>>> altay@google.com
>>>>>>> >         <ma...@google.com>> wrote:
>>>>>>> >
>>>>>>> >             I agree, that is a good point.
>>>>>>> >
>>>>>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>>>>>> lcwik@google.com>>
>>>>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>>>> >             *To: *dev
>>>>>>> >
>>>>>>> >                 The concept of a machine type isn't necessarily
>>>>>>> limited
>>>>>>> >                 to Dataflow. If it made sense for a runner, they
>>>>>>> could
>>>>>>> >                 use AWS/Azure machine types as well.
>>>>>>> >
>>>>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>>>> >                 <altay@google.com <ma...@google.com>>
>>>>>>> wrote:
>>>>>>> >
>>>>>>> >                     This idea was discussed in a PR a few months
>>>>>>> ago,
>>>>>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>>>>>> makes
>>>>>>> >                     sense to use a namespace prefix. The primary
>>>>>>> issue
>>>>>>> >                     here is that, such a change will very likely
>>>>>>> be a
>>>>>>> >                     backward incompatible change and would be hard
>>>>>>> to do
>>>>>>> >                     before the next major version.
>>>>>>> >
>>>>>>> >                     [1]
>>>>>>> https://issues.apache.org/jira/browse/BEAM-6531
>>>>>>> >
>>>>>>> >                     *From: *Reza Rokni <rez@google.com
>>>>>>> >                     <ma...@google.com>>
>>>>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>>>> >                     *To: * <dev@beam.apache.org
>>>>>>> >                     <ma...@beam.apache.org>>
>>>>>>> >
>>>>>>> >                         Hi,
>>>>>>> >
>>>>>>> >                         Was reading this SO question:
>>>>>>> >
>>>>>>> >
>>>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>>> >
>>>>>>> >                         And noticed that in
>>>>>>> >
>>>>>>> >
>>>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>>> >
>>>>>>> >                         The option is called --worker_machine_type.
>>>>>>> >
>>>>>>> >                         I wonder if runner specific options should
>>>>>>> have
>>>>>>> >                         the runner in the prefix? Something like
>>>>>>> >                         --dataflow_worker_machine_type?
>>>>>>> >
>>>>>>> >                         Cheers
>>>>>>> >                         Reza
>>>>>>> >
>>>>>>> >                         --
>>>>>>> >
>>>>>>> >                         This email may be confidential and
>>>>>>> privileged.
>>>>>>> >                         If you received this communication by
>>>>>>> mistake,
>>>>>>> >                         please don't forward it to anyone else,
>>>>>>> please
>>>>>>> >                         erase all copies and attachments, and
>>>>>>> please let
>>>>>>> >                         me know that it has gone to the wrong
>>>>>>> person.
>>>>>>> >
>>>>>>> >                         The above terms reflect a potential
>>>>>>> business
>>>>>>> >                         arrangement, are provided solely as a
>>>>>>> basis for
>>>>>>> >                         further discussion, and are not intended
>>>>>>> to be
>>>>>>> >                         and do not constitute a legally binding
>>>>>>> >                         obligation. No legally binding obligations
>>>>>>> will
>>>>>>> >                         be created, implied, or inferred until an
>>>>>>> >                         agreement in final form is executed in
>>>>>>> writing
>>>>>>> >                         by all parties involved.
>>>>>>> >
>>>>>>> >
>>>>>>> >
>>>>>>> > --
>>>>>>> >
>>>>>>> > This email may be confidential and privileged. If you received
>>>>>>> this
>>>>>>> > communication by mistake, please don't forward it to anyone else,
>>>>>>> please
>>>>>>> > erase all copies and attachments, and please let me know that it
>>>>>>> has
>>>>>>> > gone to the wrong person.
>>>>>>> >
>>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>>> provided
>>>>>>> > solely as a basis for further discussion, and are not intended to
>>>>>>> be and
>>>>>>> > do not constitute a legally binding obligation. No legally binding
>>>>>>> > obligations will be created, implied, or inferred until an
>>>>>>> agreement in
>>>>>>> > final form is executed in writing by all parties involved.
>>>>>>> >
>>>>>>>
>>>>>>
>>>
>>> --
>>>
>>> This email may be confidential and privileged. If you received this
>>> communication by mistake, please don't forward it to anyone else, please
>>> erase all copies and attachments, and please let me know that it has gone
>>> to the wrong person.
>>>
>>> The above terms reflect a potential business arrangement, are provided
>>> solely as a basis for further discussion, and are not intended to be and do
>>> not constitute a legally binding obligation. No legally binding obligations
>>> will be created, implied, or inferred until an agreement in final form is
>>> executed in writing by all parties involved.
>>>
>>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.

Re: Better naming for runner specific options

Posted by Valentyn Tymofieiev <va...@google.com>.
I think using RunnerOptions was an idea at some point, but in Python, we
ended up parsing options from the runner api without populating
RunnerOptions, and  RunnerOptions was eventually removed [1].

If we decide to rename options, a path forward may be to have runners
recognize both old and new names until Beam 3.0, but update codebase,
examples and documentation to use new names.

[1]
https://github.com/apache/beam/commit/f3623e8ba2257f7659ccb312dc2574f862ef41b5#diff-525d5d65bedd7ea5e6fce6e4cd57e153L815

*From:*Ahmet Altay <al...@google.com>
*Date:*Mon, May 6, 2019, 6:01 PM
*To:*dev

There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of options that
> is runner specific through that mechanism.
>
> *From: *Reza Rokni <re...@google.com>
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * <de...@beam.apache.org>
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath <ch...@google.com>
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> There were also discussions[1] in the past about scoping
>>>> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
>>>> PTransforms make this a more general solution?
>>>>
>>>> 1:
>>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>>
>>>
>>> Is this just for pipeline construction time or also for runtime ? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>
>>>>
>>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:
>>>>
>>>>> Having namespaces for option makes sense.
>>>>> I think, along with a help command to print all the options given the
>>>>> runner name will be useful.
>>>>> As for the scope of name spacing, I think that assigning a logical
>>>>> name space gives more flexibility around how and where we declare options.
>>>>> It also make future refactoring possible.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Good points. As already mentioned there is no namespacing between the
>>>>>> different pipeline option classes. In particular, there is no
>>>>>> separate
>>>>>> namespace for system and user options which is most concerning.
>>>>>>
>>>>>> I'm in favor of an optional namespace using the class name of the
>>>>>> defining pipeline option class. That way we would at least be able to
>>>>>> resolve duplicate option names. For example, if there were was
>>>>>> "optionX"
>>>>>> in class A and B, we could use "A#optionX" to refer to it from class
>>>>>> A.
>>>>>>
>>>>>
>>> I think this solves the original problem. Runner specific options will
>>> have unique names that includes the runner (in options class). I guess to
>>> be complete we also have to include the package (module for Python) ?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying (at least for backwards compatibility).
>>>
>>>
>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>>> runners.
>>>>>> >
>>>>>> > Perhaps for parameters that could have multiple runner relevance,
>>>>>> the
>>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>>> > example change the help information to start with a generic
>>>>>> reference "
>>>>>> > worker type on the runner" followed by runner specific behavior
>>>>>> expected
>>>>>> > for RunnerA, RunnerB etc...
>>>>>> >
>>>>>> > But I do worry that without prefix even generic options could cause
>>>>>> > confusion. For example if the use of --network is substantially
>>>>>> > different between runnerA vs runnerB then the user will only have
>>>>>> this
>>>>>> > information by reading the help. It will also mean that a pipeline
>>>>>> which
>>>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>>>> > RunnerB could fail because the format of the options to pass to
>>>>>> > --network are different.
>>>>>> >
>>>>>> > Cheers
>>>>>> >
>>>>>> > Reza
>>>>>> >
>>>>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>>> > *To: *dev
>>>>>> >
>>>>>> >     Even though they are in classes named for specific runners,
>>>>>> they are
>>>>>> >     not namespaced. All PipelineOptions exist in a global namespace
>>>>>> so
>>>>>> >     they need to be careful to be very precise.
>>>>>> >
>>>>>> >     It is a good point that even though they may be multiple uses
>>>>>> for
>>>>>> >     "machine type" they are probably not going to both happen at the
>>>>>> >     same time.
>>>>>> >
>>>>>> >     If it becomes an issue, another thing we could do would be to
>>>>>> add
>>>>>> >     namespacing support so options have less spooky action, or at
>>>>>> least
>>>>>> >     have a way to resolve it when it happens on accident.
>>>>>> >
>>>>>> >     Kenn
>>>>>> >
>>>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >         Also, we do have runner specific options classes where truly
>>>>>> >         runner specific options can go.
>>>>>> >
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>>> >
>>>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <
>>>>>> altay@google.com
>>>>>> >         <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >             I agree, that is a good point.
>>>>>> >
>>>>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>>>>> lcwik@google.com>>
>>>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>>> >             *To: *dev
>>>>>> >
>>>>>> >                 The concept of a machine type isn't necessarily
>>>>>> limited
>>>>>> >                 to Dataflow. If it made sense for a runner, they
>>>>>> could
>>>>>> >                 use AWS/Azure machine types as well.
>>>>>> >
>>>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>>> >                 <altay@google.com <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >                     This idea was discussed in a PR a few months
>>>>>> ago,
>>>>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>>>>> makes
>>>>>> >                     sense to use a namespace prefix. The primary
>>>>>> issue
>>>>>> >                     here is that, such a change will very likely be
>>>>>> a
>>>>>> >                     backward incompatible change and would be hard
>>>>>> to do
>>>>>> >                     before the next major version.
>>>>>> >
>>>>>> >                     [1]
>>>>>> https://issues.apache.org/jira/browse/BEAM-6531
>>>>>> >
>>>>>> >                     *From: *Reza Rokni <rez@google.com
>>>>>> >                     <ma...@google.com>>
>>>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>>> >                     *To: * <dev@beam.apache.org
>>>>>> >                     <ma...@beam.apache.org>>
>>>>>> >
>>>>>> >                         Hi,
>>>>>> >
>>>>>> >                         Was reading this SO question:
>>>>>> >
>>>>>> >
>>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>> >
>>>>>> >                         And noticed that in
>>>>>> >
>>>>>> >
>>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>> >
>>>>>> >                         The option is called --worker_machine_type.
>>>>>> >
>>>>>> >                         I wonder if runner specific options should
>>>>>> have
>>>>>> >                         the runner in the prefix? Something like
>>>>>> >                         --dataflow_worker_machine_type?
>>>>>> >
>>>>>> >                         Cheers
>>>>>> >                         Reza
>>>>>> >
>>>>>> >                         --
>>>>>> >
>>>>>> >                         This email may be confidential and
>>>>>> privileged.
>>>>>> >                         If you received this communication by
>>>>>> mistake,
>>>>>> >                         please don't forward it to anyone else,
>>>>>> please
>>>>>> >                         erase all copies and attachments, and
>>>>>> please let
>>>>>> >                         me know that it has gone to the wrong
>>>>>> person.
>>>>>> >
>>>>>> >                         The above terms reflect a potential business
>>>>>> >                         arrangement, are provided solely as a basis
>>>>>> for
>>>>>> >                         further discussion, and are not intended to
>>>>>> be
>>>>>> >                         and do not constitute a legally binding
>>>>>> >                         obligation. No legally binding obligations
>>>>>> will
>>>>>> >                         be created, implied, or inferred until an
>>>>>> >                         agreement in final form is executed in
>>>>>> writing
>>>>>> >                         by all parties involved.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>> > This email may be confidential and privileged. If you received this
>>>>>> > communication by mistake, please don't forward it to anyone else,
>>>>>> please
>>>>>> > erase all copies and attachments, and please let me know that it
>>>>>> has
>>>>>> > gone to the wrong person.
>>>>>> >
>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>> provided
>>>>>> > solely as a basis for further discussion, and are not intended to
>>>>>> be and
>>>>>> > do not constitute a legally binding obligation. No legally binding
>>>>>> > obligations will be created, implied, or inferred until an
>>>>>> agreement in
>>>>>> > final form is executed in writing by all parties involved.
>>>>>> >
>>>>>>
>>>>>
>>
>> --
>>
>> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>>
>> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

Re: Better naming for runner specific options

Posted by Chamikara Jayalath <ch...@google.com>.
On Mon, May 6, 2019 at 3:01 PM Ahmet Altay <al...@google.com> wrote:

> There is RunnerOptions already. Its options are populated by querying the
> job service. Any portable runner is able to provide a list of options that
> is runner specific through that mechanism.
>
> *From: *Reza Rokni <re...@google.com>
> *Date: *Mon, May 6, 2019 at 2:57 PM
> *To: * <de...@beam.apache.org>
>
> So the options here would be moved to runner options?
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>
In theory at least, many options specified in WorkerOptions can apply for
all runners hence probably are not truly runner-specific (num_workers,
zone, worker_machine_type, etc). Also, moving existing options might be
hard due to backwards compatibility reasons.

Some of the truly runner specific options are in XYZRunnerOptions classes.
But due to not having a namespace, names there have to be globally unique
which can be addressed by introducing class name as a namespace.


>
>>
>> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
>> have FlinkPipelineOptions etc...
>>
>> *From: *Chamikara Jayalath <ch...@google.com>
>> *Date: *Tue, 7 May 2019 at 05:29
>> *To: *dev
>>
>>
>>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
>>>
>>>> There were also discussions[1] in the past about scoping
>>>> PipelineOptions to specific PTransforms. Would scoping PipelineOptions to
>>>> PTransforms make this a more general solution?
>>>>
>>>> 1:
>>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>>
>>>
>>> Is this just for pipeline construction time or also for runtime ? Trying
>>> to scope options for transforms at runtime might complicate things in the
>>> presence of optimizations such as fusion.
>>>
>>>
>>>>
>>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:
>>>>
>>>>> Having namespaces for option makes sense.
>>>>> I think, along with a help command to print all the options given the
>>>>> runner name will be useful.
>>>>> As for the scope of name spacing, I think that assigning a logical
>>>>> name space gives more flexibility around how and where we declare options.
>>>>> It also make future refactoring possible.
>>>>>
>>>>>
>>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Good points. As already mentioned there is no namespacing between the
>>>>>> different pipeline option classes. In particular, there is no
>>>>>> separate
>>>>>> namespace for system and user options which is most concerning.
>>>>>>
>>>>>> I'm in favor of an optional namespace using the class name of the
>>>>>> defining pipeline option class. That way we would at least be able to
>>>>>> resolve duplicate option names. For example, if there were was
>>>>>> "optionX"
>>>>>> in class A and B, we could use "A#optionX" to refer to it from class
>>>>>> A.
>>>>>>
>>>>>
>>> I think this solves the original problem. Runner specific options will
>>> have unique names that includes the runner (in options class). I guess to
>>> be complete we also have to include the package (module for Python) ?
>>> If an option is globally unique, users should be able to specify it
>>> without qualifying (at least for backwards compatibility).
>>>
>>>
>>>>
>>>>>> -Max
>>>>>>
>>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>>> runners.
>>>>>> >
>>>>>> > Perhaps for parameters that could have multiple runner relevance,
>>>>>> the
>>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>>> > example change the help information to start with a generic
>>>>>> reference "
>>>>>> > worker type on the runner" followed by runner specific behavior
>>>>>> expected
>>>>>> > for RunnerA, RunnerB etc...
>>>>>> >
>>>>>> > But I do worry that without prefix even generic options could cause
>>>>>> > confusion. For example if the use of --network is substantially
>>>>>> > different between runnerA vs runnerB then the user will only have
>>>>>> this
>>>>>> > information by reading the help. It will also mean that a pipeline
>>>>>> which
>>>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>>>> > RunnerB could fail because the format of the options to pass to
>>>>>> > --network are different.
>>>>>> >
>>>>>> > Cheers
>>>>>> >
>>>>>> > Reza
>>>>>> >
>>>>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>>> > *To: *dev
>>>>>> >
>>>>>> >     Even though they are in classes named for specific runners,
>>>>>> they are
>>>>>> >     not namespaced. All PipelineOptions exist in a global namespace
>>>>>> so
>>>>>> >     they need to be careful to be very precise.
>>>>>> >
>>>>>> >     It is a good point that even though they may be multiple uses
>>>>>> for
>>>>>> >     "machine type" they are probably not going to both happen at the
>>>>>> >     same time.
>>>>>> >
>>>>>> >     If it becomes an issue, another thing we could do would be to
>>>>>> add
>>>>>> >     namespacing support so options have less spooky action, or at
>>>>>> least
>>>>>> >     have a way to resolve it when it happens on accident.
>>>>>> >
>>>>>> >     Kenn
>>>>>> >
>>>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >         Also, we do have runner specific options classes where truly
>>>>>> >         runner specific options can go.
>>>>>> >
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>>> >
>>>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <
>>>>>> altay@google.com
>>>>>> >         <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >             I agree, that is a good point.
>>>>>> >
>>>>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>>>>> lcwik@google.com>>
>>>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>>> >             *To: *dev
>>>>>> >
>>>>>> >                 The concept of a machine type isn't necessarily
>>>>>> limited
>>>>>> >                 to Dataflow. If it made sense for a runner, they
>>>>>> could
>>>>>> >                 use AWS/Azure machine types as well.
>>>>>> >
>>>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>>> >                 <altay@google.com <ma...@google.com>> wrote:
>>>>>> >
>>>>>> >                     This idea was discussed in a PR a few months
>>>>>> ago,
>>>>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>>>>> makes
>>>>>> >                     sense to use a namespace prefix. The primary
>>>>>> issue
>>>>>> >                     here is that, such a change will very likely be
>>>>>> a
>>>>>> >                     backward incompatible change and would be hard
>>>>>> to do
>>>>>> >                     before the next major version.
>>>>>> >
>>>>>> >                     [1]
>>>>>> https://issues.apache.org/jira/browse/BEAM-6531
>>>>>> >
>>>>>> >                     *From: *Reza Rokni <rez@google.com
>>>>>> >                     <ma...@google.com>>
>>>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>>> >                     *To: * <dev@beam.apache.org
>>>>>> >                     <ma...@beam.apache.org>>
>>>>>> >
>>>>>> >                         Hi,
>>>>>> >
>>>>>> >                         Was reading this SO question:
>>>>>> >
>>>>>> >
>>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>> >
>>>>>> >                         And noticed that in
>>>>>> >
>>>>>> >
>>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>> >
>>>>>> >                         The option is called --worker_machine_type.
>>>>>> >
>>>>>> >                         I wonder if runner specific options should
>>>>>> have
>>>>>> >                         the runner in the prefix? Something like
>>>>>> >                         --dataflow_worker_machine_type?
>>>>>> >
>>>>>> >                         Cheers
>>>>>> >                         Reza
>>>>>> >
>>>>>> >                         --
>>>>>> >
>>>>>> >                         This email may be confidential and
>>>>>> privileged.
>>>>>> >                         If you received this communication by
>>>>>> mistake,
>>>>>> >                         please don't forward it to anyone else,
>>>>>> please
>>>>>> >                         erase all copies and attachments, and
>>>>>> please let
>>>>>> >                         me know that it has gone to the wrong
>>>>>> person.
>>>>>> >
>>>>>> >                         The above terms reflect a potential business
>>>>>> >                         arrangement, are provided solely as a basis
>>>>>> for
>>>>>> >                         further discussion, and are not intended to
>>>>>> be
>>>>>> >                         and do not constitute a legally binding
>>>>>> >                         obligation. No legally binding obligations
>>>>>> will
>>>>>> >                         be created, implied, or inferred until an
>>>>>> >                         agreement in final form is executed in
>>>>>> writing
>>>>>> >                         by all parties involved.
>>>>>> >
>>>>>> >
>>>>>> >
>>>>>> > --
>>>>>> >
>>>>>> > This email may be confidential and privileged. If you received this
>>>>>> > communication by mistake, please don't forward it to anyone else,
>>>>>> please
>>>>>> > erase all copies and attachments, and please let me know that it
>>>>>> has
>>>>>> > gone to the wrong person.
>>>>>> >
>>>>>> > The above terms reflect a potential business arrangement, are
>>>>>> provided
>>>>>> > solely as a basis for further discussion, and are not intended to
>>>>>> be and
>>>>>> > do not constitute a legally binding obligation. No legally binding
>>>>>> > obligations will be created, implied, or inferred until an
>>>>>> agreement in
>>>>>> > final form is executed in writing by all parties involved.
>>>>>> >
>>>>>>
>>>>>
>>
>> --
>>
>> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>>
>> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

Re: Better naming for runner specific options

Posted by Ahmet Altay <al...@google.com>.
There is RunnerOptions already. Its options are populated by querying the
job service. Any portable runner is able to provide a list of options that
is runner specific through that mechanism.

*From: *Reza Rokni <re...@google.com>
*Date: *Mon, May 6, 2019 at 2:57 PM
*To: * <de...@beam.apache.org>

So the options here would be moved to runner options?
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> In Java they are in DataflowPipelineWorkerPoolOptions and of course we
> have FlinkPipelineOptions etc...
>
> *From: *Chamikara Jayalath <ch...@google.com>
> *Date: *Tue, 7 May 2019 at 05:29
> *To: *dev
>
>
>> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
>>
>>> There were also discussions[1] in the past about scoping PipelineOptions
>>> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
>>> this a more general solution?
>>>
>>> 1:
>>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>>
>>
>> Is this just for pipeline construction time or also for runtime ? Trying
>> to scope options for transforms at runtime might complicate things in the
>> presence of optimizations such as fusion.
>>
>>
>>>
>>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:
>>>
>>>> Having namespaces for option makes sense.
>>>> I think, along with a help command to print all the options given the
>>>> runner name will be useful.
>>>> As for the scope of name spacing, I think that assigning a logical name
>>>> space gives more flexibility around how and where we declare options. It
>>>> also make future refactoring possible.
>>>>
>>>>
>>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org>
>>>> wrote:
>>>>
>>>>> Good points. As already mentioned there is no namespacing between the
>>>>> different pipeline option classes. In particular, there is no separate
>>>>> namespace for system and user options which is most concerning.
>>>>>
>>>>> I'm in favor of an optional namespace using the class name of the
>>>>> defining pipeline option class. That way we would at least be able to
>>>>> resolve duplicate option names. For example, if there were was
>>>>> "optionX"
>>>>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>>>>
>>>>
>> I think this solves the original problem. Runner specific options will
>> have unique names that includes the runner (in options class). I guess to
>> be complete we also have to include the package (module for Python) ?
>> If an option is globally unique, users should be able to specify it
>> without qualifying (at least for backwards compatibility).
>>
>>
>>>
>>>>> -Max
>>>>>
>>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>>> runners.
>>>>> >
>>>>> > Perhaps for parameters that could have multiple runner relevance,
>>>>> the
>>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>>> > example change the help information to start with a generic
>>>>> reference "
>>>>> > worker type on the runner" followed by runner specific behavior
>>>>> expected
>>>>> > for RunnerA, RunnerB etc...
>>>>> >
>>>>> > But I do worry that without prefix even generic options could cause
>>>>> > confusion. For example if the use of --network is substantially
>>>>> > different between runnerA vs runnerB then the user will only have
>>>>> this
>>>>> > information by reading the help. It will also mean that a pipeline
>>>>> which
>>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>>> > RunnerB could fail because the format of the options to pass to
>>>>> > --network are different.
>>>>> >
>>>>> > Cheers
>>>>> >
>>>>> > Reza
>>>>> >
>>>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>>> > *To: *dev
>>>>> >
>>>>> >     Even though they are in classes named for specific runners, they
>>>>> are
>>>>> >     not namespaced. All PipelineOptions exist in a global namespace
>>>>> so
>>>>> >     they need to be careful to be very precise.
>>>>> >
>>>>> >     It is a good point that even though they may be multiple uses for
>>>>> >     "machine type" they are probably not going to both happen at the
>>>>> >     same time.
>>>>> >
>>>>> >     If it becomes an issue, another thing we could do would be to add
>>>>> >     namespacing support so options have less spooky action, or at
>>>>> least
>>>>> >     have a way to resolve it when it happens on accident.
>>>>> >
>>>>> >     Kenn
>>>>> >
>>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>>>> >
>>>>> >         Also, we do have runner specific options classes where truly
>>>>> >         runner specific options can go.
>>>>> >
>>>>> >
>>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>>> >
>>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>>> >
>>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
>>>>> >         <ma...@google.com>> wrote:
>>>>> >
>>>>> >             I agree, that is a good point.
>>>>> >
>>>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>>>> lcwik@google.com>>
>>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>>> >             *To: *dev
>>>>> >
>>>>> >                 The concept of a machine type isn't necessarily
>>>>> limited
>>>>> >                 to Dataflow. If it made sense for a runner, they
>>>>> could
>>>>> >                 use AWS/Azure machine types as well.
>>>>> >
>>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>>> >                 <altay@google.com <ma...@google.com>> wrote:
>>>>> >
>>>>> >                     This idea was discussed in a PR a few months ago,
>>>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>>>> makes
>>>>> >                     sense to use a namespace prefix. The primary
>>>>> issue
>>>>> >                     here is that, such a change will very likely be a
>>>>> >                     backward incompatible change and would be hard
>>>>> to do
>>>>> >                     before the next major version.
>>>>> >
>>>>> >                     [1]
>>>>> https://issues.apache.org/jira/browse/BEAM-6531
>>>>> >
>>>>> >                     *From: *Reza Rokni <rez@google.com
>>>>> >                     <ma...@google.com>>
>>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>> >                     *To: * <dev@beam.apache.org
>>>>> >                     <ma...@beam.apache.org>>
>>>>> >
>>>>> >                         Hi,
>>>>> >
>>>>> >                         Was reading this SO question:
>>>>> >
>>>>> >
>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>> >
>>>>> >                         And noticed that in
>>>>> >
>>>>> >
>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>> >
>>>>> >                         The option is called --worker_machine_type.
>>>>> >
>>>>> >                         I wonder if runner specific options should
>>>>> have
>>>>> >                         the runner in the prefix? Something like
>>>>> >                         --dataflow_worker_machine_type?
>>>>> >
>>>>> >                         Cheers
>>>>> >                         Reza
>>>>> >
>>>>> >                         --
>>>>> >
>>>>> >                         This email may be confidential and
>>>>> privileged.
>>>>> >                         If you received this communication by
>>>>> mistake,
>>>>> >                         please don't forward it to anyone else,
>>>>> please
>>>>> >                         erase all copies and attachments, and please
>>>>> let
>>>>> >                         me know that it has gone to the wrong person.
>>>>> >
>>>>> >                         The above terms reflect a potential business
>>>>> >                         arrangement, are provided solely as a basis
>>>>> for
>>>>> >                         further discussion, and are not intended to
>>>>> be
>>>>> >                         and do not constitute a legally binding
>>>>> >                         obligation. No legally binding obligations
>>>>> will
>>>>> >                         be created, implied, or inferred until an
>>>>> >                         agreement in final form is executed in
>>>>> writing
>>>>> >                         by all parties involved.
>>>>> >
>>>>> >
>>>>> >
>>>>> > --
>>>>> >
>>>>> > This email may be confidential and privileged. If you received this
>>>>> > communication by mistake, please don't forward it to anyone else,
>>>>> please
>>>>> > erase all copies and attachments, and please let me know that it has
>>>>> > gone to the wrong person.
>>>>> >
>>>>> > The above terms reflect a potential business arrangement, are
>>>>> provided
>>>>> > solely as a basis for further discussion, and are not intended to be
>>>>> and
>>>>> > do not constitute a legally binding obligation. No legally binding
>>>>> > obligations will be created, implied, or inferred until an agreement
>>>>> in
>>>>> > final form is executed in writing by all parties involved.
>>>>> >
>>>>>
>>>>
>
> --
>
> This email may be confidential and privileged. If you received this
> communication by mistake, please don't forward it to anyone else, please
> erase all copies and attachments, and please let me know that it has gone
> to the wrong person.
>
> The above terms reflect a potential business arrangement, are provided
> solely as a basis for further discussion, and are not intended to be and do
> not constitute a legally binding obligation. No legally binding obligations
> will be created, implied, or inferred until an agreement in final form is
> executed in writing by all parties involved.
>

Re: Better naming for runner specific options

Posted by Reza Rokni <re...@google.com>.
So the options here would be moved to runner options?
https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions

In Java they are in DataflowPipelineWorkerPoolOptions and of course we
have FlinkPipelineOptions etc...

*From: *Chamikara Jayalath <ch...@google.com>
*Date: *Tue, 7 May 2019 at 05:29
*To: *dev


> On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:
>
>> There were also discussions[1] in the past about scoping PipelineOptions
>> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
>> this a more general solution?
>>
>> 1:
>> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>>
>
> Is this just for pipeline construction time or also for runtime ? Trying
> to scope options for transforms at runtime might complicate things in the
> presence of optimizations such as fusion.
>
>
>>
>> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:
>>
>>> Having namespaces for option makes sense.
>>> I think, along with a help command to print all the options given the
>>> runner name will be useful.
>>> As for the scope of name spacing, I think that assigning a logical name
>>> space gives more flexibility around how and where we declare options. It
>>> also make future refactoring possible.
>>>
>>>
>>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org>
>>> wrote:
>>>
>>>> Good points. As already mentioned there is no namespacing between the
>>>> different pipeline option classes. In particular, there is no separate
>>>> namespace for system and user options which is most concerning.
>>>>
>>>> I'm in favor of an optional namespace using the class name of the
>>>> defining pipeline option class. That way we would at least be able to
>>>> resolve duplicate option names. For example, if there were was
>>>> "optionX"
>>>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>>>
>>>
> I think this solves the original problem. Runner specific options will
> have unique names that includes the runner (in options class). I guess to
> be complete we also have to include the package (module for Python) ?
> If an option is globally unique, users should be able to specify it
> without qualifying (at least for backwards compatibility).
>
>
>>
>>>> -Max
>>>>
>>>> On 04.05.19 02:23, Reza Rokni wrote:
>>>> > Great point Lukasz, worker machine could be relevant to multiple
>>>> runners.
>>>> >
>>>> > Perhaps for parameters that could have multiple runner relevance, the
>>>> > doc could be rephrased to reflect its potential multiple uses. For
>>>> > example change the help information to start with a generic reference
>>>> "
>>>> > worker type on the runner" followed by runner specific behavior
>>>> expected
>>>> > for RunnerA, RunnerB etc...
>>>> >
>>>> > But I do worry that without prefix even generic options could cause
>>>> > confusion. For example if the use of --network is substantially
>>>> > different between runnerA vs runnerB then the user will only have
>>>> this
>>>> > information by reading the help. It will also mean that a pipeline
>>>> which
>>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>>> > RunnerB could fail because the format of the options to pass to
>>>> > --network are different.
>>>> >
>>>> > Cheers
>>>> >
>>>> > Reza
>>>> >
>>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>>> > *Date: *Sat, 4 May 2019 at 03:54
>>>> > *To: *dev
>>>> >
>>>> >     Even though they are in classes named for specific runners, they
>>>> are
>>>> >     not namespaced. All PipelineOptions exist in a global namespace so
>>>> >     they need to be careful to be very precise.
>>>> >
>>>> >     It is a good point that even though they may be multiple uses for
>>>> >     "machine type" they are probably not going to both happen at the
>>>> >     same time.
>>>> >
>>>> >     If it becomes an issue, another thing we could do would be to add
>>>> >     namespacing support so options have less spooky action, or at
>>>> least
>>>> >     have a way to resolve it when it happens on accident.
>>>> >
>>>> >     Kenn
>>>> >
>>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>>> >
>>>> >         Also, we do have runner specific options classes where truly
>>>> >         runner specific options can go.
>>>> >
>>>> >
>>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>>> >
>>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>>> >
>>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
>>>> >         <ma...@google.com>> wrote:
>>>> >
>>>> >             I agree, that is a good point.
>>>> >
>>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>>> lcwik@google.com>>
>>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>>> >             *To: *dev
>>>> >
>>>> >                 The concept of a machine type isn't necessarily
>>>> limited
>>>> >                 to Dataflow. If it made sense for a runner, they could
>>>> >                 use AWS/Azure machine types as well.
>>>> >
>>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>>> >                 <altay@google.com <ma...@google.com>> wrote:
>>>> >
>>>> >                     This idea was discussed in a PR a few months ago,
>>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>>> makes
>>>> >                     sense to use a namespace prefix. The primary issue
>>>> >                     here is that, such a change will very likely be a
>>>> >                     backward incompatible change and would be hard to
>>>> do
>>>> >                     before the next major version.
>>>> >
>>>> >                     [1]
>>>> https://issues.apache.org/jira/browse/BEAM-6531
>>>> >
>>>> >                     *From: *Reza Rokni <rez@google.com
>>>> >                     <ma...@google.com>>
>>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>>> >                     *To: * <dev@beam.apache.org
>>>> >                     <ma...@beam.apache.org>>
>>>> >
>>>> >                         Hi,
>>>> >
>>>> >                         Was reading this SO question:
>>>> >
>>>> >
>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>> >
>>>> >                         And noticed that in
>>>> >
>>>> >
>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>> >
>>>> >                         The option is called --worker_machine_type.
>>>> >
>>>> >                         I wonder if runner specific options should
>>>> have
>>>> >                         the runner in the prefix? Something like
>>>> >                         --dataflow_worker_machine_type?
>>>> >
>>>> >                         Cheers
>>>> >                         Reza
>>>> >
>>>> >                         --
>>>> >
>>>> >                         This email may be confidential and privileged.
>>>> >                         If you received this communication by mistake,
>>>> >                         please don't forward it to anyone else, please
>>>> >                         erase all copies and attachments, and please
>>>> let
>>>> >                         me know that it has gone to the wrong person.
>>>> >
>>>> >                         The above terms reflect a potential business
>>>> >                         arrangement, are provided solely as a basis
>>>> for
>>>> >                         further discussion, and are not intended to be
>>>> >                         and do not constitute a legally binding
>>>> >                         obligation. No legally binding obligations
>>>> will
>>>> >                         be created, implied, or inferred until an
>>>> >                         agreement in final form is executed in writing
>>>> >                         by all parties involved.
>>>> >
>>>> >
>>>> >
>>>> > --
>>>> >
>>>> > This email may be confidential and privileged. If you received this
>>>> > communication by mistake, please don't forward it to anyone else,
>>>> please
>>>> > erase all copies and attachments, and please let me know that it has
>>>> > gone to the wrong person.
>>>> >
>>>> > The above terms reflect a potential business arrangement, are
>>>> provided
>>>> > solely as a basis for further discussion, and are not intended to be
>>>> and
>>>> > do not constitute a legally binding obligation. No legally binding
>>>> > obligations will be created, implied, or inferred until an agreement
>>>> in
>>>> > final form is executed in writing by all parties involved.
>>>> >
>>>>
>>>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.

Re: Better naming for runner specific options

Posted by Chamikara Jayalath <ch...@google.com>.
On Mon, May 6, 2019 at 2:13 PM Lukasz Cwik <lc...@google.com> wrote:

> There were also discussions[1] in the past about scoping PipelineOptions
> to specific PTransforms. Would scoping PipelineOptions to PTransforms make
> this a more general solution?
>
> 1:
> https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E
>

Is this just for pipeline construction time or also for runtime ? Trying to
scope options for transforms at runtime might complicate things in the
presence of optimizations such as fusion.


>
> On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:
>
>> Having namespaces for option makes sense.
>> I think, along with a help command to print all the options given the
>> runner name will be useful.
>> As for the scope of name spacing, I think that assigning a logical name
>> space gives more flexibility around how and where we declare options. It
>> also make future refactoring possible.
>>
>>
>> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org> wrote:
>>
>>> Good points. As already mentioned there is no namespacing between the
>>> different pipeline option classes. In particular, there is no separate
>>> namespace for system and user options which is most concerning.
>>>
>>> I'm in favor of an optional namespace using the class name of the
>>> defining pipeline option class. That way we would at least be able to
>>> resolve duplicate option names. For example, if there were was "optionX"
>>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>>
>>
I think this solves the original problem. Runner specific options will have
unique names that includes the runner (in options class). I guess to be
complete we also have to include the package (module for Python) ?
If an option is globally unique, users should be able to specify it without
qualifying (at least for backwards compatibility).


>
>>> -Max
>>>
>>> On 04.05.19 02:23, Reza Rokni wrote:
>>> > Great point Lukasz, worker machine could be relevant to multiple
>>> runners.
>>> >
>>> > Perhaps for parameters that could have multiple runner relevance, the
>>> > doc could be rephrased to reflect its potential multiple uses. For
>>> > example change the help information to start with a generic reference
>>> "
>>> > worker type on the runner" followed by runner specific behavior
>>> expected
>>> > for RunnerA, RunnerB etc...
>>> >
>>> > But I do worry that without prefix even generic options could cause
>>> > confusion. For example if the use of --network is substantially
>>> > different between runnerA vs runnerB then the user will only have this
>>> > information by reading the help. It will also mean that a pipeline
>>> which
>>> > is expected to work both on-premise on RunnerA and in the cloud on
>>> > RunnerB could fail because the format of the options to pass to
>>> > --network are different.
>>> >
>>> > Cheers
>>> >
>>> > Reza
>>> >
>>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>>> > *Date: *Sat, 4 May 2019 at 03:54
>>> > *To: *dev
>>> >
>>> >     Even though they are in classes named for specific runners, they
>>> are
>>> >     not namespaced. All PipelineOptions exist in a global namespace so
>>> >     they need to be careful to be very precise.
>>> >
>>> >     It is a good point that even though they may be multiple uses for
>>> >     "machine type" they are probably not going to both happen at the
>>> >     same time.
>>> >
>>> >     If it becomes an issue, another thing we could do would be to add
>>> >     namespacing support so options have less spooky action, or at least
>>> >     have a way to resolve it when it happens on accident.
>>> >
>>> >     Kenn
>>> >
>>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>>> >     <chamikara@google.com <ma...@google.com>> wrote:
>>> >
>>> >         Also, we do have runner specific options classes where truly
>>> >         runner specific options can go.
>>> >
>>> >
>>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>> >
>>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>> >
>>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
>>> >         <ma...@google.com>> wrote:
>>> >
>>> >             I agree, that is a good point.
>>> >
>>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>>> lcwik@google.com>>
>>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>>> >             *To: *dev
>>> >
>>> >                 The concept of a machine type isn't necessarily limited
>>> >                 to Dataflow. If it made sense for a runner, they could
>>> >                 use AWS/Azure machine types as well.
>>> >
>>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>>> >                 <altay@google.com <ma...@google.com>> wrote:
>>> >
>>> >                     This idea was discussed in a PR a few months ago,
>>> >                     and JIRA was filed as a follow up [1]. IMO, it
>>> makes
>>> >                     sense to use a namespace prefix. The primary issue
>>> >                     here is that, such a change will very likely be a
>>> >                     backward incompatible change and would be hard to
>>> do
>>> >                     before the next major version.
>>> >
>>> >                     [1]
>>> https://issues.apache.org/jira/browse/BEAM-6531
>>> >
>>> >                     *From: *Reza Rokni <rez@google.com
>>> >                     <ma...@google.com>>
>>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>>> >                     *To: * <dev@beam.apache.org
>>> >                     <ma...@beam.apache.org>>
>>> >
>>> >                         Hi,
>>> >
>>> >                         Was reading this SO question:
>>> >
>>> >
>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>> >
>>> >                         And noticed that in
>>> >
>>> >
>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>> >
>>> >                         The option is called --worker_machine_type.
>>> >
>>> >                         I wonder if runner specific options should have
>>> >                         the runner in the prefix? Something like
>>> >                         --dataflow_worker_machine_type?
>>> >
>>> >                         Cheers
>>> >                         Reza
>>> >
>>> >                         --
>>> >
>>> >                         This email may be confidential and privileged.
>>> >                         If you received this communication by mistake,
>>> >                         please don't forward it to anyone else, please
>>> >                         erase all copies and attachments, and please
>>> let
>>> >                         me know that it has gone to the wrong person.
>>> >
>>> >                         The above terms reflect a potential business
>>> >                         arrangement, are provided solely as a basis for
>>> >                         further discussion, and are not intended to be
>>> >                         and do not constitute a legally binding
>>> >                         obligation. No legally binding obligations will
>>> >                         be created, implied, or inferred until an
>>> >                         agreement in final form is executed in writing
>>> >                         by all parties involved.
>>> >
>>> >
>>> >
>>> > --
>>> >
>>> > This email may be confidential and privileged. If you received this
>>> > communication by mistake, please don't forward it to anyone else,
>>> please
>>> > erase all copies and attachments, and please let me know that it has
>>> > gone to the wrong person.
>>> >
>>> > The above terms reflect a potential business arrangement, are provided
>>> > solely as a basis for further discussion, and are not intended to be
>>> and
>>> > do not constitute a legally binding obligation. No legally binding
>>> > obligations will be created, implied, or inferred until an agreement
>>> in
>>> > final form is executed in writing by all parties involved.
>>> >
>>>
>>

Re: Better naming for runner specific options

Posted by Lukasz Cwik <lc...@google.com>.
There were also discussions[1] in the past about scoping PipelineOptions to
specific PTransforms. Would scoping PipelineOptions to PTransforms make
this a more general solution?

1:
https://lists.apache.org/thread.html/05f849d39788cb0af840cb9e86ca631586783947eb4e5a1774b647d1@%3Cdev.beam.apache.org%3E

On Mon, May 6, 2019 at 12:02 PM Ankur Goenka <go...@google.com> wrote:

> Having namespaces for option makes sense.
> I think, along with a help command to print all the options given the
> runner name will be useful.
> As for the scope of name spacing, I think that assigning a logical name
> space gives more flexibility around how and where we declare options. It
> also make future refactoring possible.
>
>
> On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org> wrote:
>
>> Good points. As already mentioned there is no namespacing between the
>> different pipeline option classes. In particular, there is no separate
>> namespace for system and user options which is most concerning.
>>
>> I'm in favor of an optional namespace using the class name of the
>> defining pipeline option class. That way we would at least be able to
>> resolve duplicate option names. For example, if there were was "optionX"
>> in class A and B, we could use "A#optionX" to refer to it from class A.
>>
>> -Max
>>
>> On 04.05.19 02:23, Reza Rokni wrote:
>> > Great point Lukasz, worker machine could be relevant to multiple
>> runners.
>> >
>> > Perhaps for parameters that could have multiple runner relevance, the
>> > doc could be rephrased to reflect its potential multiple uses. For
>> > example change the help information to start with a generic reference "
>> > worker type on the runner" followed by runner specific behavior
>> expected
>> > for RunnerA, RunnerB etc...
>> >
>> > But I do worry that without prefix even generic options could cause
>> > confusion. For example if the use of --network is substantially
>> > different between runnerA vs runnerB then the user will only have this
>> > information by reading the help. It will also mean that a pipeline
>> which
>> > is expected to work both on-premise on RunnerA and in the cloud on
>> > RunnerB could fail because the format of the options to pass to
>> > --network are different.
>> >
>> > Cheers
>> >
>> > Reza
>> >
>> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
>> > *Date: *Sat, 4 May 2019 at 03:54
>> > *To: *dev
>> >
>> >     Even though they are in classes named for specific runners, they are
>> >     not namespaced. All PipelineOptions exist in a global namespace so
>> >     they need to be careful to be very precise.
>> >
>> >     It is a good point that even though they may be multiple uses for
>> >     "machine type" they are probably not going to both happen at the
>> >     same time.
>> >
>> >     If it becomes an issue, another thing we could do would be to add
>> >     namespacing support so options have less spooky action, or at least
>> >     have a way to resolve it when it happens on accident.
>> >
>> >     Kenn
>> >
>> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>> >     <chamikara@google.com <ma...@google.com>> wrote:
>> >
>> >         Also, we do have runner specific options classes where truly
>> >         runner specific options can go.
>> >
>> >
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>> >
>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>> >
>> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
>> >         <ma...@google.com>> wrote:
>> >
>> >             I agree, that is a good point.
>> >
>> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
>> lcwik@google.com>>
>> >             *Date: *Fri, May 3, 2019 at 9:37 AM
>> >             *To: *dev
>> >
>> >                 The concept of a machine type isn't necessarily limited
>> >                 to Dataflow. If it made sense for a runner, they could
>> >                 use AWS/Azure machine types as well.
>> >
>> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>> >                 <altay@google.com <ma...@google.com>> wrote:
>> >
>> >                     This idea was discussed in a PR a few months ago,
>> >                     and JIRA was filed as a follow up [1]. IMO, it makes
>> >                     sense to use a namespace prefix. The primary issue
>> >                     here is that, such a change will very likely be a
>> >                     backward incompatible change and would be hard to do
>> >                     before the next major version.
>> >
>> >                     [1] https://issues.apache.org/jira/browse/BEAM-6531
>> >
>> >                     *From: *Reza Rokni <rez@google.com
>> >                     <ma...@google.com>>
>> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
>> >                     *To: * <dev@beam.apache.org
>> >                     <ma...@beam.apache.org>>
>> >
>> >                         Hi,
>> >
>> >                         Was reading this SO question:
>> >
>> >
>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>> >
>> >                         And noticed that in
>> >
>> >
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>> >
>> >                         The option is called --worker_machine_type.
>> >
>> >                         I wonder if runner specific options should have
>> >                         the runner in the prefix? Something like
>> >                         --dataflow_worker_machine_type?
>> >
>> >                         Cheers
>> >                         Reza
>> >
>> >                         --
>> >
>> >                         This email may be confidential and privileged.
>> >                         If you received this communication by mistake,
>> >                         please don't forward it to anyone else, please
>> >                         erase all copies and attachments, and please let
>> >                         me know that it has gone to the wrong person.
>> >
>> >                         The above terms reflect a potential business
>> >                         arrangement, are provided solely as a basis for
>> >                         further discussion, and are not intended to be
>> >                         and do not constitute a legally binding
>> >                         obligation. No legally binding obligations will
>> >                         be created, implied, or inferred until an
>> >                         agreement in final form is executed in writing
>> >                         by all parties involved.
>> >
>> >
>> >
>> > --
>> >
>> > This email may be confidential and privileged. If you received this
>> > communication by mistake, please don't forward it to anyone else,
>> please
>> > erase all copies and attachments, and please let me know that it has
>> > gone to the wrong person.
>> >
>> > The above terms reflect a potential business arrangement, are provided
>> > solely as a basis for further discussion, and are not intended to be
>> and
>> > do not constitute a legally binding obligation. No legally binding
>> > obligations will be created, implied, or inferred until an agreement in
>> > final form is executed in writing by all parties involved.
>> >
>>
>

Re: Better naming for runner specific options

Posted by Ankur Goenka <go...@google.com>.
Having namespaces for option makes sense.
I think, along with a help command to print all the options given the
runner name will be useful.
As for the scope of name spacing, I think that assigning a logical name
space gives more flexibility around how and where we declare options. It
also make future refactoring possible.


On Mon, May 6, 2019 at 7:50 AM Maximilian Michels <mx...@apache.org> wrote:

> Good points. As already mentioned there is no namespacing between the
> different pipeline option classes. In particular, there is no separate
> namespace for system and user options which is most concerning.
>
> I'm in favor of an optional namespace using the class name of the
> defining pipeline option class. That way we would at least be able to
> resolve duplicate option names. For example, if there were was "optionX"
> in class A and B, we could use "A#optionX" to refer to it from class A.
>
> -Max
>
> On 04.05.19 02:23, Reza Rokni wrote:
> > Great point Lukasz, worker machine could be relevant to multiple runners.
> >
> > Perhaps for parameters that could have multiple runner relevance, the
> > doc could be rephrased to reflect its potential multiple uses. For
> > example change the help information to start with a generic reference "
> > worker type on the runner" followed by runner specific behavior expected
> > for RunnerA, RunnerB etc...
> >
> > But I do worry that without prefix even generic options could cause
> > confusion. For example if the use of --network is substantially
> > different between runnerA vs runnerB then the user will only have this
> > information by reading the help. It will also mean that a pipeline which
> > is expected to work both on-premise on RunnerA and in the cloud on
> > RunnerB could fail because the format of the options to pass to
> > --network are different.
> >
> > Cheers
> >
> > Reza
> >
> > *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
> > *Date: *Sat, 4 May 2019 at 03:54
> > *To: *dev
> >
> >     Even though they are in classes named for specific runners, they are
> >     not namespaced. All PipelineOptions exist in a global namespace so
> >     they need to be careful to be very precise.
> >
> >     It is a good point that even though they may be multiple uses for
> >     "machine type" they are probably not going to both happen at the
> >     same time.
> >
> >     If it becomes an issue, another thing we could do would be to add
> >     namespacing support so options have less spooky action, or at least
> >     have a way to resolve it when it happens on accident.
> >
> >     Kenn
> >
> >     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
> >     <chamikara@google.com <ma...@google.com>> wrote:
> >
> >         Also, we do have runner specific options classes where truly
> >         runner specific options can go.
> >
> >
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
> >
> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> >
> >         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
> >         <ma...@google.com>> wrote:
> >
> >             I agree, that is a good point.
> >
> >             *From: *Lukasz Cwik <lcwik@google.com <mailto:
> lcwik@google.com>>
> >             *Date: *Fri, May 3, 2019 at 9:37 AM
> >             *To: *dev
> >
> >                 The concept of a machine type isn't necessarily limited
> >                 to Dataflow. If it made sense for a runner, they could
> >                 use AWS/Azure machine types as well.
> >
> >                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
> >                 <altay@google.com <ma...@google.com>> wrote:
> >
> >                     This idea was discussed in a PR a few months ago,
> >                     and JIRA was filed as a follow up [1]. IMO, it makes
> >                     sense to use a namespace prefix. The primary issue
> >                     here is that, such a change will very likely be a
> >                     backward incompatible change and would be hard to do
> >                     before the next major version.
> >
> >                     [1] https://issues.apache.org/jira/browse/BEAM-6531
> >
> >                     *From: *Reza Rokni <rez@google.com
> >                     <ma...@google.com>>
> >                     *Date: *Thu, May 2, 2019 at 8:00 PM
> >                     *To: * <dev@beam.apache.org
> >                     <ma...@beam.apache.org>>
> >
> >                         Hi,
> >
> >                         Was reading this SO question:
> >
> >
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
> >
> >                         And noticed that in
> >
> >
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
> >
> >                         The option is called --worker_machine_type.
> >
> >                         I wonder if runner specific options should have
> >                         the runner in the prefix? Something like
> >                         --dataflow_worker_machine_type?
> >
> >                         Cheers
> >                         Reza
> >
> >                         --
> >
> >                         This email may be confidential and privileged.
> >                         If you received this communication by mistake,
> >                         please don't forward it to anyone else, please
> >                         erase all copies and attachments, and please let
> >                         me know that it has gone to the wrong person.
> >
> >                         The above terms reflect a potential business
> >                         arrangement, are provided solely as a basis for
> >                         further discussion, and are not intended to be
> >                         and do not constitute a legally binding
> >                         obligation. No legally binding obligations will
> >                         be created, implied, or inferred until an
> >                         agreement in final form is executed in writing
> >                         by all parties involved.
> >
> >
> >
> > --
> >
> > This email may be confidential and privileged. If you received this
> > communication by mistake, please don't forward it to anyone else, please
> > erase all copies and attachments, and please let me know that it has
> > gone to the wrong person.
> >
> > The above terms reflect a potential business arrangement, are provided
> > solely as a basis for further discussion, and are not intended to be and
> > do not constitute a legally binding obligation. No legally binding
> > obligations will be created, implied, or inferred until an agreement in
> > final form is executed in writing by all parties involved.
> >
>

Re: Better naming for runner specific options

Posted by Maximilian Michels <mx...@apache.org>.
Good points. As already mentioned there is no namespacing between the 
different pipeline option classes. In particular, there is no separate 
namespace for system and user options which is most concerning.

I'm in favor of an optional namespace using the class name of the 
defining pipeline option class. That way we would at least be able to 
resolve duplicate option names. For example, if there were was "optionX" 
in class A and B, we could use "A#optionX" to refer to it from class A.

-Max

On 04.05.19 02:23, Reza Rokni wrote:
> Great point Lukasz, worker machine could be relevant to multiple runners.
> 
> Perhaps for parameters that could have multiple runner relevance, the 
> doc could be rephrased to reflect its potential multiple uses. For 
> example change the help information to start with a generic reference " 
> worker type on the runner" followed by runner specific behavior expected 
> for RunnerA, RunnerB etc...
> 
> But I do worry that without prefix even generic options could cause 
> confusion. For example if the use of --network is substantially 
> different between runnerA vs runnerB then the user will only have this 
> information by reading the help. It will also mean that a pipeline which 
> is expected to work both on-premise on RunnerA and in the cloud on 
> RunnerB could fail because the format of the options to pass to 
> --network are different.
> 
> Cheers
> 
> Reza
> 
> *From: *Kenneth Knowles <kenn@apache.org <ma...@apache.org>>
> *Date: *Sat, 4 May 2019 at 03:54
> *To: *dev
> 
>     Even though they are in classes named for specific runners, they are
>     not namespaced. All PipelineOptions exist in a global namespace so
>     they need to be careful to be very precise.
> 
>     It is a good point that even though they may be multiple uses for
>     "machine type" they are probably not going to both happen at the
>     same time.
> 
>     If it becomes an issue, another thing we could do would be to add
>     namespacing support so options have less spooky action, or at least
>     have a way to resolve it when it happens on accident.
> 
>     Kenn
> 
>     On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath
>     <chamikara@google.com <ma...@google.com>> wrote:
> 
>         Also, we do have runner specific options classes where truly
>         runner specific options can go.
> 
>         https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>         https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
> 
>         On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <altay@google.com
>         <ma...@google.com>> wrote:
> 
>             I agree, that is a good point.
> 
>             *From: *Lukasz Cwik <lcwik@google.com <ma...@google.com>>
>             *Date: *Fri, May 3, 2019 at 9:37 AM
>             *To: *dev
> 
>                 The concept of a machine type isn't necessarily limited
>                 to Dataflow. If it made sense for a runner, they could
>                 use AWS/Azure machine types as well.
> 
>                 On Fri, May 3, 2019 at 9:32 AM Ahmet Altay
>                 <altay@google.com <ma...@google.com>> wrote:
> 
>                     This idea was discussed in a PR a few months ago,
>                     and JIRA was filed as a follow up [1]. IMO, it makes
>                     sense to use a namespace prefix. The primary issue
>                     here is that, such a change will very likely be a
>                     backward incompatible change and would be hard to do
>                     before the next major version.
> 
>                     [1] https://issues.apache.org/jira/browse/BEAM-6531
> 
>                     *From: *Reza Rokni <rez@google.com
>                     <ma...@google.com>>
>                     *Date: *Thu, May 2, 2019 at 8:00 PM
>                     *To: * <dev@beam.apache.org
>                     <ma...@beam.apache.org>>
> 
>                         Hi,
> 
>                         Was reading this SO question:
> 
>                         https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
> 
>                         And noticed that in
> 
>                         https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
> 
>                         The option is called --worker_machine_type.
> 
>                         I wonder if runner specific options should have
>                         the runner in the prefix? Something like
>                         --dataflow_worker_machine_type?
> 
>                         Cheers
>                         Reza
> 
>                         -- 
> 
>                         This email may be confidential and privileged.
>                         If you received this communication by mistake,
>                         please don't forward it to anyone else, please
>                         erase all copies and attachments, and please let
>                         me know that it has gone to the wrong person.
> 
>                         The above terms reflect a potential business
>                         arrangement, are provided solely as a basis for
>                         further discussion, and are not intended to be
>                         and do not constitute a legally binding
>                         obligation. No legally binding obligations will
>                         be created, implied, or inferred until an
>                         agreement in final form is executed in writing
>                         by all parties involved.
> 
> 
> 
> -- 
> 
> This email may be confidential and privileged. If you received this 
> communication by mistake, please don't forward it to anyone else, please 
> erase all copies and attachments, and please let me know that it has 
> gone to the wrong person.
> 
> The above terms reflect a potential business arrangement, are provided 
> solely as a basis for further discussion, and are not intended to be and 
> do not constitute a legally binding obligation. No legally binding 
> obligations will be created, implied, or inferred until an agreement in 
> final form is executed in writing by all parties involved.
> 

Re: Better naming for runner specific options

Posted by Reza Rokni <re...@google.com>.
Great point Lukasz, worker machine could be relevant to multiple runners.

Perhaps for parameters that could have multiple runner relevance, the doc
could be rephrased to reflect its potential multiple uses. For example
change the help information to start with a generic reference " worker type
on the runner" followed by runner specific behavior expected for RunnerA,
RunnerB etc...

But I do worry that without prefix even generic options could cause
confusion. For example if the use of --network is substantially different
between runnerA vs runnerB then the user will only have this information by
reading the help. It will also mean that a pipeline which is expected to
work both on-premise on RunnerA and in the cloud on RunnerB could fail
because the format of the options to pass to --network are different.

Cheers

Reza

*From: *Kenneth Knowles <ke...@apache.org>
*Date: *Sat, 4 May 2019 at 03:54
*To: *dev

Even though they are in classes named for specific runners, they are not
> namespaced. All PipelineOptions exist in a global namespace so they need to
> be careful to be very precise.
>
> It is a good point that even though they may be multiple uses for "machine
> type" they are probably not going to both happen at the same time.
>
> If it becomes an issue, another thing we could do would be to add
> namespacing support so options have less spooky action, or at least have a
> way to resolve it when it happens on accident.
>
> Kenn
>
> On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>> Also, we do have runner specific options classes where truly runner
>> specific options can go.
>>
>>
>> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>>
>> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>>
>> On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> I agree, that is a good point.
>>>
>>> *From: *Lukasz Cwik <lc...@google.com>
>>> *Date: *Fri, May 3, 2019 at 9:37 AM
>>> *To: *dev
>>>
>>> The concept of a machine type isn't necessarily limited to Dataflow. If
>>>> it made sense for a runner, they could use AWS/Azure machine types as well.
>>>>
>>>> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>>>>
>>>>> This idea was discussed in a PR a few months ago, and JIRA was filed
>>>>> as a follow up [1]. IMO, it makes sense to use a namespace prefix. The
>>>>> primary issue here is that, such a change will very likely be a backward
>>>>> incompatible change and would be hard to do before the next major version.
>>>>>
>>>>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>>>
>>>>> *From: *Reza Rokni <re...@google.com>
>>>>> *Date: *Thu, May 2, 2019 at 8:00 PM
>>>>> *To: * <de...@beam.apache.org>
>>>>>
>>>>> Hi,
>>>>>>
>>>>>> Was reading this SO question:
>>>>>>
>>>>>>
>>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>>
>>>>>> And noticed that in
>>>>>>
>>>>>>
>>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>>
>>>>>> The option is called --worker_machine_type.
>>>>>>
>>>>>> I wonder if runner specific options should have the runner in the
>>>>>> prefix? Something like --dataflow_worker_machine_type?
>>>>>>
>>>>>> Cheers
>>>>>> Reza
>>>>>>
>>>>>> --
>>>>>>
>>>>>> This email may be confidential and privileged. If you received this
>>>>>> communication by mistake, please don't forward it to anyone else, please
>>>>>> erase all copies and attachments, and please let me know that it has gone
>>>>>> to the wrong person.
>>>>>>
>>>>>> The above terms reflect a potential business arrangement, are
>>>>>> provided solely as a basis for further discussion, and are not intended to
>>>>>> be and do not constitute a legally binding obligation. No legally binding
>>>>>> obligations will be created, implied, or inferred until an agreement in
>>>>>> final form is executed in writing by all parties involved.
>>>>>>
>>>>>

-- 

This email may be confidential and privileged. If you received this
communication by mistake, please don't forward it to anyone else, please
erase all copies and attachments, and please let me know that it has gone
to the wrong person.

The above terms reflect a potential business arrangement, are provided
solely as a basis for further discussion, and are not intended to be and do
not constitute a legally binding obligation. No legally binding obligations
will be created, implied, or inferred until an agreement in final form is
executed in writing by all parties involved.

Re: Better naming for runner specific options

Posted by Kenneth Knowles <ke...@apache.org>.
Even though they are in classes named for specific runners, they are not
namespaced. All PipelineOptions exist in a global namespace so they need to
be careful to be very precise.

It is a good point that even though they may be multiple uses for "machine
type" they are probably not going to both happen at the same time.

If it becomes an issue, another thing we could do would be to add
namespacing support so options have less spooky action, or at least have a
way to resolve it when it happens on accident.

Kenn

On Fri, May 3, 2019 at 10:43 AM Chamikara Jayalath <ch...@google.com>
wrote:

> Also, we do have runner specific options classes where truly runner
> specific options can go.
>
>
> https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
>
> https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java
>
> On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:
>
>> I agree, that is a good point.
>>
>> *From: *Lukasz Cwik <lc...@google.com>
>> *Date: *Fri, May 3, 2019 at 9:37 AM
>> *To: *dev
>>
>> The concept of a machine type isn't necessarily limited to Dataflow. If
>>> it made sense for a runner, they could use AWS/Azure machine types as well.
>>>
>>> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> This idea was discussed in a PR a few months ago, and JIRA was filed as
>>>> a follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
>>>> issue here is that, such a change will very likely be a backward
>>>> incompatible change and would be hard to do before the next major version.
>>>>
>>>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>>
>>>> *From: *Reza Rokni <re...@google.com>
>>>> *Date: *Thu, May 2, 2019 at 8:00 PM
>>>> *To: * <de...@beam.apache.org>
>>>>
>>>> Hi,
>>>>>
>>>>> Was reading this SO question:
>>>>>
>>>>>
>>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>>
>>>>> And noticed that in
>>>>>
>>>>>
>>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>>
>>>>> The option is called --worker_machine_type.
>>>>>
>>>>> I wonder if runner specific options should have the runner in the
>>>>> prefix? Something like --dataflow_worker_machine_type?
>>>>>
>>>>> Cheers
>>>>> Reza
>>>>>
>>>>> --
>>>>>
>>>>> This email may be confidential and privileged. If you received this
>>>>> communication by mistake, please don't forward it to anyone else, please
>>>>> erase all copies and attachments, and please let me know that it has gone
>>>>> to the wrong person.
>>>>>
>>>>> The above terms reflect a potential business arrangement, are provided
>>>>> solely as a basis for further discussion, and are not intended to be and do
>>>>> not constitute a legally binding obligation. No legally binding obligations
>>>>> will be created, implied, or inferred until an agreement in final form is
>>>>> executed in writing by all parties involved.
>>>>>
>>>>

Re: Better naming for runner specific options

Posted by Chamikara Jayalath <ch...@google.com>.
Also, we do have runner specific options classes where truly runner
specific options can go.

https://github.com/apache/beam/blob/master/runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineOptions.java
https://github.com/apache/beam/blob/master/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkPipelineOptions.java

On Fri, May 3, 2019 at 9:50 AM Ahmet Altay <al...@google.com> wrote:

> I agree, that is a good point.
>
> *From: *Lukasz Cwik <lc...@google.com>
> *Date: *Fri, May 3, 2019 at 9:37 AM
> *To: *dev
>
> The concept of a machine type isn't necessarily limited to Dataflow. If it
>> made sense for a runner, they could use AWS/Azure machine types as well.
>>
>> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> This idea was discussed in a PR a few months ago, and JIRA was filed as
>>> a follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
>>> issue here is that, such a change will very likely be a backward
>>> incompatible change and would be hard to do before the next major version.
>>>
>>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>>
>>> *From: *Reza Rokni <re...@google.com>
>>> *Date: *Thu, May 2, 2019 at 8:00 PM
>>> *To: * <de...@beam.apache.org>
>>>
>>> Hi,
>>>>
>>>> Was reading this SO question:
>>>>
>>>>
>>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>>
>>>> And noticed that in
>>>>
>>>>
>>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>>
>>>> The option is called --worker_machine_type.
>>>>
>>>> I wonder if runner specific options should have the runner in the
>>>> prefix? Something like --dataflow_worker_machine_type?
>>>>
>>>> Cheers
>>>> Reza
>>>>
>>>> --
>>>>
>>>> This email may be confidential and privileged. If you received this
>>>> communication by mistake, please don't forward it to anyone else, please
>>>> erase all copies and attachments, and please let me know that it has gone
>>>> to the wrong person.
>>>>
>>>> The above terms reflect a potential business arrangement, are provided
>>>> solely as a basis for further discussion, and are not intended to be and do
>>>> not constitute a legally binding obligation. No legally binding obligations
>>>> will be created, implied, or inferred until an agreement in final form is
>>>> executed in writing by all parties involved.
>>>>
>>>

Re: Better naming for runner specific options

Posted by Ahmet Altay <al...@google.com>.
I agree, that is a good point.

*From: *Lukasz Cwik <lc...@google.com>
*Date: *Fri, May 3, 2019 at 9:37 AM
*To: *dev

The concept of a machine type isn't necessarily limited to Dataflow. If it
> made sense for a runner, they could use AWS/Azure machine types as well.
>
> On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:
>
>> This idea was discussed in a PR a few months ago, and JIRA was filed as a
>> follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
>> issue here is that, such a change will very likely be a backward
>> incompatible change and would be hard to do before the next major version.
>>
>> [1] https://issues.apache.org/jira/browse/BEAM-6531
>>
>> *From: *Reza Rokni <re...@google.com>
>> *Date: *Thu, May 2, 2019 at 8:00 PM
>> *To: * <de...@beam.apache.org>
>>
>> Hi,
>>>
>>> Was reading this SO question:
>>>
>>>
>>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>>
>>> And noticed that in
>>>
>>>
>>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>>
>>> The option is called --worker_machine_type.
>>>
>>> I wonder if runner specific options should have the runner in the
>>> prefix? Something like --dataflow_worker_machine_type?
>>>
>>> Cheers
>>> Reza
>>>
>>> --
>>>
>>> This email may be confidential and privileged. If you received this
>>> communication by mistake, please don't forward it to anyone else, please
>>> erase all copies and attachments, and please let me know that it has gone
>>> to the wrong person.
>>>
>>> The above terms reflect a potential business arrangement, are provided
>>> solely as a basis for further discussion, and are not intended to be and do
>>> not constitute a legally binding obligation. No legally binding obligations
>>> will be created, implied, or inferred until an agreement in final form is
>>> executed in writing by all parties involved.
>>>
>>

Re: Better naming for runner specific options

Posted by Lukasz Cwik <lc...@google.com>.
The concept of a machine type isn't necessarily limited to Dataflow. If it
made sense for a runner, they could use AWS/Azure machine types as well.

On Fri, May 3, 2019 at 9:32 AM Ahmet Altay <al...@google.com> wrote:

> This idea was discussed in a PR a few months ago, and JIRA was filed as a
> follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
> issue here is that, such a change will very likely be a backward
> incompatible change and would be hard to do before the next major version.
>
> [1] https://issues.apache.org/jira/browse/BEAM-6531
>
> *From: *Reza Rokni <re...@google.com>
> *Date: *Thu, May 2, 2019 at 8:00 PM
> *To: * <de...@beam.apache.org>
>
> Hi,
>>
>> Was reading this SO question:
>>
>>
>> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>>
>> And noticed that in
>>
>>
>> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>>
>> The option is called --worker_machine_type.
>>
>> I wonder if runner specific options should have the runner in the prefix?
>> Something like --dataflow_worker_machine_type?
>>
>> Cheers
>> Reza
>>
>> --
>>
>> This email may be confidential and privileged. If you received this
>> communication by mistake, please don't forward it to anyone else, please
>> erase all copies and attachments, and please let me know that it has gone
>> to the wrong person.
>>
>> The above terms reflect a potential business arrangement, are provided
>> solely as a basis for further discussion, and are not intended to be and do
>> not constitute a legally binding obligation. No legally binding obligations
>> will be created, implied, or inferred until an agreement in final form is
>> executed in writing by all parties involved.
>>
>

Re: Better naming for runner specific options

Posted by Ahmet Altay <al...@google.com>.
This idea was discussed in a PR a few months ago, and JIRA was filed as a
follow up [1]. IMO, it makes sense to use a namespace prefix. The primary
issue here is that, such a change will very likely be a backward
incompatible change and would be hard to do before the next major version.

[1] https://issues.apache.org/jira/browse/BEAM-6531

*From: *Reza Rokni <re...@google.com>
*Date: *Thu, May 2, 2019 at 8:00 PM
*To: * <de...@beam.apache.org>

Hi,
>
> Was reading this SO question:
>
>
> https://stackoverflow.com/questions/53833171/googlecloudoptions-doesnt-have-all-options-that-pipeline-options-has
>
> And noticed that in
>
>
> https://beam.apache.org/releases/pydoc/2.12.0/_modules/apache_beam/options/pipeline_options.html#WorkerOptions
>
> The option is called --worker_machine_type.
>
> I wonder if runner specific options should have the runner in the prefix?
> Something like --dataflow_worker_machine_type?
>
> Cheers
> Reza
>
> --
>
> This email may be confidential and privileged. If you received this
> communication by mistake, please don't forward it to anyone else, please
> erase all copies and attachments, and please let me know that it has gone
> to the wrong person.
>
> The above terms reflect a potential business arrangement, are provided
> solely as a basis for further discussion, and are not intended to be and do
> not constitute a legally binding obligation. No legally binding obligations
> will be created, implied, or inferred until an agreement in final form is
> executed in writing by all parties involved.
>