You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@beam.apache.org by GitBox <gi...@apache.org> on 2020/03/05 01:18:15 UTC

[GitHub] [beam] ibzib opened a new pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

ibzib opened a new pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052
 
 
   I'm not thrilled about manually copying these over, later I might look into a long-term solution to this problem. But this works for now.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [ ] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   Post-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   Lang | SDK | Apex | Dataflow | Flink | Gearpump | Samza | Spark
   --- | --- | --- | --- | --- | --- | --- | ---
   Go | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Go_VR_Spark/lastCompletedBuild/)
   Java | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Apex/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Dataflow_Java11/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Flink/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Flink_Streaming/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Gearpump/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Spark/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_PVR_Spark_Batch/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Java_ValidatesRunner_SparkStructuredStreaming/lastCompletedBuild/)
   Python | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python36/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python37/lastCompletedBuild/) | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_VR_Dataflow_V2/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Py_ValCont/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python2_PVR_Flink_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python35_VR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_Python_VR_Spark/lastCompletedBuild/)
   XLang | --- | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Flink/lastCompletedBuild/) | --- | --- | [![Build Status](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PostCommit_XVR_Spark/lastCompletedBuild/)
   
   Pre-Commit Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   
   --- |Java | Python | Go | Website
   --- | --- | --- | --- | ---
   Non-portable | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Java_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Python_Cron/lastCompletedBuild/)<br>[![Build Status](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_PythonLint_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Go_Cron/lastCompletedBuild/) | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Website_Cron/lastCompletedBuild/) 
   Portable | --- | [![Build Status](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/badge/icon)](https://builds.apache.org/job/beam_PreCommit_Portable_Python_Cron/lastCompletedBuild/) | --- | ---
   
   See [.test-infra/jenkins/README](https://github.com/apache/beam/blob/master/.test-infra/jenkins/README.md) for trigger phrase, status and link of all Jenkins jobs.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-601814129
 
 
   > @ibzib Could we try to keep the Runner options in case we are not reading them from the job server? We wouldn't have to manually add the options then.
   
   Yeah, that's the plan. Just haven't gotten around to it yet. Looking now.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-611244971
 
 
   This PR is obsolete since #11189 is merged.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] robertwb commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
robertwb commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r395220392
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   While experiments are preferred, I think we can document options such as this as experimental as well. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388601783
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   I think that's what experiment(s) are for: https://github.com/apache/beam/blob/35beffc5775636eb96e33eb57c6e5f213cfe033a/sdks/python/apache_beam/options/pipeline_options.py#L803-L811

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib closed pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib closed pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] angoenka commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
angoenka commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-594980292
 
 
   If I understand correctly, this is needed because we don't go though job api for submission. 
   In job api based submission, runner options are automatically pulled from the runner?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-596491709
 
 
   Have you tried working around this by not discarding these options? AFAIK the json parser is smart enough to read the stringified verison of all option values.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] robertwb commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
robertwb commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-595937673
 
 
   I'm OK with this as a short-term fix, but we should still pursue fixing this properly as this is not scalable. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
mxm commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388327586
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I'm not sure we should add these here because that's what we used to do and it was inconsistent and hard to maintain. What we want to do, is to not discard those options but still warn about them not being parsed in the Python SDK. This will allow us to still use them in the Runner code.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388602678
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I agree, though as discussed earlier we might have difficulties parsing non-string options. I'll try it and see how it goes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598827334
 
 
   > Integer options need to be converted to string and that is done in the portable runner:
   
   All the unknown options will be strings already, so that shouldn't be needed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r392331379
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   > I think we should continue to discard invalid options when the client is aware of the full set of options and can perform the validation
   
   Agreed, we would keep options only when we can't be sure (so just the uber jar job server for now).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tweise commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
tweise commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598825634
 
 
   Integer options need to be converted to string and that is done in the portable runner: https://github.com/apache/beam/blob/6fce2528b0ff011ff3d416ac9e70c8d7264b1f3c/sdks/python/apache_beam/runners/portability/portable_runner.py#L162 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-601768565
 
 
   @ibzib Could we try to keep the Runner options in case we are not reading them from the job server? We wouldn't have to manually add the options then.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-597375999
 
 
   > Have you tried working around this by not discarding these options? AFAIK the json parser is smart enough to read the stringified verison of all option values.
   
   I think this may be the best strategy for the uber jar job server, however I don't think we should change this behavior for other runners. (Not sure if that's what you were proposing, just organizing my thoughts here:)
   - In Dataflow, we seem to duplicate every runner option for each SDK, perhaps because there is no better choice due to the runner architecture. In that case, since all the args are presumably known by the SDK, it makes more sense to drop them (status quo) or maybe even error when arguments are unknown, because it usually means the user made a mistake.
   - With the "old" Flink job server, retrieving args from the job server is an adequate workaround, so again, there should be no need for unrecognized arguments.
   
   I discussed this with @angoenka today and he suggested that we consider the runner-Flink boundary as well -- i.e., if we should have some way of enabling _all_ Flink environment options to be set through Beam pipeline options instead of just adding the ones we need as we go. This would potentially save users from having to wait for a new release just for us to add a pipeline option that trivially maps 1:1 to Flink (of course, they can always change Flink's conf files, which was going to be my proposed workaround here, but AFAIK that requires a restart of the cluster and would affect all jobs run on the cluster). WDYT?
   
   +cc @tweise 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
ibzib commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-595990099
 
 
   > I'm OK with this as a short-term fix, but we should still pursue fixing this properly as this is not scalable.
   
   @robertwb for a short term fix, I think we can get by with setting these options in the Flink cluster configuration file instead.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] iemejia commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
iemejia commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r388232005
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
+        default='PIPELINED',
+        help='Flink mode for data exchange of batch pipelines. '
 
 Review comment:
   (slightly unrelated to the PR comment) Do we have a way to mark pipelineoptions as `@Experimental` in Python? PipelineOptions are critical from the point of view of backwards compatibility, so we should probably be marking non stable options (if we do not).
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] tweise commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
tweise commented on a change in pull request #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#discussion_r392326569
 
 

 ##########
 File path: sdks/python/apache_beam/options/pipeline_options.py
 ##########
 @@ -1075,6 +1075,22 @@ def _add_argparse_args(cls, parser):
         ' directly, rather than starting up a job server.'
         ' Only applies when flink_master is set to a'
         ' cluster address.  Requires Python 3.6+.')
+    parser.add_argument(
+        '--parallelism',
+        default=-1,
+        type=int,
+        help='The degree of parallelism to be used when distributing '
+             'operations onto workers. If the parallelism is not set, the '
+             'configured Flink default is used, or 1 if none can be found.'
+    )
+    parser.add_argument(
+        '--execution_mode_for_batch',
 
 Review comment:
   I think we should continue to discard invalid options when the client is aware of the full set of options and can perform the validation - as is the case with portable runner and job server.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [beam] mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.

Posted by GitBox <gi...@apache.org>.
mxm commented on issue #11052: [BEAM-9446] Add missing parallelism and execution mode args.
URL: https://github.com/apache/beam/pull/11052#issuecomment-598434409
 
 
   What I meant is keeping "unknown" pipeline options instead of discarding them. If we pass the unknown ones as strings I thought this could work. For example, if we pass `"123"` the parser on the Java side is smart enough to figure it is the integer `123`. I haven't tried this out but I'd suggest we do.
   
   Of course this is not very user-friendly. That's why I'd suggest to write pipeline options in a language-agnostic format which can be read by all SDKs.
   
   Considering the Flink pipeline options, what we usually want to set is what is called `ExecutionConfig` in Flink. With Flink they are configured directly in the API. For Beam if the parameters are just strings/ints we could come up with something reflection based to set arbitrary new settings. I agree this would be helpful.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services