You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "kerrydc (via GitHub)" <gi...@apache.org> on 2023/08/10 18:34:47 UTC

[GitHub] [beam] kerrydc opened a new pull request, #27959: Updates validation on --dataflow_endpoint to accept any legal url.

kerrydc opened a new pull request, #27959:
URL: https://github.com/apache/beam/pull/27959

   Previously the  --dataflow_endpoint pipeline option require the url to match a regex containing 'googleapis'. This change makes it possible to use any legal url for dataflow-endpoint. 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://github.com/apache/beam/blob/master/CONTRIBUTING.md#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI or the [workflows README](https://github.com/apache/beam/blob/master/.github/workflows/README.md) to see a list of phrases to trigger workflows.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1290705093


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -231,6 +230,17 @@ def validate_cloud_options(self, view):
         errors.extend(self._validate_error(self.ERR_MISSING_OPTION, 'region'))
       else:
         view.region = default_region
+    dataflow_endpoint = view.dataflow_endpoint
+    if dataflow_endpoint is None:
+      errors.extend(
+          self._validate_error(self.ERR_MISSING_OPTION, dataflow_endpoint))
+    else:
+      valid_endpoint = validators.url(dataflow_endpoint)
+      print(dataflow_endpoint, validators.url(dataflow_endpoint))

Review Comment:
   is this for debugging? if so, shall we remove it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1681154646

   Finally all tests pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291702676


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -393,3 +402,16 @@ def validate_repeatable_argument_passed_as_list(self, view, arg_name):
       return self._validate_error(
           self.ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST, arg, arg_name)
     return []
+
+  # Minimally validates the endpoint url. This is not a strict application
+  # of http://www.faqs.org/rfcs/rfc1738.html.
+  def validate_endpoint_url(self, endpoint_url):

Review Comment:
   I would wait until we need it somewhere else.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1675375753

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1295180049


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -864,10 +864,6 @@ def validate(self, validator):
         else:
           setattr(self, 'temp_location', default_bucket)
 
-      if getattr(self, 'staging_location',
-                 None) or getattr(self, 'temp_location', None) is None:
-        errors.extend(validator.validate_gcs_path(self, 'staging_location'))

Review Comment:
   Updated to follow https://cloud.google.com/dataflow/docs/reference/pipeline-options.
   Previously it seems we would accept a local filepath for staging_location. See sdks/python/apache_beam/runners/dataflow/template_runner_test.py which depends on writing locally, or https://github.com/apache/beam/blob/9701e5cb7c17ee922578e9dd62fa3f6d85a91800/sdks/python/apache_beam/runners/dataflow/internal/apiclient.py#L649 which says it takes gcs or local path. I updated the template test, but I'm leaving the apiclient alone.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291705475


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -346,6 +341,11 @@ def test_is_service_runner(self):
     for case in test_cases:
       validator = PipelineOptionsValidator(
           PipelineOptions(case['options']), case['runner'])
+      print(

Review Comment:
   removed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291703186


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -23,6 +23,7 @@
 
 import logging
 import re
+import validators

Review Comment:
   Java doesn't validate the url. It also optionally uses an ENV variable.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1677573916

   Looks like the validation isn't happy in some test suites, here's an example:
   
   ```
   15:40:49 =========================== short test summary info ============================
   15:40:49 FAILED apache_beam/runners/dataflow/dataflow_runner_test.py::DataflowRunnerTest::test_environment_override_translation_legacy_worker_harness_image - ValueError: Pipeline has validations errors: 
   15:40:49 Missing required option: region.
   15:40:49 Invalid url (ignored) for dataflow endpoint. Please provide a valid url.
   15:40:49 Invalid GCS path (/dev/null), given for the option: temp_location.
   15:40:49 Invalid GCS path (ignored), given for the option: staging_location.
   15:40:49 FAILED 
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey merged pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey merged PR #27959:
URL: https://github.com/apache/beam/pull/27959


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291382460


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -231,6 +230,17 @@ def validate_cloud_options(self, view):
         errors.extend(self._validate_error(self.ERR_MISSING_OPTION, 'region'))
       else:
         view.region = default_region
+    dataflow_endpoint = view.dataflow_endpoint
+    if dataflow_endpoint is None:
+      errors.extend(
+          self._validate_error(self.ERR_MISSING_OPTION, dataflow_endpoint))
+    else:
+      valid_endpoint = validators.url(dataflow_endpoint)
+      print(dataflow_endpoint, validators.url(dataflow_endpoint))

Review Comment:
   Good catch, removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1674000976

   Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1684050734

   I really would like to keep some useful validation here. In the early days of Dataflow it was a common problem for people to specify a staging location that was local, and then the worker crashes because it cannot find the code it is expected to execute. Whatever is causing us to have to remove that validation in some cases, we should keep it in the main case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1293936214


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -333,7 +328,7 @@ def test_is_service_runner(self):
         },
         {
             'runner': MockRunners.DataflowRunner(),
-            'options': ['--dataflow_endpoint=https://dataflow.googleapis.com/'],
+            'options': ['--dataflow_endpoint=foo: //dataflow. googleapis. com'],

Review Comment:
   The whitespace is intentional, this should (and does) fail validation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1295180049


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -864,10 +864,6 @@ def validate(self, validator):
         else:
           setattr(self, 'temp_location', default_bucket)
 
-      if getattr(self, 'staging_location',
-                 None) or getattr(self, 'temp_location', None) is None:
-        errors.extend(validator.validate_gcs_path(self, 'staging_location'))

Review Comment:
   Updates to follow https://cloud.google.com/dataflow/docs/reference/pipeline-options.
   Previously it seems we would accept a local filepath for staging_location. See sdks/python/apache_beam/runners/dataflow/template_runner_test.py which depends on writing locally. I updated that as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1683965622

   @kerrydc you do want to hold for @kennknowles' review?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1680773426

   Linting is unhappy and there are a few test cases that are still failing. Seems like a lot of unit tests banked on the validation being extremely loose


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291506778


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -92,6 +94,8 @@ class PipelineOptionsValidator(object):
   ERR_INVALID_PROJECT_ID = (
       'Invalid Project ID (%s). Please make sure you specified the Project ID, '
       'not project description.')
+  ERR_INVALID_ENDPOINT = (
+      'Invalid url (%s) for dataflow endpoint. Please provide a valid url.')

Review Comment:
   Can we make it more actionable like enpoint url examples to let users know what format Beam expects?



##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -346,6 +341,11 @@ def test_is_service_runner(self):
     for case in test_cases:
       validator = PipelineOptionsValidator(
           PipelineOptions(case['options']), case['runner'])
+      print(

Review Comment:
   remove this?



##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -393,3 +402,16 @@ def validate_repeatable_argument_passed_as_list(self, view, arg_name):
       return self._validate_error(
           self.ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST, arg, arg_name)
     return []
+
+  # Minimally validates the endpoint url. This is not a strict application
+  # of http://www.faqs.org/rfcs/rfc1738.html.
+  def validate_endpoint_url(self, endpoint_url):

Review Comment:
   shall we move this to apache_beam.internal or other util modules since it can be used elsewhere? Then we could just add unit tests independent of PipelineOptions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291702350


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -92,6 +94,8 @@ class PipelineOptionsValidator(object):
   ERR_INVALID_PROJECT_ID = (
       'Invalid Project ID (%s). Please make sure you specified the Project ID, '
       'not project description.')
+  ERR_INVALID_ENDPOINT = (
+      'Invalid url (%s) for dataflow endpoint. Please provide a valid url.')

Review Comment:
   Beam doesn't expect anything more specific. We only test for a valid url that starts with 'http' or 'https'.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1293987793


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -291,7 +300,7 @@ def validate_worker_region_zone(self, view):
           self._validate_error(
               'Cannot use deprecated flag --zone along with worker_region or '
               'worker_zone.'))
-    if self.options.view_as(DebugOptions).lookup_experiment('worker_region')\
+    if self.options.view_as(DebugOptions).lookup_experiment('worker_region') \

Review Comment:
   yep



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1294945623


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -864,10 +864,6 @@ def validate(self, validator):
         else:
           setattr(self, 'temp_location', default_bucket)
 
-      if getattr(self, 'staging_location',
-                 None) or getattr(self, 'temp_location', None) is None:
-        errors.extend(validator.validate_gcs_path(self, 'staging_location'))

Review Comment:
   In a discovery Kerry and I discussed offline, that is surprisingly not true. The [template_runner_test.py](https://github.com/apache/beam/blob/e59f001fdb3d5104cf7f82cdbe0b099e32c7b9c1/sdks/python/apache_beam/runners/dataflow/template_runner_test.py#L59) actually banks on using a local tmpdir staging location when it builds a template. The unit test fails if you try to sub it out with a fake GCS bucket because it can't stage the files



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1296459640


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -111,69 +111,65 @@ def get_validator(temp_location, staging_location):
       validator = PipelineOptionsValidator(pipeline_options, runner)
       return validator
 
-    test_cases = [
-        {
-            'temp_location': None,
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': None,
-            'staging_location': None,
-            'errors': ['staging_location', 'temp_location']
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': None,
-            'errors': []
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': 'gs://ABC/bar',
-            'errors': ['staging_location']
-        },
-        {
-            'temp_location': 'gcs:/foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs:/foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://ABC/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://ABC/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://foo',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://foo/',
-            'staging_location': 'gs://foo/bar',
-            'errors': []
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': []
-        },
-    ]
+    test_cases = [{
+        'temp_location': None, 'staging_location': 'gs://foo/bar', 'errors': []
+    },
+                  {
+                      'temp_location': None,
+                      'staging_location': None,
+                      'errors': ['staging_location', 'temp_location']
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': None,
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': 'gs://ABC/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gcs:/foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs:/foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://ABC/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://ABC/bar',
+                      'staging_location': 'gs://BCD/bar',
+                      'errors': ['temp_location', 'staging_location']
+                  },
+                  {
+                      'temp_location': 'gs://foo',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  }]
 
     for case in test_cases:
       errors = get_validator(case['temp_location'],
                              case['staging_location']).validate()
       self.assertEqual(
-          self.check_errors_for_arguments(errors, case['errors']), [])
+          self.check_errors_for_arguments(errors, case['errors']), [], case)

Review Comment:
   True enough, but I would rather get this in with a small improvement and leave the major refactoring to a future PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1680917660

   ## [Codecov](https://app.codecov.io/gh/apache/beam/pull/27959?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) Report
   > Merging [#27959](https://app.codecov.io/gh/apache/beam/pull/27959?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (ed50e91) into [master](https://app.codecov.io/gh/apache/beam/commit/e3c58d45cf520cb4d39214e9c3d2964b63a0ae67?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) (e3c58d4) will **increase** coverage by `0.29%`.
   > Report is 56 commits behind head on master.
   > The diff coverage is `92.30%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #27959      +/-   ##
   ==========================================
   + Coverage   72.02%   72.32%   +0.29%     
   ==========================================
     Files         674      678       +4     
     Lines       98554    99758    +1204     
   ==========================================
   + Hits        70985    72147    +1162     
   - Misses      26008    26050      +42     
     Partials     1561     1561              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `82.86% <92.30%> (+0.26%)` | :arrow_up: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Files Changed](https://app.codecov.io/gh/apache/beam/pull/27959?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache) | Coverage Δ | |
   |---|---|---|
   | [.../apache\_beam/options/pipeline\_options\_validator.py](https://app.codecov.io/gh/apache/beam/pull/27959?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zX3ZhbGlkYXRvci5weQ==) | `97.48% <85.00%> (+0.23%)` | :arrow_up: |
   | [...dks/python/apache\_beam/options/pipeline\_options.py](https://app.codecov.io/gh/apache/beam/pull/27959?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vb3B0aW9ucy9waXBlbGluZV9vcHRpb25zLnB5) | `94.53% <100.00%> (+0.33%)` | :arrow_up: |
   
   ... and [22 files with indirect coverage changes](https://app.codecov.io/gh/apache/beam/pull/27959/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=apache)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1296352735


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -479,20 +470,17 @@ def test_experiment_region_and_worker_region_mutually_exclusive(self):
     self.assertIn('experiment', errors[0])
     self.assertIn('worker_region', errors[0])
 
-  def test_experiment_region_and_worker_zone_mutually_exclusive(self):
+  def test_region_and_worker_zone_mutually_exclusive(self):
     runner = MockRunners.DataflowRunner()
     options = PipelineOptions([
-        '--experiments',

Review Comment:
   Did we lose some coverage of behavior with experiments here? 



##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -111,69 +111,65 @@ def get_validator(temp_location, staging_location):
       validator = PipelineOptionsValidator(pipeline_options, runner)
       return validator
 
-    test_cases = [
-        {
-            'temp_location': None,
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': None,
-            'staging_location': None,
-            'errors': ['staging_location', 'temp_location']
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': None,
-            'errors': []
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': 'gs://ABC/bar',
-            'errors': ['staging_location']
-        },
-        {
-            'temp_location': 'gcs:/foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs:/foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://ABC/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://ABC/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://foo',
-            'staging_location': 'gs://foo/bar',
-            'errors': ['temp_location']
-        },
-        {
-            'temp_location': 'gs://foo/',
-            'staging_location': 'gs://foo/bar',
-            'errors': []
-        },
-        {
-            'temp_location': 'gs://foo/bar',
-            'staging_location': 'gs://foo/bar',
-            'errors': []
-        },
-    ]
+    test_cases = [{
+        'temp_location': None, 'staging_location': 'gs://foo/bar', 'errors': []
+    },
+                  {
+                      'temp_location': None,
+                      'staging_location': None,
+                      'errors': ['staging_location', 'temp_location']
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': None,
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': 'gs://ABC/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gcs:/foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs:/foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://ABC/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://ABC/bar',
+                      'staging_location': 'gs://BCD/bar',
+                      'errors': ['temp_location', 'staging_location']
+                  },
+                  {
+                      'temp_location': 'gs://foo',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  },
+                  {
+                      'temp_location': 'gs://foo/bar',
+                      'staging_location': 'gs://foo/bar',
+                      'errors': []
+                  }]
 
     for case in test_cases:
       errors = get_validator(case['temp_location'],
                              case['staging_location']).validate()
       self.assertEqual(
-          self.check_errors_for_arguments(errors, case['errors']), [])
+          self.check_errors_for_arguments(errors, case['errors']), [], case)

Review Comment:
   Printing the entire case here is okay, but providing each case a name (even if it's not incredibly descriptive) will help self-document what each individual case is supposed to do



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc closed pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc closed pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.
URL: https://github.com/apache/beam/pull/27959


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1294921227


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -864,10 +864,6 @@ def validate(self, validator):
         else:
           setattr(self, 'temp_location', default_bucket)
 
-      if getattr(self, 'staging_location',
-                 None) or getattr(self, 'temp_location', None) is None:
-        errors.extend(validator.validate_gcs_path(self, 'staging_location'))

Review Comment:
   For dataflow, staging location has to be GCS. Does this remove the validation? Other runners may have other validation. Should live in the runner.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1290708812


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -23,6 +23,7 @@
 
 import logging
 import re
+import validators

Review Comment:
   are we comfortable to introduce this new package? I recommend we should use pydantic: https://docs.pydantic.dev/latest/usage/types/urls/



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291392452


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -23,6 +23,7 @@
 
 import logging
 import re
+import validators

Review Comment:
   make sense. Thanks.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1684114279

   Kenn, I don't quite understand your comment. I have made the validation of staging location stronger. Now it must be a gs bucket if the runner is dataflow. The only place validation is loosed in this PR is the dataflow-endpoint now accepts any valid url. Not all dataflow endpoints include the string 'googleapis', and the validation I changed was not present in Java.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on PR #27959:
URL: https://github.com/apache/beam/pull/27959#issuecomment-1684387516

   3 approvals, please merge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291737632


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -92,6 +94,8 @@ class PipelineOptionsValidator(object):
   ERR_INVALID_PROJECT_ID = (
       'Invalid Project ID (%s). Please make sure you specified the Project ID, '
       'not project description.')
+  ERR_INVALID_ENDPOINT = (
+      'Invalid url (%s) for dataflow endpoint. Please provide a valid url.')

Review Comment:
   I see. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1296458776


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -479,20 +470,17 @@ def test_experiment_region_and_worker_region_mutually_exclusive(self):
     self.assertIn('experiment', errors[0])
     self.assertIn('worker_region', errors[0])
 
-  def test_experiment_region_and_worker_zone_mutually_exclusive(self):
+  def test_region_and_worker_zone_mutually_exclusive(self):
     runner = MockRunners.DataflowRunner()
     options = PipelineOptions([
-        '--experiments',

Review Comment:
   Nope, the region and zone flags are no longer experiments. We were testing the failure expected when declaring both region and zone. I just updated to use the flags(modern usage), not use the flags declared as experiments (out of date usage).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] liferoad commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291393271


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -23,6 +23,7 @@
 
 import logging
 import re
+import validators

Review Comment:
   another question: do we know how the Java SDK handle this option now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kerrydc commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kerrydc (via GitHub)" <gi...@apache.org>.
kerrydc commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1291384270


##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -23,6 +23,7 @@
 
 import logging
 import re
+import validators

Review Comment:
   Thinking it over, I decided to do a very basic validation using urllib. It isn't worth bringing in new packages for just this functionality.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1293853214


##########
sdks/python/apache_beam/options/pipeline_options_validator_test.py:
##########
@@ -333,7 +328,7 @@ def test_is_service_runner(self):
         },
         {
             'runner': MockRunners.DataflowRunner(),
-            'options': ['--dataflow_endpoint=https://dataflow.googleapis.com/'],
+            'options': ['--dataflow_endpoint=foo: //dataflow. googleapis. com'],

Review Comment:
   Is the whitespace okay here?



##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -291,7 +300,7 @@ def validate_worker_region_zone(self, view):
           self._validate_error(
               'Cannot use deprecated flag --zone along with worker_region or '
               'worker_zone.'))
-    if self.options.view_as(DebugOptions).lookup_experiment('worker_region')\
+    if self.options.view_as(DebugOptions).lookup_experiment('worker_region') \

Review Comment:
   was this a n automated formatting change?



##########
sdks/python/apache_beam/options/pipeline_options_validator.py:
##########
@@ -393,3 +402,16 @@ def validate_repeatable_argument_passed_as_list(self, view, arg_name):
       return self._validate_error(
           self.ERR_REPEATABLE_OPTIONS_NOT_SET_AS_LIST, arg, arg_name)
     return []
+
+  # Minimally validates the endpoint url. This is not a strict application
+  # of http://www.faqs.org/rfcs/rfc1738.html.
+  def validate_endpoint_url(self, endpoint_url):

Review Comment:
   validating URLs is a relatively straightforward thing that we probably don't do often, I see the argument for having it as a util in a more general place but I don't know if it'll get much use



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on a diff in pull request #27959: Updates validation on --dataflow_endpoint to accept any legal url.

Posted by "kennknowles (via GitHub)" <gi...@apache.org>.
kennknowles commented on code in PR #27959:
URL: https://github.com/apache/beam/pull/27959#discussion_r1296357589


##########
sdks/python/apache_beam/options/pipeline_options.py:
##########
@@ -864,10 +864,6 @@ def validate(self, validator):
         else:
           setattr(self, 'temp_location', default_bucket)
 
-      if getattr(self, 'staging_location',
-                 None) or getattr(self, 'temp_location', None) is None:
-        errors.extend(validator.validate_gcs_path(self, 'staging_location'))

Review Comment:
   I'm not familiar with that runner or test. But obvs a Dataflow worker cannot fetch staged files from a local tmpdir. Sounds like there is some inappropriate implementation inheritance going on maybe?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org