You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Ayoub Ennassiri (Jira)" <ji...@apache.org> on 2020/09/08 17:07:00 UTC

[jira] [Commented] (BEAM-10705) Passing whl files in --sdk_location does not work for https locations.

    [ https://issues.apache.org/jira/browse/BEAM-10705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192320#comment-17192320 ] 

Ayoub Ennassiri commented on BEAM-10705:
----------------------------------------

[~tvalentyn], I've been using Apache Beam for a while and decided to contribute to it. Would you mind assigning this one to me? 

The issue seems to be at [https://github.com/apache/beam/blob/af2d6b0379d64b522ecb769d88e9e7e7b8900208/sdks/python/apache_beam/runners/portability/stager.py#L659] 

Similarly to the case where the sdk remote location is pypi, the *_desired_sdk_filename_in_staging_location* check should be done on *local_download_file* and not on *sdk_remote_location.* 

Thus instead of: 

 
{code:java}
staged_name = Stager._desired_sdk_filename_in_staging_location(
 sdk_remote_location){code}
 

We should have:
{code:java}
staged_name = Stager._desired_sdk_filename_in_staging_location(
 local_download_file){code}
Is this what you expect? 

 

> Passing whl files in --sdk_location does not work  for https locations.
> -----------------------------------------------------------------------
>
>                 Key: BEAM-10705
>                 URL: https://issues.apache.org/jira/browse/BEAM-10705
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-py-core
>            Reporter: Valentyn Tymofieiev
>            Priority: P3
>              Labels: starter
>
> Sample repro:
> python -m apache_beam.examples.wordcount --input=gs://dataflow-samples/shakespeare/kinglear.txt --output /tmp/wordcount  --runner=DataflowRunner --project=google.com:clo
> uddfe --temp_location gs://clouddfe-valentyn/tmp/ --region=us-central1 --sdk_location=https://
> storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198
> 203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
> {noformat}
>   File "/home/valentyn/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
>     "__main__", mod_spec)
>   File "/home/valentyn/.pyenv/versions/3.7.3/lib/python3.7/runpy.py", line 85, in _run_code
>     exec(code, run_globals)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/examples/wordcount.py", line 99, in <module>
>     run()
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/examples/wordcount.py", line 94, in run
>     output | 'Write' >> WriteToText(known_args.output)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 555, in __exit__
>     self.result = self.run()
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 521, in run
>     allow_proto_holders=True).run(False)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/pipeline.py", line 534, in run
>     return self.runner.run_pipeline(self, self._options)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/dataflow/dataflow_runner.py", line 479, in run_pipeline
>     artifacts=environments.python_sdk_dependencies(options)))
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/transforms/environments.py", line 613, in python_sdk_dependencies
>     staged_name in stager.Stager.create_job_resources(options, tmp_dir))
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 235, in create_job_resources
>     resources.extend(Stager._create_beam_sdk(sdk_remote_location, temp_dir))
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 659, in _create_beam_sdk
>     sdk_remote_location)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/runners/portability/stager.py", line 596, in _desired_sdk_filename_in_staging_location
>     _, wheel_filename = FileSystems.split(sdk_location)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/io/filesystems.py", line 151, in split
>     filesystem = FileSystems.get_filesystem(path)
>   File "/home/valentyn/projects/beam/beam2/beam/sdks/python/apache_beam/io/filesystems.py", line 106, in get_filesystem
>     'e.g., pip install apache-beam[gcp]. Path specified: %s' % path)
> ValueError: Unable to get filesystem from specified path, please use the correct path or ensure the required dependency is installed, e.g., pip install apache-beam[gcp]. Path specified: https://storage.googleapis.com/beam-wheels-staging/master/94f9e7fd4cae0f8aa6587d2cf14887f1c4827485-198203585/apache_beam-2.24.0.dev0-cp27-cp27m-macosx_10_9_x86_64.whl
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)