Posted to issues@beam.apache.org by "Evgeniy (Jira)" <ji...@apache.org> on 2021/02/11 08:54:00 UTC
[jira] [Issue Comment Deleted] (BEAM-11275) Support GCS files for
extra_requirements argument in Python Beam portable runners
[ https://issues.apache.org/jira/browse/BEAM-11275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Evgeniy updated BEAM-11275:
---------------------------
Comment: was deleted
(was: I have the same issue with Beam and Flink (without GCS and Kubeflow). I use `FlinkRunner`.
Pipeline options:
{code:java}
--runner=FlinkRunner
--flink_master=*.*.*.*:8081
--environment_type=EXTERNAL
--environment_config=*.*.*.*:50000
--flink_version=1.10
{code}
Flink taskexecutor log:
{code:java}
java.io.FileNotFoundException: /tmp/beam-temp3g1emmsd/artifactsbv63p2is/a9d47b4ae5a1f4a0b5fddd51d2cd152583637e1353a615d5457aa19494051826/1-ref_Environment_default_e-tfx_ephemeral-0.27.0.tar.gz (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:127)
at org.apache.beam.sdk.io.LocalFileSystem.open(LocalFileSystem.java:83)
at org.apache.beam.sdk.io.FileSystems.open(FileSystems.java:256)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:124)
at org.apache.beam.runners.fnexecution.artifact.ArtifactRetrievalService.getArtifact(ArtifactRetrievalService.java:99)
at org.apache.beam.model.jobmanagement.v1.ArtifactRetrievalServiceGrpc$MethodHandlers.invoke(ArtifactRetrievalServiceGrpc.java:327)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:172)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:817)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
at org.apache.beam.vendor.grpc.v1p26p0.io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}
Beam worker pool log:
{code:java}
2021/02/10 12:28:10 Failed to retrieve staged files: failed to retrieve /tmp/staged in 3 attempts: failed to retrieve chunk for /tmp/staged/tfx_ephemeral-0.27.0.tar.gz
{code})
> Support GCS files for extra_requirements argument in Python Beam portable runners
> ---------------------------------------------------------------------------------
>
> Key: BEAM-11275
> URL: https://issues.apache.org/jira/browse/BEAM-11275
> Project: Beam
> Issue Type: Improvement
> Components: sdk-py-core
> Reporter: Gerard Casas Saez
> Assignee: Calvin Leung
> Priority: P2
>
> Currently, portable runners only support locally available files when adding dependencies on remote workers. This can be seen in https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/stager.py#L429, which uses shutil.copyfile when it detects that a file is remote and is not http.
> An easy extension would be to extend _is_remote_path in Stager to detect whether the path matches any filesystem and, if it does, to avoid downloading the file and let it be copied afterwards.
> Acceptance criteria:
> - `extra_package` can be a GCS path instead of requiring it to be local only.
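
The extension proposed in the description could be sketched roughly as follows. This is only an illustration of the decision logic, not the actual Stager code: the function names (`is_http_path`, `is_filesystem_path`, `plan_staging`), the scheme set, and the three action labels are all hypothetical, and a real change would presumably ask Beam's filesystem registry which schemes are supported instead of hard-coding them.

```python
from urllib.parse import urlparse

# Hypothetical list of schemes backed by a filesystem implementation.
# Assumption for this sketch only; a real implementation would query
# the set of filesystems actually available at runtime.
KNOWN_FILESYSTEM_SCHEMES = frozenset({"gs", "s3", "hdfs"})


def is_http_path(path):
    """Return True for http:// and https:// URLs."""
    return urlparse(path).scheme in ("http", "https")


def is_filesystem_path(path, schemes=KNOWN_FILESYSTEM_SCHEMES):
    """Return True if the path's scheme matches a known filesystem."""
    return urlparse(path).scheme in schemes


def plan_staging(path):
    """Decide how a dependency path should be handled (hypothetical labels).

    'download'   -> fetch over HTTP(S) before staging
    'defer'      -> leave the file where it is and let the filesystem
                    layer copy it afterwards
    'copy_local' -> plain local copy
    """
    if is_http_path(path):
        return "download"
    if is_filesystem_path(path):
        return "defer"
    return "copy_local"


if __name__ == "__main__":
    for candidate in (
        "gs://my-bucket/pkg-0.1.tar.gz",
        "https://example.com/pkg-0.1.tar.gz",
        "/tmp/pkg-0.1.tar.gz",
    ):
        print(candidate, "->", plan_staging(candidate))
```

Separating the HTTP check from the filesystem check keeps any existing HTTP download path untouched and only changes what happens to paths such as `gs://...`, which is the case the acceptance criteria ask for.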
--
This message was sent by Atlassian Jira
(v8.3.4#803005)