Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2021/08/02 11:51:15 UTC

[GitHub] [beam] calvinleungyk commented on pull request #15105: [BEAM-11275] Defer remote package download in stager and GetArtifact from GCS

calvinleungyk commented on pull request #15105:
URL: https://github.com/apache/beam/pull/15105#issuecomment-890962522


   Hi @ibzib, here's what I have so far:
   
   I am not adding an ArtifactInformation of URL type in `stager.py`. Instead, I am writing the remote file paths to `EXTRA_PACKAGES_FILE = 'extra_packages.txt'` in `sdks/python/apache_beam/runners/portability/stager.py`. This file is then read in [installExtraPackages](https://github.com/apache/beam/blob/dce846b36a4fb9140c4c5d14e10b72f835f03d98/sdks/python/container/piputil.go#L114), where `pip` tries to install the package directly, which will fail for a private GCS bucket. If I instead generate an ArtifactInformation of URL type, the worker will eventually run [extractStagingToPath](https://github.com/apache/beam/blob/dce846b36a4fb9140c4c5d14e10b72f835f03d98/sdks/go/pkg/beam/artifact/materialize.go#L139) on every ArtifactInformation. That function checks whether the ArtifactInformation has a `URNStagingTo` role or whether its type is `URNFileArtifact`; both checks evaluate to `False`, so the function returns an error.
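
To make the failure mode concrete, here is a rough Python sketch of the check as I understand it. It is only an illustration, not the actual Go code in `materialize.go`: the `ArtifactInfo` class, its field names, and the URN string values are assumptions made for this example.

```python
from dataclasses import dataclass

# Assumed URN values, for illustration only; the real constants live in the Go SDK.
URN_STAGING_TO = "beam:artifact:role:staging_to:v1"
URN_FILE_ARTIFACT = "beam:artifact:type:file:v1"
URN_URL_ARTIFACT = "beam:artifact:type:url:v1"


@dataclass
class ArtifactInfo:
    """Hypothetical stand-in for the ArtifactInformation proto."""
    type_urn: str
    role_urn: str = ""
    staged_name: str = ""  # only meaningful when the role is staging_to
    path: str = ""         # only meaningful when the type is file


def extract_staging_to_path(artifact: ArtifactInfo) -> str:
    """Mirror the two checks described above and fail when neither matches."""
    if artifact.role_urn == URN_STAGING_TO:
        return artifact.staged_name
    if artifact.type_urn == URN_FILE_ARTIFACT:
        return artifact.path
    raise ValueError(
        "failed to extract staging path for artifact type %r role %r"
        % (artifact.type_urn, artifact.role_urn)
    )


if __name__ == "__main__":
    # A URL-typed artifact without a staging_to role hits the error branch.
    try:
        extract_staging_to_path(ArtifactInfo(type_urn=URN_URL_ARTIFACT))
    except ValueError as err:
        print("error:", err)
```

If this reading is right, a URL artifact would only pass this check when it also carries a `URNStagingTo` role.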
   
   I might be missing some place where the worker is using the artifact service to download artifacts as I'm not familiar with the worker code. If the above is inaccurate, would you be able to show me where the worker would attempt to fetch a URL artifact?
   
   As for integration tests, I am running into credential issues that prevent the job from reaching the Compute Engine Metadata server. The first error is `WARNING:google.auth.compute_engine._metadata:Compute Engine Metadata server unavailable onattempt 1 of 3. Reason: timed out`, followed by:
   ```
   WARNING:apache_beam.internal.gcp.auth:Unable to find default credentials to use: The Application Default Credentials are not available. They are available if running in Google Compute Engine. Otherwise, the environment variable GOOGLE_APPLICATION_CREDENTIALS must be defined pointing to a file defining the credentials. See https://developers.google.com/accounts/docs/application-default-credentials for more information.
   Connecting anonymously.
   ...
   Failed to start a local webserver listening on either port 8080
   or port 8090. Please check your firewall settings and locally
   ```
   The Gradle error is:
   ```
   FAILURE: Build failed with an exception.
   
   * What went wrong:
   Gradle build daemon disappeared unexpectedly (it may have been killed or may have crashed)
   ```

