Posted to issues@beam.apache.org by "Brad West (Jira)" <ji...@apache.org> on 2020/01/09 15:12:00 UTC

[jira] [Created] (BEAM-9078) Large Tarball Artifacts Should Use GCS Resumable Upload

Brad West created BEAM-9078:
-------------------------------

             Summary: Large Tarball Artifacts Should Use GCS Resumable Upload
                 Key: BEAM-9078
                 URL: https://issues.apache.org/jira/browse/BEAM-9078
             Project: Beam
          Issue Type: Bug
          Components: runner-dataflow
    Affects Versions: 2.17.0
            Reporter: Brad West
             Fix For: 2.19.0


The tarball uploaded to GCS can be quite large, for example when a user vendors multiple dependencies into the tarball to produce a more stable deployable artifact.

Before this change, the GCS upload API call executed a multipart upload, which Google's documentation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload) recommends only when the file is small enough to upload again in its entirety if the connection fails. For large tarballs, we hit 60-second socket timeouts before the multipart upload completes. By passing `total_size`, apitools first checks whether the size exceeds the resumable upload threshold and, if it does, executes the more robust resumable upload rather than a multipart upload, avoiding socket timeouts.
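
Below is a minimal sketch of the idea, assuming apitools' transfer.Upload and Beam's generated GCS client; the bucket, object name, and file path are placeholders, and credential setup is elided. Supplying `total_size` lets apitools compare the file size against its resumable-upload threshold (5 MiB in apitools) and select the resumable strategy for large files; leaving it unset falls back to a one-shot multipart upload.

    import os

    from apitools.base.py import transfer
    from apache_beam.io.gcp.internal.clients import storage

    # Placeholder values for illustration only.
    path = '/tmp/workflow.tar.gz'
    bucket, name = 'my-bucket', 'staging/workflow.tar.gz'

    # Uses application default credentials.
    client = storage.StorageV1()

    with open(path, 'rb') as stream:
        upload = transfer.Upload(
            stream,
            mime_type='application/octet-stream',
            # With total_size set, apitools checks it against the resumable
            # upload threshold and, for large files, performs a resumable
            # upload instead of a single multipart request that must restart
            # from scratch after a socket timeout.
            total_size=os.path.getsize(path),
            chunksize=1 << 20)  # upload in 1 MiB chunks
        request = storage.StorageObjectsInsertRequest(bucket=bucket, name=name)
        client.objects.Insert(request, upload=upload)

A resumable upload sends the object in chunks and can retry a failed chunk, so a transient network failure costs at most one chunk rather than the entire tarball.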



--
This message was sent by Atlassian Jira
(v8.3.4#803005)