Posted to issues@beam.apache.org by "Luke Cwik (Jira)" <ji...@apache.org> on 2020/01/09 18:00:14 UTC

[jira] [Updated] (BEAM-9078) Large Tarball Artifacts Should Use GCS Resumable Upload

     [ https://issues.apache.org/jira/browse/BEAM-9078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Luke Cwik updated BEAM-9078:
----------------------------
    Status: Open  (was: Triage Needed)

> Large Tarball Artifacts Should Use GCS Resumable Upload
> -------------------------------------------------------
>
>                 Key: BEAM-9078
>                 URL: https://issues.apache.org/jira/browse/BEAM-9078
>             Project: Beam
>          Issue Type: Bug
>          Components: runner-dataflow
>    Affects Versions: 2.17.0
>            Reporter: Brad West
>            Assignee: Brad West
>            Priority: Major
>             Fix For: 2.19.0
>
>   Original Estimate: 1h
>          Time Spent: 40m
>  Remaining Estimate: 20m
>
> It's possible for the tarball uploaded to GCS to be quite large, for example when a user vendors multiple dependencies into the tarball to produce a more stable deployable artifact.
> Before this change, the GCS upload API call executed a multipart upload, which the Google documentation (https://cloud.google.com/storage/docs/json_api/v1/how-tos/upload) says should be used only when the file is small enough to upload again in its entirety if the connection fails. For large tarballs, we hit 60-second socket timeouts before the multipart upload completes. By passing `total_size`, apitools first checks whether the size exceeds the resumable upload threshold and, if it does, executes the more robust resumable upload rather than a multipart one, avoiding the socket timeouts.
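> A minimal sketch of the idea, assuming apitools' transfer.Upload API; the path, MIME type, and the commented-out client wiring are hypothetical illustrations, not the actual Beam staging code:
> {code:python}
> import os
>
> from apitools.base.py import transfer
>
> tarball_path = '/tmp/workflow.tar.gz'  # hypothetical artifact path
> total_size = os.path.getsize(tarball_path)
>
> with open(tarball_path, 'rb') as stream:
>     # Without total_size, apitools cannot tell how big the stream is
>     # and defaults to a single multipart upload. With it, any size
>     # above the resumable-upload threshold uses the chunked,
>     # retryable resumable protocol instead.
>     upload = transfer.Upload(
>         stream,
>         mime_type='application/octet-stream',
>         total_size=total_size)
>     # The upload is then attached to an objects.Insert request on an
>     # apitools-generated GCS client, e.g.:
>     #   request = storage.StorageObjectsInsertRequest(bucket=bucket, name=name)
>     #   client.objects.Insert(request, upload=upload)
> {code}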



--
This message was sent by Atlassian Jira
(v8.3.4#803005)