Posted to issues@beam.apache.org by "Lasse Karls (Jira)" <ji...@apache.org> on 2022/03/24 10:50:00 UTC

[jira] [Created] (BEAM-14165) Specify GCS Object Version in apache_beam.io.gcp.gcsio

Lasse Karls created BEAM-14165:
----------------------------------

             Summary: Specify GCS Object Version in apache_beam.io.gcp.gcsio
                 Key: BEAM-14165
                 URL: https://issues.apache.org/jira/browse/BEAM-14165
             Project: Beam
          Issue Type: Improvement
          Components: io-py-gcp
    Affects Versions: 2.37.0
            Reporter: Lasse Karls


I would like to specify a generation when accessing a GCS object via the Beam filesystem.
On the CLI, the gsutil command can access a specific version with the following syntax:

{code:sh}
gsutil cp gs://{bucket}/{object_path}#{generation} .
{code}

So the corresponding Python code would look something like this:
{code:python}
from apache_beam.io.filesystems import FileSystems

with FileSystems.open("gs://{bucket}/{object_path}#{generation}") as f:
    pass
{code}

Fortunately, the [StorageObjectsGetRequest|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/internal/clients/storage/storage_v1_messages.py#L2133] can already be passed a generation. 
However, this is +*not done*+ within the [GcsDownloader|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L611]. 

I think when [parsing the GCS path|https://github.com/apache/beam/blob/14862ccbdf2879574b6ce49149bdd7c9bf197322/sdks/python/apache_beam/io/gcp/gcsio.py#L583] the generation should be extracted as well. 
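
As a rough sketch of that extraction (not the actual gcsio code), the path parser could accept an optional numeric "#generation" suffix in the same style as gsutil. The function name, the regular expression, and the three-element return value are assumptions made for illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: bucket, object name, and an optional numeric
# "#<generation>" suffix, mirroring the gsutil syntax shown above.
_GCS_URI = re.compile(r"^gs://([^/]+)/(.*?)(?:#(\d+))?$")


def parse_gcs_path_with_generation(path: str) -> Tuple[str, str, Optional[int]]:
    """Split a GCS path into (bucket, object name, generation or None)."""
    match = _GCS_URI.match(path)
    if match is None or not match.group(2):
        raise ValueError(
            "GCS path must be gs://<bucket>/<object>[#<generation>]: %r" % path)
    bucket, name, generation = match.groups()
    # Only a trailing "#<digits>" is treated as a generation; any other "#"
    # stays part of the object name.
    return bucket, name, int(generation) if generation else None
```

The parsed generation could then be handed to the StorageObjectsGetRequest in the downloader. One open point: an object whose name itself ends in "#" followed by digits would be misread as carrying a generation, so a real change would need to decide how to handle that case.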