Posted to issues@beam.apache.org by "Chamikara Madhusanka Jayalath (Jira)" <ji...@apache.org> on 2022/05/05 17:45:00 UTC

[jira] [Commented] (BEAM-14393) Obtain metadata field at once in file system's IO connectors

    [ https://issues.apache.org/jira/browse/BEAM-14393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17532453#comment-17532453 ] 

Chamikara Madhusanka Jayalath commented on BEAM-14393:
------------------------------------------------------

Thanks. Is this optimization something we can do before Beam 3.0.0? We don't have to wait for the next major version unless this somehow breaks one of the major public APIs (for example, the FileSystem API). That doesn't sound like the case here.

> Obtain metadata field at once in file system's IO connectors
> ------------------------------------------------------------
>
>                 Key: BEAM-14393
>                 URL: https://issues.apache.org/jira/browse/BEAM-14393
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-common
>            Reporter: Yi Hu
>            Priority: P2
>             Fix For: 3.0.0
>
>
> This task involves refactoring and improving the file-metadata-related methods of the IO connectors (GcsIO, S3IO, BlobIO, Hadoop).
> Currently, we have individual methods such as size, last_updated, and checksum. Each one makes an HTTP request to fetch a single metadata field. If a caller needs several metadata fields, each method has to be called separately, issuing multiple requests under the hood. Yet each HTTP response already contains many file metadata fields; only one is kept each time and the rest are discarded.
> We should add a public method that returns a named tuple containing multiple file metadata fields. Its implementation should make only one request, just like each existing single-field method.
>  
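As a rough illustration of the proposal, here is a minimal Python sketch. Every name in it (`FileMetadata`, `FakeObjectStore`, `ExampleIO`, `metadata()`, and the `size`/`updated`/`md5` response keys) is hypothetical and is not the actual Beam or cloud-client API; a fake in-memory store that counts calls stands in for real HTTP requests.

```python
from typing import Any, Dict, NamedTuple


class FileMetadata(NamedTuple):
    """Hypothetical bundle of the metadata fields mentioned in the issue."""
    size: int
    last_updated: float
    checksum: str


class FakeObjectStore:
    """Hypothetical stand-in for a remote store.

    Each get_object_info() call models one HTTP request; request_count
    records how many were made.
    """

    def __init__(self, objects: Dict[str, Dict[str, Any]]):
        self._objects = objects
        self.request_count = 0

    def get_object_info(self, path: str) -> Dict[str, Any]:
        self.request_count += 1
        # Like a real response, this carries many fields at once.
        return dict(self._objects[path])


class ExampleIO:
    """Hypothetical connector illustrating the proposed refactoring."""

    def __init__(self, store: FakeObjectStore):
        self._store = store

    def metadata(self, path: str) -> FileMetadata:
        # One request; every field of interest is kept instead of discarded.
        info = self._store.get_object_info(path)
        return FileMetadata(
            size=info["size"],
            last_updated=info["updated"],
            checksum=info["md5"],
        )

    # The existing single-field methods stay as thin wrappers. Each still
    # costs one request, but all of them share the same code path.
    def size(self, path: str) -> int:
        return self.metadata(path).size

    def last_updated(self, path: str) -> float:
        return self.metadata(path).last_updated

    def checksum(self, path: str) -> str:
        return self.metadata(path).checksum


path = "gs://bucket/a.txt"  # hypothetical object path
store = FakeObjectStore({
    path: {"size": 42, "updated": 1651772700.0, "md5": "abc123", "etag": "x"},
})
conn = ExampleIO(store)

# Old pattern: three fields cost three requests.
conn.size(path)
conn.last_updated(path)
conn.checksum(path)
print(store.request_count)  # 3

# Proposed pattern: three fields from a single request.
store.request_count = 0
meta = conn.metadata(path)
print(store.request_count)  # 1
```

Keeping the single-field methods as wrappers around the new call is one way to make the change purely additive, so existing callers would keep working unchanged.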



--
This message was sent by Atlassian Jira
(v8.20.7#820007)