Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/03/09 09:21:00 UTC

[jira] [Commented] (ARROW-15875) [R][C++] Include md5sum in S3 method for GetFileInfo()

    [ https://issues.apache.org/jira/browse/ARROW-15875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17503426#comment-17503426 ] 

Antoine Pitrou commented on ARROW-15875:
----------------------------------------

S3 doesn't always give you the MD5 checksum; what it gives you is an opaque ETag:
https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html
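
To illustrate why the ETag can't simply be treated as an MD5, here is a minimal Python sketch (stdlib only) of how a client might decide whether an ETag is even comparable with a locally computed MD5. It assumes the commonly observed S3 behavior that single-part uploads stored without SSE-KMS or SSE-C encryption get a hex-MD5 ETag, while multipart uploads get an ETag ending in "-<number of parts>". The function names are hypothetical and the check is a heuristic, not a guarantee.

```python
import hashlib
import re
from typing import Optional

# Shape of a plain hex MD5 digest (32 hex characters, no "-N" part suffix).
_PLAIN_MD5 = re.compile(r"^[0-9a-f]{32}$")


def normalize_etag(etag: str) -> str:
    """Strip the surrounding double quotes S3 puts on ETag values."""
    return etag.strip('"').lower()


def etag_looks_like_md5(etag: str) -> bool:
    """Heuristic: True if the ETag *could* be a plain MD5 of the object.

    Multipart-upload ETags end in "-<part count>" and are rejected here.
    An ETag of MD5 shape is still not guaranteed to be an MD5 (for example
    with SSE-KMS or SSE-C encryption), so True only means "maybe".
    """
    return _PLAIN_MD5.match(normalize_etag(etag)) is not None


def local_matches_etag(data: bytes, etag: str) -> Optional[bool]:
    """Compare local bytes with a remote ETag.

    Returns None when the ETag is clearly not a plain MD5, so no conclusion
    can be drawn; otherwise returns whether the local MD5 equals the ETag.
    A False result may also mean the ETag was an opaque value of MD5 shape.
    """
    if not etag_looks_like_md5(etag):
        return None
    return hashlib.md5(data).hexdigest() == normalize_etag(etag)
```

For example, local_matches_etag(b"", '"d41d8cd98f00b204e9800998ecf8427e"') returns True, since that value is the MD5 of empty input, while any ETag with a "-N" suffix yields None.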

The ETag can be fetched from the object metadata using the [C++ stream API|https://arrow.apache.org/docs/cpp/api/io.html#_CPPv4N5arrow2io11InputStream12ReadMetadataEv], but that doesn't seem to be wired up in R. [~thisisnic] [~paleolimbot]
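
As a rough illustration of the stream-metadata route, here is a minimal Python sketch. It assumes pyarrow's S3FileSystem.open_input_stream() and NativeFile.metadata() (the Python binding of ReadMetadata()), and assumes Arrow's S3 filesystem stores the value under the key "ETag"; read_etag, the bucket, and the object key are hypothetical. The function only relies on those two methods, so it can be exercised without S3 access.

```python
from typing import Optional


def read_etag(filesystem, path: str) -> Optional[str]:
    """Return the ETag from an Arrow input stream's metadata, if present.

    `filesystem` is expected to behave like pyarrow.fs.S3FileSystem:
    open_input_stream(path) returns a stream whose metadata() method
    returns a dict. The "ETag" key name is an assumption.
    """
    with filesystem.open_input_stream(path) as stream:
        metadata = stream.metadata()
    value = metadata.get("ETag")
    if value is None:
        return None
    if isinstance(value, bytes):  # metadata values may come back as bytes
        value = value.decode("utf-8")
    return value.strip('"')  # S3 wraps ETag values in double quotes


if __name__ == "__main__":
    # Hypothetical bucket/key; requires pyarrow and S3 credentials.
    from pyarrow import fs

    s3 = fs.S3FileSystem(region="us-east-1")
    print(read_etag(s3, "my-bucket/path/to/file.parquet"))
```

Opening a stream like this does not read the object body up front, which is the point of the request: the ETag comes from the object's metadata rather than from transferring the file.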

> [R][C++] Include md5sum in S3 method for GetFileInfo()
> ------------------------------------------------------
>
>                 Key: ARROW-15875
>                 URL: https://issues.apache.org/jira/browse/ARROW-15875
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>    Affects Versions: 7.0.0
>            Reporter: Carl Boettiger
>            Priority: Major
>
> GetFileInfo() seems to include mtime, size, path and type. For an S3 filesystem, it would be nice to be able to reference the MD5 sum without transferring the file (which I think the server will have already computed?). This seems like the logical place to include it (though I wouldn't object to a more visible method too).
>
> (Though the meaning of type isn't clear to me, since it appears to be an integer.)


