Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/22 12:56:00 UTC

[GitHub] [arrow] nealrichardson commented on pull request #13625: ARROW-16612: [R] Support inferring compression from filename for all readers/writers

nealrichardson commented on PR #13625:
URL: https://github.com/apache/arrow/pull/13625#issuecomment-1192545063

   > after this PR we get a file with a .gz extension that is not gzipped
   
   The file isn't gzipped as a whole; gzip is the codec used internally to compress the Parquet file's contents. I agree that this is odd, but it is consistent with my understanding of how compression extensions are customarily used in Parquet filenames.
   
   Weirdness aside, the bigger issue IMO is that this PR fixes at least 3 bugs where the current code fails. On master, `write_parquet("XXXX.gz")` writes a Parquet file and then wraps the whole file in gzip, but `read_parquet("XXXX.gz")` can't read it back. Moreover, `write_parquet("XXXX.zst")` also writes a gzipped file, and `write_parquet("XXXX.snappy")` doesn't compress at all. If we decide that `write_parquet()` shouldn't infer compression from the filename at all, that's fine; we can make that change on top of this PR, but we should move forward with the rest of the changes.
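   A minimal R sketch of the extension-to-codec mapping implied above (`.gz` to gzip, `.zst` to zstd, `.snappy` to snappy). `infer_parquet_codec()` is a hypothetical helper for illustration only, not arrow's actual implementation; the idea is that its result would be passed as the `compression` argument of `write_parquet()`, so the codec is applied inside the Parquet file rather than as an outer gzip wrapper.

   ```r
   # Hypothetical helper (NOT arrow's implementation): map a filename
   # extension to the Parquet codec it is assumed to imply.
   infer_parquet_codec <- function(path) {
     # Assumed mapping, based on the cases discussed in this comment
     codecs <- c(gz = "gzip", zst = "zstd", snappy = "snappy")
     ext <- tolower(tools::file_ext(path))
     # Unrecognized extension: return NA so the caller keeps the default codec
     if (ext %in% names(codecs)) codecs[[ext]] else NA_character_
   }

   infer_parquet_codec("XXXX.gz")       # "gzip"
   infer_parquet_codec("data.parquet")  # NA
   ```

   When the helper returns `NA`, a caller would simply omit `compression` and let `write_parquet()` use its default.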


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
