Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/18 17:53:26 UTC

[GitHub] [arrow] boshek commented on a diff in pull request #13183: ARROW-16144: [R] Write compressed data streams (particularly over S3)

boshek commented on code in PR #13183:
URL: https://github.com/apache/arrow/pull/13183#discussion_r876183135


##########
r/R/io.R:
##########
@@ -292,7 +292,7 @@ make_readable_file <- function(file, mmap = TRUE, compression = NULL, filesystem
   file
 }
 
-make_output_stream <- function(x, filesystem = NULL) {
+make_output_stream <- function(x, filesystem = NULL, compression = NULL) {

Review Comment:
   For `parquet.snappy`, or even `snappy.parquet`, I think it works because "snappy" isn't among the extensions handled here:
   https://github.com/apache/arrow/blob/3df2e0568240d6b629c0a3163df21a1a2a160810/r/R/io.R#L325-L330
   
   But if someone tries something like the following, we get an error that isn't very informative. I _think_ this is outside the scope of this PR, so could the resolution be to open a separate ticket for it?
   ``` r
   library(arrow, warn.conflicts = FALSE)
   tf <- tempfile(fileext = ".parquet.gz")
   write_parquet(data.frame(x = 1:5), tf, compression = "gzip", compression_level = 5)
   read_parquet(tf)
   #> Error: file must be a "RandomAccessFile"
   ```
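
   To illustrate why I think the extension matters, here is a rough sketch that assumes the codec is chosen from the file extension alone. `guess_compression()` is a hypothetical stand-in written for this comment, not the actual arrow function, and its extension table is an assumption modeled on the lines linked above.

   ``` r
   # Hypothetical helper: map a file extension to a codec name.
   # The extension table is an assumption, not copied from arrow.
   guess_compression <- function(path) {
     switch(tools::file_ext(path),
       bz2 = "bz2",
       gz = "gzip",
       lz4 = "lz4",
       zst = "zstd",
       "uncompressed" # fallback for any extension not listed
     )
   }

   guess_compression("data.parquet.snappy") # "snappy" is not in the table
   #> [1] "uncompressed"
   guess_compression("data.snappy.parquet") # "parquet" is not in the table
   #> [1] "uncompressed"
   guess_compression("data.parquet.gz") # "gz" is in the table
   #> [1] "gzip"
   ```

   If that is roughly what happens, my guess is that the `.gz` suffix causes the file to be opened as a compressed stream rather than a random-access file, which would explain the error above.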



##########
r/tests/testthat/test-s3-minio.R:
##########
@@ -54,6 +54,24 @@ if (arrow_with_s3() && process_is_running("minio server")) {
     )
   })
 
+  test_that("read/write compressed csv by filesystem", {
+    dat <- tibble(x = seq(1, 10, by = 0.2))

Review Comment:
   I will add that in to be defensive.


