Posted to user@arrow.apache.org by Chris Berthiaume <ch...@uw.edu> on 2021/10/31 23:04:24 UTC

[R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Hello,

After upgrading Arrow 5.0.0.2 to 6.0.0.2 in a Bioconductor 3.13 Docker
container, I started to see some new errors when reading Parquet files that
use snappy compression. I'm using the prebuilt Linux binary by setting
LIBARROW_BINARY=true during installation. Building arrow using the latest
nightly source fixes the issue. Is it possible the 6.0.0.2 prebuilt Linux
binary does not have snappy compression support enabled? The error is
copied below.

Error: NotImplemented: Support for codec 'snappy' not built
In order to read this file, you will need to reinstall arrow with additional features enabled.
Set one of these environment variables before installing:

 * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
 * ARROW_WITH_SNAPPY=ON (for just 'snappy')

See https://arrow.apache.org/docs/r/articles/install.html for details
Backtrace:
 1. popcycle::get.vct.by.file(db, vct_dir, "2018_176/2018-06-25T20-03-48+00-00") test_files.R:210:2
 4. arrow::read_parquet(...)
 5. base::tryCatch(reader$ReadTable(), error = read_compressed_error)
 6. base:::tryCatchList(expr, classes, parentenv, handlers)
 7. base:::tryCatchOne(expr, names, parentenv, handlers[[1L]])
 8. value[[3L]](cond)

Thanks,
Chris Berthiaume
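
For reference, the reinstall that the error message asks for could be scripted roughly as follows. This is a minimal sketch and was not posted in the thread: the helper name reinstall_arrow_with_snappy and its dry_run argument are hypothetical, and the CRAN mirror URL is an assumption. Only the two environment variable names come from the error text. A source install is assumed because the variables presumably only matter when the Arrow C++ library is built during installation.

```r
# Hypothetical helper: set the variables named in the error message, then
# (optionally) reinstall arrow from a source repository.
reinstall_arrow_with_snappy <- function(repos = "https://cloud.r-project.org",
                                        dry_run = TRUE) {
  # Both names are taken verbatim from the error text; according to that
  # message, either one alone should be enough.
  Sys.setenv(LIBARROW_MINIMAL = "false", ARROW_WITH_SNAPPY = "ON")

  if (dry_run) {
    message("Environment set; call again with dry_run = FALSE to install.")
    return(invisible(FALSE))
  }

  # type = "source" requests a source build (assumption: a compiler
  # toolchain and cmake are available on the machine).
  install.packages("arrow", repos = repos, type = "source")
  invisible(TRUE)
}

reinstall_arrow_with_snappy()  # dry run: only sets the environment variables
```

The dry-run default is a design choice for this sketch, so that sourcing the snippet does not trigger a long compile by accident.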

Re: [R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Posted by Jonathan Keane <jk...@gmail.com>.
Following up on this: RSPM (RStudio Package Manager) has updated their
binaries to include Snappy by default. Let us know if you have any
continuing issues with this, and thank you for the report!

-Jon
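
One way to confirm that a reinstalled binary includes Snappy is to ask the package directly. A small sketch, assuming the arrow package may or may not be installed: snappy_available is a hypothetical wrapper name, while codec_is_available() and arrow_info() are functions exported by arrow (the package's startup message also mentions arrow_info()).

```r
# Hypothetical wrapper: TRUE/FALSE if arrow is installed, NA otherwise.
snappy_available <- function() {
  if (!requireNamespace("arrow", quietly = TRUE)) {
    return(NA)
  }
  arrow::codec_is_available("snappy")
}

if (isTRUE(snappy_available())) {
  message("Snappy support is built in; snappy-compressed Parquet should read.")
} else {
  message("Snappy not available (or arrow not installed); see arrow::arrow_info().")
}
```

Running this right after install.packages() gives a quick yes/no before any Parquet file is read.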

On Mon, Nov 1, 2021 at 5:45 PM Chris Berthiaume <ch...@uw.edu> wrote:
>
> Thanks for the explanation. I'll keep my eye out for a new binary soon.
>
> On Mon, Nov 1, 2021 at 1:23 PM Neal Richardson <ne...@gmail.com> wrote:
>>
>> Thanks for the details. I see you're using RStudio Package Manager. There was an issue with the binaries that RSPM built for 6.0.0.2; we've been discussing it with them and they should be fixing it on their side, so this should resolve itself soon (if it isn't already resolved).
>>
>> Neal
>>
>>
>> On Mon, Nov 1, 2021 at 1:36 PM Chris Berthiaume <ch...@uw.edu> wrote:
>>>
>>> Hi Neal,
>>>
>>> Here's a reproducible example using a fresh Docker container for bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start R, install arrow, attach arrow, then try to read a simple Parquet file I just now created separately in RStudio on macOS with arrow 5.0.0. This fails. I stop/start R again, install arrow 5.0.0.2 with devtools::install_version(), attach, then verify that I can successfully read the same Parquet file.
>>>
>>> I've changed the R prompt character below from ">" to "$" to prevent any text from being interpreted as an email reply.
>>>
>>> # Creating the Parquet file in RStudio on macOS
>>> $ x <- data.frame(A=seq(0, 2), B=seq(10,12))
>>> $ x
>>>   A  B
>>> 1 0 10
>>> 2 1 11
>>> 3 2 12
>>> $ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")
>>>
>>> # Run the test in a docker container
>>> docker run -it --rm -v ~/Desktop/arrowtest:/data bioconductor/bioconductor_docker:RELEASE_3_13 bash
>>> root@5fa84c3f4a41:/# cd /data
>>> root@5fa84c3f4a41:/data# R
>>>
>>> R version 4.1.1 (2021-08-10) -- "Kick Things"
>>> Copyright (C) 2021 The R Foundation for Statistical Computing
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>> You are welcome to redistribute it under certain conditions.
>>> Type 'license()' or 'licence()' for distribution details.
>>>
>>> R is a collaborative project with many contributors.
>>> Type 'contributors()' for more information and
>>> 'citation()' on how to cite R or R packages in publications.
>>>
>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>> 'help.start()' for an HTML browser interface to help.
>>> Type 'q()' to quit R.
>>>
>>> $ install.packages('arrow')
>>> Installing package into ‘/usr/local/lib/R/site-library’
>>> (as ‘lib’ is unspecified)
>>> also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’
>>>
>>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
>>> Content type 'binary/octet-stream' length 691644 bytes (675 KB)
>>> ==================================================
>>> downloaded 675 KB
>>>
>>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
>>> Content type 'binary/octet-stream' length 52329 bytes (51 KB)
>>> ==================================================
>>> downloaded 51 KB
>>>
>>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
>>> Content type 'binary/octet-stream' length 573106 bytes (559 KB)
>>> ==================================================
>>> downloaded 559 KB
>>>
>>> trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
>>> Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
>>> ==================================================
>>> downloaded 22.6 MB
>>>
>>> * installing *binary* package ‘bit’ ...
>>> * DONE (bit)
>>> * installing *binary* package ‘assertthat’ ...
>>> * DONE (assertthat)
>>> * installing *binary* package ‘bit64’ ...
>>> * DONE (bit64)
>>> * installing *binary* package ‘arrow’ ...
>>> * DONE (arrow)
>>>
>>> The downloaded source packages are in
>>> ‘/tmp/Rtmp8HkDvX/downloaded_packages’
>>> $ library(arrow)
>>> See arrow_info() for available features
>>>
>>> Attaching package: ‘arrow’
>>>
>>> The following object is masked from ‘package:utils’:
>>>
>>>     timestamp
>>>
>>> $ sessionInfo()
>>> R version 4.1.1 (2021-08-10)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 20.04.3 LTS
>>>
>>> Matrix products: default
>>> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] arrow_6.0.0.2
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.1   magrittr_2.0.1
>>>  [5] assertthat_0.2.1 R6_2.5.1         tools_4.1.1      glue_1.4.2
>>>  [9] bit64_4.0.5      vctrs_0.3.8      rlang_0.4.11     purrr_0.3.4
>>> $ read_parquet("x.parquet")
>>> Error: NotImplemented: Support for codec 'snappy' not built
>>> In order to read this file, you will need to reinstall arrow with additional features enabled.
>>> Set one of these environment variables before installing:
>>>
>>>  * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
>>>  * ARROW_WITH_SNAPPY=ON (for just 'snappy')
>>>
>>> See https://arrow.apache.org/docs/r/articles/install.html for details
>>>
>>> root@5fa84c3f4a41:/data# R
>>>
>>> R version 4.1.1 (2021-08-10) -- "Kick Things"
>>> Copyright (C) 2021 The R Foundation for Statistical Computing
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>>
>>> R is free software and comes with ABSOLUTELY NO WARRANTY.
>>> You are welcome to redistribute it under certain conditions.
>>> Type 'license()' or 'licence()' for distribution details.
>>>
>>> R is a collaborative project with many contributors.
>>> Type 'contributors()' for more information and
>>> 'citation()' on how to cite R or R packages in publications.
>>>
>>> Type 'demo()' for some demos, 'help()' for on-line help, or
>>> 'help.start()' for an HTML browser interface to help.
>>> Type 'q()' to quit R.
>>>
>>> $ devtools::install_version("arrow", "5.0.0.2")
>>> Downloading package from url: https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
>>> These packages have more recent versions available.
>>> It is recommended to update all of them.
>>> Which would you like to update?
>>>
>>> 1: All
>>> 2: CRAN packages only
>>> 3: None
>>> 4: rlang (0.4.11 -> 0.4.12) [CRAN]
>>>
>>> Enter one or more numbers, or an empty line to skip updates:
>>> Installing package into ‘/usr/local/lib/R/site-library’
>>> (as ‘lib’ is unspecified)
>>> * installing *binary* package ‘arrow’ ...
>>> * DONE (arrow)
>>> $ library(arrow)
>>>
>>> Attaching package: ‘arrow’
>>>
>>> The following object is masked from ‘package:utils’:
>>>
>>>     timestamp
>>>
>>> $ sessionInfo()
>>> R version 4.1.1 (2021-08-10)
>>> Platform: x86_64-pc-linux-gnu (64-bit)
>>> Running under: Ubuntu 20.04.3 LTS
>>>
>>> Matrix products: default
>>> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
>>>
>>> locale:
>>>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>>>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>>>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
>>>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>>>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
>>> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>>>
>>> attached base packages:
>>> [1] stats     graphics  grDevices utils     datasets  methods   base
>>>
>>> other attached packages:
>>> [1] arrow_5.0.0.2
>>>
>>> loaded via a namespace (and not attached):
>>>  [1] magrittr_2.0.1    usethis_2.0.1     devtools_2.4.2    tidyselect_1.1.1
>>>  [5] bit_4.0.4         pkgload_1.2.2     R6_2.5.1          rlang_0.4.11
>>>  [9] fastmap_1.1.0     tools_4.1.1       pkgbuild_1.2.0    sessioninfo_1.1.1
>>> [13] cli_3.0.1         withr_2.4.2       ellipsis_0.3.2    remotes_2.4.0
>>> [17] bit64_4.0.5       rprojroot_2.0.2   assertthat_0.2.1  lifecycle_1.0.1
>>> [21] crayon_1.4.1      processx_3.5.2    purrr_0.3.4       callr_3.7.0
>>> [25] vctrs_0.3.8       fs_1.5.0          ps_1.6.0          testthat_3.0.4
>>> [29] memoise_2.0.0     glue_1.4.2        cachem_1.0.6      compiler_4.1.1
>>> [33] desc_1.3.0        prettyunits_1.1.1
>>> $ read_parquet("x.parquet")
>>>   A  B
>>> 1 0 10
>>> 2 1 11
>>> 3 2 12
>>>
>>> On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson <ne...@gmail.com> wrote:
>>>>
>>>> Hi Chris,
>>>> Could you share the output from when you installed the package? Snappy and the other compression libraries should be on in the binaries (see https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625 for example), so I'm curious if there's anything in the install logs that helps us understand what's up.
>>>>
>>>> Neal
>>>>

Re: [R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Posted by Chris Berthiaume <ch...@uw.edu>.
Thanks for the explanation. I'll keep my eye out for a new binary soon.

Re: [R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Posted by Neal Richardson <ne...@gmail.com>.
Thanks for the details. I see you're using RStudio Package Manager. There
was an issue with the binaries that RSPM built for 6.0.0.2; we've been
discussing it with them and they should be fixing it on their side, so this
should resolve itself soon (if it isn't already resolved).

Neal


Re: [R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Posted by Chris Berthiaume <ch...@uw.edu>.
Hi Neal,

Here's a reproducible example using a fresh Docker container for
bioconductor/bioconductor_docker:RELEASE_3_13. I start the container, start
R, install arrow, attach arrow, then try to read a simple parquet file I
just now created separately in RStudio on macOS with arrow 5.0.0. This
fails. I stop/start R again, install arrow 5.0.0.2 with
devtools::install_version(), attach, then verify that I can successfully
read the same parquet file.

I've changed the R prompt character below from ">" to "$" to prevent any
text from being interpreted as an email reply.

# Creating the parquet file in RStudio on macOS
$ x <- data.frame(A=seq(0, 2), B=seq(10,12))
$ x
  A  B
1 0 10
2 1 11
3 2 12
$ arrow::write_parquet(x, "~/Desktop/arrowtest/x.parquet")

# Run the test in a docker container
docker run -it --rm -v ~/Desktop/arrowtest:/data bioconductor/bioconductor_docker:RELEASE_3_13 bash
root@5fa84c3f4a41:/# cd /data
root@5fa84c3f4a41:/data# R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

$ install.packages('arrow')
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘bit’, ‘assertthat’, ‘bit64’

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit_4.0.4.tar.gz'
Content type 'binary/octet-stream' length 691644 bytes (675 KB)
==================================================
downloaded 675 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/assertthat_0.2.1.tar.gz'
Content type 'binary/octet-stream' length 52329 bytes (51 KB)
==================================================
downloaded 51 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/bit64_4.0.5.tar.gz'
Content type 'binary/octet-stream' length 573106 bytes (559 KB)
==================================================
downloaded 559 KB

trying URL 'https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/arrow_6.0.0.2.tar.gz'
Content type 'binary/octet-stream' length 23646684 bytes (22.6 MB)
==================================================
downloaded 22.6 MB

* installing *binary* package ‘bit’ ...
* DONE (bit)
* installing *binary* package ‘assertthat’ ...
* DONE (assertthat)
* installing *binary* package ‘bit64’ ...
* DONE (bit64)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)

The downloaded source packages are in
‘/tmp/Rtmp8HkDvX/downloaded_packages’
$ library(arrow)
See arrow_info() for available features

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_6.0.0.2

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.1 bit_4.0.4        compiler_4.1.1   magrittr_2.0.1
 [5] assertthat_0.2.1 R6_2.5.1         tools_4.1.1      glue_1.4.2
 [9] bit64_4.0.5      vctrs_0.3.8      rlang_0.4.11     purrr_0.3.4
$ read_parquet("x.parquet")
Error: NotImplemented: Support for codec 'snappy' not built
In order to read this file, you will need to reinstall arrow with
additional features enabled.
Set one of these environment variables before installing:

 * LIBARROW_MINIMAL=false (for all optional features, including 'snappy')
 * ARROW_WITH_SNAPPY=ON (for just 'snappy')

See https://arrow.apache.org/docs/r/articles/install.html for details

root@5fa84c3f4a41:/data# R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

$ devtools::install_version("arrow", "5.0.0.2")
Downloading package from url: https://packagemanager.rstudio.com/all/__linux__/focal/latest/src/contrib/Archive/arrow/arrow_5.0.0.2.tar.gz
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?

1: All
2: CRAN packages only
3: None
4: rlang (0.4.11 -> 0.4.12) [CRAN]

Enter one or more numbers, or an empty line to skip updates:
Installing package into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
* installing *binary* package ‘arrow’ ...
* DONE (arrow)
$ library(arrow)

Attaching package: ‘arrow’

The following object is masked from ‘package:utils’:

    timestamp

$ sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_5.0.0.2

loaded via a namespace (and not attached):
 [1] magrittr_2.0.1    usethis_2.0.1     devtools_2.4.2    tidyselect_1.1.1
 [5] bit_4.0.4         pkgload_1.2.2     R6_2.5.1          rlang_0.4.11
 [9] fastmap_1.1.0     tools_4.1.1       pkgbuild_1.2.0    sessioninfo_1.1.1
[13] cli_3.0.1         withr_2.4.2       ellipsis_0.3.2    remotes_2.4.0
[17] bit64_4.0.5       rprojroot_2.0.2   assertthat_0.2.1  lifecycle_1.0.1
[21] crayon_1.4.1      processx_3.5.2    purrr_0.3.4       callr_3.7.0
[25] vctrs_0.3.8       fs_1.5.0          ps_1.6.0          testthat_3.0.4
[29] memoise_2.0.0     glue_1.4.2        cachem_1.0.6      compiler_4.1.1
[33] desc_1.3.0        prettyunits_1.1.1
$ read_parquet("x.parquet")
  A  B
1 0 10
2 1 11
3 2 12
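
For anyone scripting the same comparison in a container build, the check above could be sketched in shell roughly as follows. This is only a sketch under stated assumptions: `missing_codec` and `arrow_has_codec` are hypothetical helper names introduced here for illustration, and `arrow_has_codec` assumes `Rscript` and the arrow package are installed. `arrow::codec_is_available()` is the arrow R function that reports whether a codec was built in.

```shell
#!/bin/sh
# Hypothetical helpers (the names are illustrative, not part of arrow).

# Read text on stdin and print the codec name from an arrow
# "Support for codec '...' not built" error, if one is present.
# Prints nothing when no such message is found.
missing_codec() {
  sed -n "s/.*Support for codec '\([^']*\)' not built.*/\1/p"
}

# Exit 0 if the installed arrow R package reports support for the codec
# named in $1, non-zero otherwise (including when arrow is not installed).
# Assumes Rscript is on the PATH.
arrow_has_codec() {
  Rscript -e "quit(status = if (arrow::codec_is_available('$1')) 0L else 1L)"
}
```

With these defined, something like `arrow_has_codec snappy || echo "snappy missing"` could run right after the install step, and piping the output of a failing read through `missing_codec` should name the codec the error complains about.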

On Mon, Nov 1, 2021 at 7:05 AM Neal Richardson <ne...@gmail.com>
wrote:

> Hi Chris,
> Could you share the output from when you installed the package? Snappy and
> the other compression libraries should be on in the binaries (see
> https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
> for example), so I'm curious if there's anything in the install logs that
> would help us understand what's up.
>
> Neal
>

Re: [R] Snappy compression not enabled in prebuilt Linux binary 6.0.0.2?

Posted by Neal Richardson <ne...@gmail.com>.
Hi Chris,
Could you share the output from when you installed the package? Snappy and
the other compression libraries should be on in the binaries (see
https://github.com/ursa-labs/arrow-r-nightly/runs/4052316735?check_suite_focus=true#step:4:625
for example), so I'm curious if there's anything in the install logs that
would help us understand what's up.
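
One way to capture install output for a report like this is sketched below. It is a minimal example, not a prescribed procedure: `run_logged` is a hypothetical helper name and `arrow-install.log` is an arbitrary file name. `ARROW_R_DEV=true` is the environment variable the arrow installation docs describe for more verbose install messages; it would only matter when the package is built from source, not when a prebuilt binary package is installed.

```shell
#!/bin/sh
# Hypothetical helper: run a command, save its stdout and stderr to a log
# file, print the log, and return the command's own exit status.
run_logged() {
  log=$1
  shift
  "$@" >"$log" 2>&1
  status=$?
  cat "$log"
  return "$status"
}

# Example invocation (assumes Rscript is installed; left commented out so
# that sourcing this file does not trigger an install):
# run_logged arrow-install.log env ARROW_R_DEV=true \
#   Rscript -e 'install.packages("arrow")'
```

The resulting log file can then be attached to the email or issue as-is.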

Neal
