Posted to commits@arrow.apache.org by ro...@apache.org on 2019/06/25 09:39:29 UTC
[arrow] branch master updated: ARROW-5555: [R] Add install_arrow()
function to assist the user in obtaining C++ runtime libraries
This is an automated email from the ASF dual-hosted git repository.
romainfrancois pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new c9290cb ARROW-5555: [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries
c9290cb is described below
commit c9290cb9ff8d756bdabcff30b776b297198a7a8d
Author: Neal Richardson <ne...@gmail.com>
AuthorDate: Tue Jun 25 11:39:16 2019 +0200
ARROW-5555: [R] Add install_arrow() function to assist the user in obtaining C++ runtime libraries
Note that this function doesn't install anything: it provides context-dependent advice and, for some platforms, recommends commands to run. In future versions, we may want to make it do more, but this is a useful starting point.
In addition to this function, this patch also:
* Updates the README with installation instructions, including for Windows (assuming https://github.com/apache/arrow/pull/4622)
* Adds a Makefile to facilitate local package dev, building, testing, etc. (https://issues.apache.org/jira/browse/ARROW-5328)
* Fixes a UTF-8 issue in the DESCRIPTION
* A few other cleanups
Author: Neal Richardson <ne...@gmail.com>
Closes #4654 from nealrichardson/r-install-arrow and squashes the following commits:
90c2e5e8 <Neal Richardson> Simplify installation instructions
a2145629 <Neal Richardson> Add ARROW_R_DEV to Makefile
b2870489 <Neal Richardson> if (
60150149 <Neal Richardson> Update authors, including ASCII fix
3327cb1b <Neal Richardson> Implement messages for install_arrow, write tests
af34401c <Neal Richardson> Sketch of install_arrow()
4c756bb3 <Neal Richardson> Update installation docs; add Makefile for CLI-oriented folks
---
r/.Rbuildignore | 1 +
r/DESCRIPTION | 5 +-
r/Makefile | 45 +++++++++
r/NAMESPACE | 2 +
r/R/R6.R | 6 +-
r/R/RecordBatchWriter.R | 2 +-
r/R/array.R | 4 +-
r/R/arrow-package.R | 7 +-
r/R/feather.R | 6 +-
r/R/install-arrow.R | 137 +++++++++++++++++++++++++
r/README.Rmd | 90 ++++++++++-------
r/README.md | 182 ++++++++++++++++++++++------------
r/configure | 8 +-
r/man/arrow_available.Rd | 10 +-
r/man/install_arrow.Rd | 14 +++
r/tests/testthat/test-install-arrow.R | 80 +++++++++++++++
16 files changed, 486 insertions(+), 113 deletions(-)
diff --git a/r/.Rbuildignore b/r/.Rbuildignore
index b157f70..af5457a 100644
--- a/r/.Rbuildignore
+++ b/r/.Rbuildignore
@@ -14,3 +14,4 @@ clang_format.sh
^_pkgdown\.yml$
^docs$
^pkgdown$
+^Makefile$
diff --git a/r/DESCRIPTION b/r/DESCRIPTION
index 9bec314..70a6654 100644
--- a/r/DESCRIPTION
+++ b/r/DESCRIPTION
@@ -2,10 +2,11 @@ Package: arrow
Title: Integration to 'Apache' 'Arrow'
Version: 0.13.0.9000
Authors@R: c(
- person("Romain", "François", email = "romain@rstudio.com", role = c("aut", "cre")),
+ person("Romain", "Fran\u00e7ois", email = "romain@rstudio.com", role = c("aut", "cre"), comment = c(ORCID = "0000-0002-2444-4226")),
person("Javier", "Luraschi", email = "javier@rstudio.com", role = c("ctb")),
person("Jeffrey", "Wong", email = "jeffreyw@netflix.com", role = c("ctb")),
person("Jeroen", "Ooms", email = "jeroen@berkeley.edu", role = c("aut")),
+ person("Neal", "Richardson", email = "neal@ursalabs.org", role = c("aut")),
person("Apache Arrow", email = "dev@arrow.apache.org", role = c("aut", "cph"))
)
Description: 'Apache' 'Arrow' <https://arrow.apache.org/> is a cross-language
@@ -23,6 +24,7 @@ SystemRequirements: C++11
LinkingTo:
Rcpp (>= 1.0.1)
Imports:
+ utils,
Rcpp (>= 1.0.1),
rlang,
purrr,
@@ -66,6 +68,7 @@ Collate:
'csv.R'
'dictionary.R'
'feather.R'
+ 'install-arrow.R'
'json.R'
'memory_pool.R'
'message.R'
diff --git a/r/Makefile b/r/Makefile
new file mode 100644
index 0000000..f3bdbd6
--- /dev/null
+++ b/r/Makefile
@@ -0,0 +1,45 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+VERSION=$(shell grep ^Version DESCRIPTION | sed s/Version:\ //)
+ARROW_R_DEV="TRUE"
+
+doc:
+ R --slave -e 'devtools::document(); rmarkdown::render("README.Rmd")'
+ -git add --all man/*.Rd
+
+test:
+ export ARROW_R_DEV=$(ARROW_R_DEV) && R CMD INSTALL --install-tests .
+ export NOT_CRAN=true && R --slave -e 'library(testthat); setwd(file.path(.libPaths()[1], "arrow", "tests")); system.time(test_check("arrow", filter="${file}", reporter=ifelse(nchar("${r}"), "${r}", "summary")))'
+
+deps:
+ R --slave -e 'lib <- Sys.getenv("R_LIB", .libPaths()[1]); install.packages("devtools", repo="https://cloud.r-project.org", lib=lib); devtools::install_dev_deps(lib=lib)'
+
+build: doc
+ R CMD build .
+
+check: build
+ -export _R_CHECK_CRAN_INCOMING_REMOTE_=FALSE && export ARROW_R_DEV=$(ARROW_R_DEV) && R CMD check --as-cran arrow_$(VERSION).tar.gz
+ rm -rf arrow.Rcheck/
+
+clean:
+ -rm src/*.o
+ -rm src/*.so
+ -rm src/*.dll
+ -rm src/Makevars
+ -rm src/Makevars.win
+ -rm -rf arrow.Rcheck/
diff --git a/r/NAMESPACE b/r/NAMESPACE
index 281d27b..66f2004 100644
--- a/r/NAMESPACE
+++ b/r/NAMESPACE
@@ -133,6 +133,7 @@ export(field)
export(float16)
export(float32)
export(float64)
+export(install_arrow)
export(int16)
export(int32)
export(int64)
@@ -184,4 +185,5 @@ importFrom(rlang,dots_n)
importFrom(rlang,is_false)
importFrom(rlang,list2)
importFrom(rlang,warn)
+importFrom(utils,packageVersion)
useDynLib(arrow, .registration = TRUE)
diff --git a/r/R/R6.R b/r/R/R6.R
index 41169f3..06dd6f0 100644
--- a/r/R/R6.R
+++ b/r/R/R6.R
@@ -27,7 +27,7 @@
},
print = function(...){
cat(class(self)[[1]], "\n")
- if(!is.null(self$ToString)){
+ if (!is.null(self$ToString)){
cat(self$ToString(), "\n")
}
invisible(self)
@@ -36,11 +36,11 @@
)
shared_ptr <- function(class, xp) {
- if(!shared_ptr_is_null(xp)) class$new(xp)
+ if (!shared_ptr_is_null(xp)) class$new(xp)
}
unique_ptr <- function(class, xp) {
- if(!unique_ptr_is_null(xp)) class$new(xp)
+ if (!unique_ptr_is_null(xp)) class$new(xp)
}
#' @export
diff --git a/r/R/RecordBatchWriter.R b/r/R/RecordBatchWriter.R
index 7730511..59aa984 100644
--- a/r/R/RecordBatchWriter.R
+++ b/r/R/RecordBatchWriter.R
@@ -44,7 +44,7 @@
write = function(x) {
if (inherits(x, "arrow::RecordBatch")) {
self$write_batch(x)
- } else if(inherits(x, "arrow::Table")) {
+ } else if (inherits(x, "arrow::Table")) {
self$write_table(x)
} else if (inherits(x, "data.frame")) {
self$write_table(table(x))
diff --git a/r/R/array.R b/r/R/array.R
index 7e5e955..deb3bc5 100644
--- a/r/R/array.R
+++ b/r/R/array.R
@@ -132,11 +132,11 @@
`arrow::Array`$dispatch <- function(xp){
a <- shared_ptr(`arrow::Array`, xp)
- if(a$type_id() == Type$DICTIONARY){
+ if (a$type_id() == Type$DICTIONARY){
a <- shared_ptr(`arrow::DictionaryArray`, xp)
} else if (a$type_id() == Type$STRUCT) {
a <- shared_ptr(`arrow::StructArray`, xp)
- } else if(a$type_id() == Type$LIST) {
+ } else if (a$type_id() == Type$LIST) {
a <- shared_ptr(`arrow::ListArray`, xp)
}
a
diff --git a/r/R/arrow-package.R b/r/R/arrow-package.R
index faaaf2a..77b3a31 100644
--- a/r/R/arrow-package.R
+++ b/r/R/arrow-package.R
@@ -24,8 +24,13 @@
#' @keywords internal
"_PACKAGE"
-#' Is the C++ Arrow library available
+#' Is the C++ Arrow library available?
#'
+#' You won't generally need to call this function, but it's here in case it
+#' helps for development purposes.
+#' @return `TRUE` or `FALSE` depending on whether the package was installed
+#' with the Arrow C++ library. If `FALSE`, you'll need to install the C++
+#' library and then reinstall the R package. See [install_arrow()] for help.
#' @export
arrow_available <- function() {
.Call(`_arrow_available`)
diff --git a/r/R/feather.R b/r/R/feather.R
index 998f39b..f197fd0 100644
--- a/r/R/feather.R
+++ b/r/R/feather.R
@@ -141,7 +141,11 @@ FeatherTableReader.character <- function(file, mmap = TRUE, ...) {
#' @export
FeatherTableReader.fs_path <- function(file, mmap = TRUE, ...) {
- stream <- if(isTRUE(mmap)) mmap_open(file, ...) else ReadableFile(file, ...)
+ if (isTRUE(mmap)) {
+ stream <- mmap_open(file, ...)
+ } else {
+ stream <- ReadableFile(file, ...)
+ }
FeatherTableReader(stream)
}
diff --git a/r/R/install-arrow.R b/r/R/install-arrow.R
new file mode 100644
index 0000000..5b30277
--- /dev/null
+++ b/r/R/install-arrow.R
@@ -0,0 +1,137 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+#' Help installing the Arrow C++ library
+#'
+#' Binary package installations should come with a working Arrow C++ library,
+#' but when installing from source, you'll need to obtain the C++ library
+#' first. This function offers guidance on how to get the C++ library depending
+#' on your operating system and package version.
+#' @export
+#' @importFrom utils packageVersion
+install_arrow <- function() {
+ os <- tolower(Sys.info()[["sysname"]])
+ # c("windows", "darwin", "linux", "sunos") # win/mac/linux/solaris
+ version <- packageVersion("arrow")
+ # From CRAN check:
+ rep <- installed.packages(fields="Repository")["arrow", "Repository"]
+ from_cran <- identical(rep, "CRAN")
+ # Is it possible to tell if was a binary install from CRAN vs. source?
+
+ message(install_arrow_msg(arrow_available(), version, from_cran, os))
+}
+
+install_arrow_msg <- function(has_arrow, version, from_cran, os) {
+ # TODO: check if there is a newer version on CRAN?
+
+ # install_arrow() sends "version" as a "package_version" class, but for
+ # convenience, this also accepts a string like "0.13.0". Calling
+ # `package_version` is idempotent so do it again, and then `unclass` to get
+ # the integers. Then see how many there are.
+ dev_version <- length(unclass(package_version(version))[[1]]) > 3
+ # Based on these parameters, assemble a string with installation advice
+ if (has_arrow) {
+ # Respond that you already have it
+ msg <- ALREADY_HAVE
+ } else if (os == "sunos") {
+ # Good luck with that.
+ msg <- c(SEE_DEV_GUIDE, THEN_REINSTALL)
+ } else if (os == "linux") {
+ if (dev_version) {
+ # Point to compilation instructions on readme
+ msg <- c(SEE_DEV_GUIDE, THEN_REINSTALL)
+ } else {
+ # Suggest arrow.apache.org/install for PPAs, or compilation instructions
+ msg <- c(paste(SEE_ARROW_INSTALL, OR_SEE_DEV_GUIDE), THEN_REINSTALL)
+ }
+ } else if (!dev_version && !from_cran) {
+ # Windows or Mac with a released version but not from CRAN
+ # Recommend installing released binary package from CRAN
+ msg <- INSTALL_FROM_CRAN
+ } else {
+ # Windows or Mac, most likely a dev version
+ # for each OS, recommend dev installation, refer to readme
+ # TODO: if there is a newer version on CRAN, recommend CRAN
+ if (os == "windows") {
+ msg <- c(paste(FIND_WIN_BINARY, OR_SEE_DEV_GUIDE), THEN_REINSTALL)
+ } else {
+ # macOS
+ msg <- c(paste(FIND_MAC_BINARY, OR_SEE_DEV_GUIDE), THEN_REINSTALL)
+ }
+ }
+ # Common postscript
+ msg <- c(msg, SEE_README, REPORT_ISSUE)
+ paste(msg, collapse="\n\n")
+}
+
+ALREADY_HAVE <- paste(
+ "It appears you already have Arrow installed successfully:",
+ "are you trying to install a different version of the library?"
+)
+
+SEE_DEV_GUIDE <- paste(
+ "See the Arrow C++ developer guide",
+ "<https://arrow.apache.org/docs/developers/cpp.html>",
+ "for instructions on building the library from source."
+)
+# Variation of that
+OR_SEE_DEV_GUIDE <- paste0(
+ "Or, s",
+ substr(SEE_DEV_GUIDE, 2, nchar(SEE_DEV_GUIDE))
+)
+
+SEE_ARROW_INSTALL <- paste(
+ "See the Apache Arrow project installation page",
+ "<https://arrow.apache.org/install/>",
+ "for how to install the C++ package from a PPA."
+)
+
+THEN_REINSTALL <- paste(
+ "After you've installed the C++ library,",
+ "you'll need to reinstall the R package from source to find it."
+)
+
+SEE_README <- paste(
+ "Refer to the R package README",
+ "<https://github.com/apache/arrow/blob/master/r/README.md>",
+ "for further details."
+)
+
+REPORT_ISSUE <- paste(
+ "If you have other trouble, or if you think this message could be improved,",
+ "please report an issue here:",
+ "<https://issues.apache.org/jira/projects/ARROW/issues>"
+)
+
+INSTALL_FROM_CRAN <- paste(
+ 'Try installing the package from CRAN:',
+ '`install.packages("arrow")`'
+)
+
+FIND_WIN_BINARY <- paste(
+ "You may be able to download a development version of the C++ binary",
+ "from the Apache Arrow project's Appveyor:",
+ "<https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow>.",
+ "Select an R job from a recent build,",
+ 'and download the `build\\arrow-*.zip` file from the "Artifacts" tab.',
+ "Then, set the RWINLIB_LOCAL environment variable to point to that file."
+)
+
+FIND_MAC_BINARY <- paste(
+ "You may be able to get a development version of the Arrow C++ library",
+ "using Homebrew: `brew install apache-arrow --HEAD`"
+)
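
The `dev_version` check in `install_arrow_msg()` above relies on development builds carrying a fourth version component (for example `0.13.0.9000`), while releases have three. A minimal standalone sketch of that check follows; the helper name `is_dev_version` is hypothetical and is not part of this patch.

```r
# Hypothetical helper (not in the patch) that isolates the dev_version test.
is_dev_version <- function(version) {
  # package_version() accepts a string or an existing package_version object;
  # unclass() exposes the underlying list of integer components.
  parts <- unclass(package_version(version))[[1]]
  length(parts) > 3
}

is_dev_version("0.13.0")                        # FALSE: three components
is_dev_version("0.13.0.9000")                   # TRUE: four components
is_dev_version(package_version("0.13.0.9000"))  # TRUE: objects work too
```

Counting components keeps the check independent of the specific suffix, at the cost of assuming that releases never use a fourth component.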
diff --git a/r/README.Rmd b/r/README.Rmd
index f718732..6b83817 100644
--- a/r/README.Rmd
+++ b/r/README.Rmd
@@ -16,42 +16,75 @@ knitr::opts_chunk$set(
```
# arrow
-[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
+[![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow) [![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
[Apache Arrow](https://arrow.apache.org/) is a cross-language development platform for in-memory data. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware. It also provides computational libraries and zero-copy streaming messaging and interprocess communication.
-The `arrow` package exposes an interface to the Arrow C++ library.
+The `arrow` package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for working with Parquet (`read_parquet()`, `write_parquet()`) and Feather (`read_feather()`, `write_feather()`) files, as well as lower-level access to Arrow memory and messages.
## Installation
-Installing `arrow` is a two-step process: first install the Arrow C++ library, then the R package.
+Install the latest release of `arrow` from CRAN with
-### Release version
+```r
+install.packages("arrow")
+```
+
+On macOS and Windows, installing a binary package from CRAN will handle Arrow's C++ dependencies for you. On Linux, you'll need to first install the C++ library. See the [Arrow project installation page](https://arrow.apache.org/install/) for a list of PPAs from which you can obtain it.
-The current release of Apache Arrow is 0.13.
+If you install the `arrow` package from source and the C++ library is not found, the R package functions will notify you that Arrow is not available. Call
+
+```r
+arrow::install_arrow()
+```
-On macOS, you may install the C++ library using [Homebrew](https://brew.sh/):
+for version- and platform-specific guidance on installing the Arrow C++ library.
+
+## Example
+
+```{r}
+library(arrow)
+set.seed(24)
+
+tab <- arrow::table(x = 1:10, y = rnorm(10))
+tab$schema
+tab
+as.data.frame(tab)
+```
+
+## Installing a development version
+
+To use the development version of the R package, you'll need to install it from source, which requires the additional C++ library setup. On macOS, you may install the C++ library using [Homebrew](https://brew.sh/):
```shell
+# For the released version:
brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
```
-On Linux, see the [Arrow project installation page](https://arrow.apache.org/install/).
+On Windows, you can download a .zip file with the arrow dependencies from the [rwinlib](https://github.com/rwinlib/arrow/releases) project, and then set the `RWINLIB_LOCAL` environment variable to point to that zip file before installing the `arrow` R package. That project contains released versions of the C++ library; for a development version, Windows users may be able to find a binary by going to the [Apache Arrow project's Appveyor](https://ci.appveyor.com/project/ApacheSoftwareFound [...]
-Then, you can install the release version of the package from GitHub using the [`remotes`](https://remotes.r-lib.org/) package. From within an R session,
+Linux users can get a released version of the library from our PPAs, as described above. If you need a development version of the C++ library, you will likely need to build it from source. See "Developing" below.
+
+Once you have the C++ library, you can install the R package from GitHub using the [`remotes`](https://remotes.r-lib.org/) package. From within an R session,
```r
# install.packages("remotes") # Or install "devtools", which includes remotes
-remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7")
+remotes::install_github("apache/arrow/r")
```
or if you prefer to stay at the command line,
```shell
-R -e 'remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7")'
+R -e 'remotes::install_github("apache/arrow/r")'
```
-### Development version
+You can specify a particular commit, branch, or [release](https://github.com/apache/arrow/releases) to install by including a `ref` argument to `install_github()`.
+
+## Developing
+
+If you need to alter both the Arrow C++ library and the R package code, or if you can't get a binary version of the latest C++ library elsewhere, you'll need to build it from source too.
First, clone the repository and install a release build of the C++ library.
@@ -62,11 +95,13 @@ cmake .. -DARROW_PARQUET=ON -DARROW_BOOST_USE_SHARED:BOOL=Off -DARROW_INSTALL_NA
make install
```
-Then, you can install the R package and its dependencies from the git checkout:
+This likely will require additional system libraries to be installed, the specifics of which are platform dependent. See the [C++ developer guide](https://arrow.apache.org/docs/developers/cpp.html) for details.
+
+Once you've built the C++ library, you can install the R package and its dependencies, along with additional dev dependencies, from the git checkout:
```shell
cd ../../r
-R -e 'install.packages("remotes"); remotes::install_deps()'
+R -e 'install.packages("devtools"); devtools::install_dev_deps()'
R CMD INSTALL .
```
@@ -83,40 +118,27 @@ try setting the environment variable `LD_LIBRARY_PATH` (or `DYLD_LIBRARY_PATH` o
For any other build/configuration challenges, see the [C++ developer guide](https://arrow.apache.org/docs/developers/cpp.html#building).
-## Example
-
-```{r}
-library(arrow)
-
-tab <- arrow::table(x = 1:10, y = rnorm(10))
-tab$schema
-tab
-as.data.frame(tab)
-```
-
-## Developing
-
-It is recommended to use the `devtools` package to help with some package-related manipulations. You'll need a few more R packages, which you can install using
+### Editing Rcpp code
-```r
-install.packages("devtools")
-devtools::install_dev_deps()
-```
+The `arrow` package uses some customized tools on top of `Rcpp` to prepare its C++ code in `src/`. If you change C++ code in the R package, you will need to set the `ARROW_R_DEV` environment variable to `TRUE` (optionally, add it to your `~/.Renviron` file to persist across sessions) so that the `data-raw/codegen.R` file is used for code generation.
-If you change C++ code, you need to set the `ARROW_R_DEV` environment variable to `TRUE`, e.g.
-in your `~/.Renviron` so that the `data-raw/codegen.R` file is used for code generation.
+You'll also need `remotes::install_github("romainfrancois/decor")`.
### Useful functions
+Within an R session, these can help with package development:
+
```r
devtools::load_all() # Load the dev package
devtools::test(filter="^regexp$") # Run the test suite, optionally filtering file names
devtools::document() # Update roxygen documentation
rmarkdown::render("README.Rmd") # To rebuild README.md
-pkgdown::build_site() # To preview the documentation website
+pkgdown::build_site(run_dont_run=TRUE) # To preview the documentation website
devtools::check() # All package checks; see also below
```
+Any of those can be run from the command line by wrapping them in `R -e '$COMMAND'`. There's also a `Makefile` to help with some common tasks from the command line (`make test`, `make doc`, `make clean`, etc.)
+
### Full package validation
```shell
diff --git a/r/README.md b/r/README.md
index b584486..6fc2e33 100644
--- a/r/README.md
+++ b/r/README.md
@@ -3,6 +3,7 @@
# arrow
+[![cran](https://www.r-pkg.org/badges/version-last-release/arrow)](https://cran.r-project.org/package=arrow)
[![conda-forge](https://img.shields.io/conda/vn/conda-forge/r-arrow.svg)](https://anaconda.org/conda-forge/r-arrow)
[Apache Arrow](https://arrow.apache.org/) is a cross-language
@@ -12,43 +13,125 @@ data, organized for efficient analytic operations on modern hardware. It
also provides computational libraries and zero-copy streaming messaging
and interprocess communication.
-The `arrow` package exposes an interface to the Arrow C++ library.
+The `arrow` package exposes an interface to the Arrow C++ library to
+access many of its features in R. This includes support for working with
+Parquet (`read_parquet()`, `write_parquet()`) and Feather
+(`read_feather()`, `write_feather()`) files, as well as lower-level
+access to Arrow memory and messages.
## Installation
-Installing `arrow` is a two-step process: first install the Arrow C++
-library, then the R package.
+Install the latest release of `arrow` from CRAN with
-### Release version
+``` r
+install.packages("arrow")
+```
+
+On macOS and Windows, installing a binary package from CRAN will handle
+Arrow’s C++ dependencies for you. On Linux, you’ll need to first install
+the C++ library. See the [Arrow project installation
+page](https://arrow.apache.org/install/) for a list of PPAs from which
+you can obtain it.
-The current release of Apache Arrow is 0.13.
+If you install the `arrow` package from source and the C++ library is
+not found, the R package functions will notify you that Arrow is not
+available. Call
-On macOS, you may install the C++ library using
+``` r
+arrow::install_arrow()
+```
+
+for version- and platform-specific guidance on installing the Arrow C++
+library.
+
+## Example
+
+``` r
+library(arrow)
+#>
+#> Attaching package: 'arrow'
+#> The following object is masked from 'package:utils':
+#>
+#> timestamp
+#> The following objects are masked from 'package:base':
+#>
+#> array, table
+set.seed(24)
+
+tab <- arrow::table(x = 1:10, y = rnorm(10))
+tab$schema
+#> arrow::Schema
+#> x: int32
+#> y: double
+tab
+#> arrow::Table
+as.data.frame(tab)
+#> x y
+#> 1 1 -0.545880758
+#> 2 2 0.536585304
+#> 3 3 0.419623149
+#> 4 4 -0.583627199
+#> 5 5 0.847460017
+#> 6 6 0.266021979
+#> 7 7 0.444585270
+#> 8 8 -0.466495124
+#> 9 9 -0.848370044
+#> 10 10 0.002311942
+```
+
+## Installing a development version
+
+To use the development version of the R package, you’ll need to install
+it from source, which requires the additional C++ library setup. On
+macOS, you may install the C++ library using
[Homebrew](https://brew.sh/):
``` shell
+# For the released version:
brew install apache-arrow
+# Or for a development version, you can try:
+brew install apache-arrow --HEAD
```
-On Linux, see the [Arrow project installation
-page](https://arrow.apache.org/install/).
-
-Then, you can install the release version of the package from GitHub
+On Windows, you can download a .zip file with the arrow dependencies
+from the [rwinlib](https://github.com/rwinlib/arrow/releases) project,
+and then set the `RWINLIB_LOCAL` environment variable to point to that
+zip file before installing the `arrow` R package. That project contains
+released versions of the C++ library; for a development version, Windows
+users may be able to find a binary by going to the [Apache Arrow
+project’s
+Appveyor](https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow),
+selecting an R job from a recent build, and downloading the
+`build\arrow-*.zip` file from the “Artifacts” tab.
+
+Linux users can get a released version of the library from our PPAs, as
+described above. If you need a development version of the C++ library,
+you will likely need to build it from source. See “Developing” below.
+
+Once you have the C++ library, you can install the R package from GitHub
using the [`remotes`](https://remotes.r-lib.org/) package. From within
an R session,
``` r
# install.packages("remotes") # Or install "devtools", which includes remotes
-remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7")
+remotes::install_github("apache/arrow/r")
```
or if you prefer to stay at the command line,
``` shell
-R -e 'remotes::install_github("apache/arrow/r", ref="76e1bc5dfb9d08e31eddd5cbcc0b1bab934da2c7")'
+R -e 'remotes::install_github("apache/arrow/r")'
```
-### Development version
+You can specify a particular commit, branch, or
+[release](https://github.com/apache/arrow/releases) to install by
+including a `ref` argument to `install_github()`.
+
+## Developing
+
+If you need to alter both the Arrow C++ library and the R package code,
+or if you can’t get a binary version of the latest C++ library
+elsewhere, you’ll need to build it from source too.
First, clone the repository and install a release build of the C++
library.
@@ -60,12 +143,17 @@ cmake .. -DARROW_PARQUET=ON -DARROW_BOOST_USE_SHARED:BOOL=Off -DARROW_INSTALL_NA
make install
```
-Then, you can install the R package and its dependencies from the git
+This likely will require additional system libraries to be installed,
+the specifics of which are platform dependent. See the [C++ developer
+guide](https://arrow.apache.org/docs/developers/cpp.html) for details.
+
+Once you’ve built the C++ library, you can install the R package and its
+dependencies, along with additional dev dependencies, from the git
checkout:
``` shell
cd ../../r
-R -e 'install.packages("remotes"); remotes::install_deps()'
+R -e 'install.packages("devtools"); devtools::install_dev_deps()'
R CMD INSTALL .
```
@@ -84,68 +172,34 @@ installing the R package.
For any other build/configuration challenges, see the [C++ developer
guide](https://arrow.apache.org/docs/developers/cpp.html#building).
-## Example
+### Editing Rcpp code
-``` r
-library(arrow)
-#>
-#> Attaching package: 'arrow'
-#> The following object is masked from 'package:utils':
-#>
-#> timestamp
-#> The following objects are masked from 'package:base':
-#>
-#> array, table
+The `arrow` package uses some customized tools on top of `Rcpp` to
+prepare its C++ code in `src/`. If you change C++ code in the R package,
+you will need to set the `ARROW_R_DEV` environment variable to `TRUE`
+(optionally, add it to your `~/.Renviron` file to persist across
+sessions) so that the `data-raw/codegen.R` file is used for code
+generation.
-tab <- arrow::table(x = 1:10, y = rnorm(10))
-tab$schema
-#> arrow::Schema
-#> x: int32
-#> y: double
-tab
-#> arrow::Table
-as.data.frame(tab)
-#> # A tibble: 10 x 2
-#> x y
-#> <int> <dbl>
-#> 1 1 -1.56
-#> 2 2 -0.147
-#> 3 3 -1.16
-#> 4 4 0.106
-#> 5 5 1.14
-#> 6 6 0.340
-#> 7 7 0.184
-#> 8 8 -1.01
-#> 9 9 1.77
-#> 10 10 0.344
-```
-
-## Developing
-
-It is recommended to use the `devtools` package to help with some
-package-related manipulations. You’ll need a few more R packages, which
-you can install using
-
-``` r
-install.packages("devtools")
-devtools::install_dev_deps()
-```
-
-If you change C++ code, you need to set the `ARROW_R_DEV` environment
-variable to `TRUE`, e.g. in your `~/.Renviron` so that the
-`data-raw/codegen.R` file is used for code generation.
+You’ll also need `remotes::install_github("romainfrancois/decor")`.
### Useful functions
+Within an R session, these can help with package development:
+
``` r
devtools::load_all() # Load the dev package
devtools::test(filter="^regexp$") # Run the test suite, optionally filtering file names
devtools::document() # Update roxygen documentation
rmarkdown::render("README.Rmd") # To rebuild README.md
-pkgdown::build_site() # To preview the documentation website
+pkgdown::build_site(run_dont_run=TRUE) # To preview the documentation website
devtools::check() # All package checks; see also below
```
+Any of those can be run from the command line by wrapping them in `R -e
+'$COMMAND'`. There’s also a `Makefile` to help with some common tasks
+from the command line (`make test`, `make doc`, `make clean`, etc.)
+
### Full package validation
``` shell
diff --git a/r/configure b/r/configure
index 4b3484f..d5a2688 100755
--- a/r/configure
+++ b/r/configure
@@ -69,8 +69,8 @@ else
BREWDIR=$(brew --prefix)
if brew ls --versions apache-arrow > /dev/null; then
- # right now, we need HEAD version
- brew install apache-arrow --HEAD
+ # Install apache-arrow via Homebrew if we don't already have it
+ brew install apache-arrow
fi
PKG_CFLAGS="-I$BREWDIR/opt/$PKG_BREW_NAME/include"
@@ -104,8 +104,8 @@ echo "#include $PKG_TEST_HEADER" | ${CXXCPP} ${CPPFLAGS} ${PKG_CFLAGS} ${CXX11FL
# Customize the error
if [ $? -ne 0 ]; then
echo "------------------------- NOTE ---------------------------"
- echo "After installation, please run arrow::install_arrow() to install"
- echo "required runtime libraries"
+ echo "After installation, please run arrow::install_arrow()"
+ echo "for help installing required runtime libraries"
echo "---------------------------------------------------------"
PKG_LIBS=""
PKG_CFLAGS=""
diff --git a/r/man/arrow_available.Rd b/r/man/arrow_available.Rd
index 26a01ca..ea2c54a 100644
--- a/r/man/arrow_available.Rd
+++ b/r/man/arrow_available.Rd
@@ -2,10 +2,16 @@
% Please edit documentation in R/arrow-package.R
\name{arrow_available}
\alias{arrow_available}
-\title{Is the C++ Arrow library available}
+\title{Is the C++ Arrow library available?}
\usage{
arrow_available()
}
+\value{
+\code{TRUE} or \code{FALSE} depending on whether the package was installed
+with the Arrow C++ library. If \code{FALSE}, you'll need to install the C++
+library and then reinstall the R package. See \code{\link[=install_arrow]{install_arrow()}} for help.
+}
\description{
-Is the C++ Arrow library available
+You won't generally need to call this function, but it's here in case it
+helps for development purposes.
}
diff --git a/r/man/install_arrow.Rd b/r/man/install_arrow.Rd
new file mode 100644
index 0000000..4393258
--- /dev/null
+++ b/r/man/install_arrow.Rd
@@ -0,0 +1,14 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/install-arrow.R
+\name{install_arrow}
+\alias{install_arrow}
+\title{Help installing the Arrow C++ library}
+\usage{
+install_arrow()
+}
+\description{
+Binary package installations should come with a working Arrow C++ library,
+but when installing from source, you'll need to obtain the C++ library
+first. This function offers guidance on how to get the C++ library depending
+on your operating system and package version.
+}
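
The Linux release-version test in `test-install-arrow.R` matches the fragment `PPA. Or, see the Arrow C++ developer guide`. That fragment only exists because `OR_SEE_DEV_GUIDE` is derived from `SEE_DEV_GUIDE` by swapping its first character. The sketch below copies the relevant constants from `install-arrow.R` so it runs on its own; the variable `msg` is introduced here for illustration only.

```r
# Constants copied from r/R/install-arrow.R in this patch.
SEE_DEV_GUIDE <- paste(
  "See the Arrow C++ developer guide",
  "<https://arrow.apache.org/docs/developers/cpp.html>",
  "for instructions on building the library from source."
)
# Drop the leading "S" and prepend "Or, s" to get the "Or, see ..." variant.
OR_SEE_DEV_GUIDE <- paste0(
  "Or, s",
  substr(SEE_DEV_GUIDE, 2, nchar(SEE_DEV_GUIDE))
)
SEE_ARROW_INSTALL <- paste(
  "See the Apache Arrow project installation page",
  "<https://arrow.apache.org/install/>",
  "for how to install the C++ package from a PPA."
)

# paste() joins the two sentences with a single space, as in the Linux branch.
msg <- paste(SEE_ARROW_INSTALL, OR_SEE_DEV_GUIDE)
grepl("PPA. Or, see the Arrow C++ developer guide", msg, fixed = TRUE)  # TRUE
```

Deriving one string from the other keeps the two variants in sync if the wording of `SEE_DEV_GUIDE` changes, provided it still begins with a capital "S".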
diff --git a/r/tests/testthat/test-install-arrow.R b/r/tests/testthat/test-install-arrow.R
new file mode 100644
index 0000000..318e8e0
--- /dev/null
+++ b/r/tests/testthat/test-install-arrow.R
@@ -0,0 +1,80 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+context("install_arrow()")
+
+test_that("install_arrow() prints a message", {
+ expect_message(install_arrow())
+})
+
+i_have_arrow_msg <- "It appears you already have Arrow installed successfully: are you trying to install a different version of the library?
+
+Refer to the R package README <https://github.com/apache/arrow/blob/master/r/README.md> for further details.
+
+If you have other trouble, or if you think this message could be improved, please report an issue here: <https://issues.apache.org/jira/projects/ARROW/issues>"
+
+test_that("Messages get the standard postscript appended", {
+ expect_identical(
+ install_arrow_msg(has_arrow = TRUE, "0.13.0"),
+ i_have_arrow_msg
+ )
+})
+
+test_that("Solaris and Linux dev version get pointed to C++ guide", {
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0", os="sunos"),
+ "See the Arrow C++ developer guide",
+ fixed = TRUE
+ )
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0.9000", os="linux"),
+ "See the Arrow C++ developer guide",
+ fixed = TRUE
+ )
+})
+
+test_that("Linux on release version gets pointed to PPA first, then C++", {
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0", os="linux"),
+ "PPA. Or, see the Arrow C++ developer guide",
+ fixed = TRUE
+ )
+})
+
+test_that("Win/mac release version get pointed to CRAN", {
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0", os="darwin", from_cran=FALSE),
+ "install.packages",
+ fixed = TRUE
+ )
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0", os="windows", from_cran=FALSE),
+ "install.packages",
+ fixed = TRUE
+ )
+})
+
+test_that("Win/mac dev version get recommendations", {
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0.9000", os="darwin", from_cran=FALSE),
+ "Homebrew"
+ )
+ expect_match(
+ install_arrow_msg(FALSE, "0.13.0.9000", os="windows", from_cran=FALSE),
+ "RWINLIB_LOCAL"
+ )
+})
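
Several of the tests above call `install_arrow_msg()` without `from_cran` or `os`, even though neither argument has a default. This works because R evaluates arguments lazily: a missing argument only raises an error if the branch taken actually reads it. A minimal sketch of that behavior follows, using a hypothetical stand-in function `advise()` with the same argument list and placeholder return strings, not the patch's real messages.

```r
# Hypothetical stand-in with the same argument list as install_arrow_msg().
advise <- function(has_arrow, version, from_cran, os) {
  if (has_arrow) {
    "already installed"             # from_cran and os are never read
  } else if (os == "linux") {
    "see PPA or build from source"  # from_cran is never read
  } else if (!from_cran) {
    "install from CRAN"
  } else {
    "see platform-specific advice"
  }
}

advise(has_arrow = TRUE, "0.13.0")     # works without from_cran or os
advise(FALSE, "0.13.0", os = "linux")  # works without from_cran

# Omitting an argument that the chosen branch does read is an error.
res <- tryCatch(
  advise(FALSE, "0.13.0", os = "darwin"),
  error = function(e) "error"
)
res  # "error"
```

The trade-off is that a test which omits arguments depends on the order of the branches; reordering the conditions in the function could turn a passing call into a missing-argument error.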