Posted to commits@arrow.apache.org by th...@apache.org on 2021/11/03 21:03:31 UTC
[arrow-cookbook] branch main updated: Fix broken build and aesthetic improvements (#103)

This is an automated email from the ASF dual-hosted git repository.

thisisnic pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new c64c9b6  Fix broken build and aesthetic improvements (#103)
c64c9b6 is described below

commit c64c9b6c6f316b6af696ad0b50c0ae2415a627d6
Author: Nic Crane <th...@gmail.com>
AuthorDate: Wed Nov 3 21:03:25 2021 +0000

    Fix broken build and aesthetic improvements (#103)
    
    * Add arrow logo
    
    * Add missing solution headings
    
    * Shorten section titles
    
    * Delete redundant content
    
    * Add link to R docs
    
    * Rename intro sections
---
 r/content/arrays.Rmd                               |  2 +-
 r/content/index.Rmd                                |  9 ++--
 r/content/python.Rmd                               |  4 ++
 r/content/reading_and_writing_data.Rmd             | 21 +++++++--
 r/content/specify_data_types_and_schemas.Rmd       |  6 ++-
 r/content/unpublished/configure_arrow.Rmd          | 53 ----------------------
 .../unpublished/create_arrow_objects_from_r.Rmd    |  9 ----
 r/content/unpublished/cute_datasets.Rmd            | 10 ----
 r/content/unpublished/manipulate_data.Rmd          | 34 --------------
 .../work_with_arrow_in_both_python_and_r.Rmd       |  7 ---
 .../work_with_data_in_different_formats.Rmd        | 29 ------------
 11 files changed, 29 insertions(+), 155 deletions(-)

diff --git a/r/content/arrays.Rmd b/r/content/arrays.Rmd
index ae353fe..51f7d38 100644
--- a/r/content/arrays.Rmd
+++ b/r/content/arrays.Rmd
@@ -1,6 +1,6 @@
 # Manipulating Data - Arrays
 
-__What you should know before you begin__
+## Introduction
 
 An Arrow Array is roughly equivalent to an R vector - it can be used to 
 represent a single column of data, with all values having the same data type.  
diff --git a/r/content/index.Rmd b/r/content/index.Rmd
index 8e3d4bc..408717f 100644
--- a/r/content/index.Rmd
+++ b/r/content/index.Rmd
@@ -21,13 +21,12 @@ knitr::opts_template$set(test = list(
 
 # Preface
 
-```{r, echo=FALSE}
-knitr::include_graphics("images/arrow.png")
-```
+![](images/arrow.png "Apache Arrow logo")
 
 This cookbook aims to provide a number of recipes showing how to perform common 
-tasks using arrow.  This version of the cookbook works with arrow >= 6.0.0, but 
-in future we will maintain different versions of the cookbook.
+tasks using [arrow](https://arrow.apache.org/docs/r/).  This version of the 
+cookbook works with arrow >= 6.0.0, but in future we will maintain different 
+versions for the last few major R package releases.
 
 ## What is Arrow?
 
diff --git a/r/content/python.Rmd b/r/content/python.Rmd
index d6cf893..578ea0a 100644
--- a/r/content/python.Rmd
+++ b/r/content/python.Rmd
@@ -9,6 +9,8 @@ For more information on using setting up and installing PyArrow to use in R, see
 
 You want to use PyArrow to create an Arrow object in an R session.
 
+### Solution
+
 ```{r, pyarrow_object}
 library(reticulate)
 pa <- import("pyarrow")
@@ -25,6 +27,8 @@ test_that("pyarrow_object", {
 
 You want to call a PyArrow function from your R session.
 
+### Solution
+
 ```{r, pyarrow_func}
 table_1 <- Table$create(mtcars[1:5,])
 table_2 <- Table$create(mtcars[11:15,])
diff --git a/r/content/reading_and_writing_data.Rmd b/r/content/reading_and_writing_data.Rmd
index 542401a..b4c2d92 100644
--- a/r/content/reading_and_writing_data.Rmd
+++ b/r/content/reading_and_writing_data.Rmd
@@ -1,5 +1,7 @@
 # Reading and Writing Data
 
+## Introduction
+
 This chapter contains recipes related to reading and writing data using Apache 
 Arrow.  When reading files into R using Apache Arrow, you can choose to read in 
 your file as either a data frame or as an Arrow Table object.
@@ -29,8 +31,7 @@ test_that("table_create_from_df chunk works as expected", {
 ## Convert data from an Arrow Table to a data frame
 
 You want to convert an Arrow Table to a data frame to view the data or work with it
-in your usual analytics pipeline.  You can use either `as.data.frame()` or 
-`dplyr::collect()` to do this.
+in your usual analytics pipeline. 
 
 ### Solution
 
@@ -44,6 +45,10 @@ test_that("asdf_table chunk works as expected", {
 })
 ```
 
+### Discussion
+
+You can use either `as.data.frame()` or `dplyr::collect()` to do this.
+
 ## Write a Parquet file
 
 You want to write Parquet files to disk.
@@ -233,9 +238,11 @@ test_that("read_ipc_stream chunk works as expected", {
 unlink("my_table.arrows")
 ```
 
-## Read and write CSV files 
+## Read CSV files 
 
-You can use `write_csv_arrow()` to save an Arrow Table to disk as a CSV.
+You want to write Arrow data to a CSV file.
+
+### Solution
 
 ```{r, write_csv_arrow}
 write_csv_arrow(cars, "cars.csv")
@@ -246,7 +253,11 @@ test_that("write_csv_arrow chunk works as expected", {
 })
 ```
 
-You can use `read_csv_arrow()` to read in a CSV file as an Arrow Table.
+## Write CSV files 
+
+You want to read a CSV file.
+
+### Solution
 
 ```{r, read_csv_arrow}
 my_csv <- read_csv_arrow("cars.csv", as_data_frame = FALSE)
diff --git a/r/content/specify_data_types_and_schemas.Rmd b/r/content/specify_data_types_and_schemas.Rmd
index 7e0b7d4..5735a58 100644
--- a/r/content/specify_data_types_and_schemas.Rmd
+++ b/r/content/specify_data_types_and_schemas.Rmd
@@ -1,5 +1,7 @@
 # Defining Data Types
 
+## Introduction
+
 As discussed in previous chapters, Arrow automatically infers the most 
 appropriate data type when reading in data or converting R objects to Arrow 
 objects.  However, you might want to manually tell Arrow which data types to 
@@ -15,7 +17,7 @@ in [R data type to Arrow data type mappings](https://arrow.apache.org/docs/r/art
 A table containing Arrow data types, and their R equivalents can be found in 
 [Arrow data type to R data type mapping](https://arrow.apache.org/docs/r/articles/arrow.html#arrow-to-r).
 
-## Update the data type of an existing Arrow Array
+## Update data type of an existing Arrow Array
 
 You want to change the data type of an existing Arrow Array.
 
@@ -62,7 +64,7 @@ test_that("test_incompat works as expected", {
 })
 ```
 
-## Update the data type of a field in an existing Arrow Table
+## Update data type of a field in an existing Arrow Table
 
 You want to change the type of one or more fields in an existing Arrow Table.
 
diff --git a/r/content/unpublished/configure_arrow.Rmd b/r/content/unpublished/configure_arrow.Rmd
deleted file mode 100644
index 348e2d6..0000000
--- a/r/content/unpublished/configure_arrow.Rmd
+++ /dev/null
@@ -1,53 +0,0 @@
-# Configure Arrow
-
-## Get config information and check which components are available
-
-```{r, arrow_info}
-arrow_info()
-```
-
-
-## Control how many CPUs are being used
-
-```{r, cpu_count}
-cpu_count()
-```
-```{r, set_cpu_count, eval = FALSE}
-set_cpu_count(4)
-```
-
-## Control IO Thread count
-
-```{r, io_thread_count}
-io_thread_count()
-```
-
-```{r, set_io_thread_count, eval = FALSE}
-set_io_thread_count(2)
-```
-
-## Switch from the CRAN version to the development version of arrow
-
-```{r, cran_to_dev, eval = FALSE}
-install_arrow(nightly = TRUE)
-```
-
-
-## Switch from the development version to CRAN version of arrow
-
-```{r, dev_to_cran, eval = FALSE}
-install_arrow()
-```
-
-## Install compression libraries
-
-```{r}
-codec_is_available("lzo")
-```
-
-
-## Install the Arrow R package using the system Arrow installation
-
-```{r, install_system, eval = FALSE}
-install_arrow(use_system = TRUE)
-```
diff --git a/r/content/unpublished/create_arrow_objects_from_r.Rmd b/r/content/unpublished/create_arrow_objects_from_r.Rmd
deleted file mode 100644
index dacd9c3..0000000
--- a/r/content/unpublished/create_arrow_objects_from_r.Rmd
+++ /dev/null
@@ -1,9 +0,0 @@
-## Create an Arrow table from an R object
-
-## Arrays
-
-## ChunkedArrays
-
-## Scalars
-
-## RecordBatches
diff --git a/r/content/unpublished/cute_datasets.Rmd b/r/content/unpublished/cute_datasets.Rmd
deleted file mode 100644
index 2d4fc21..0000000
--- a/r/content/unpublished/cute_datasets.Rmd
+++ /dev/null
@@ -1,10 +0,0 @@
-oscars <- tibble::tibble(
-  actor = c("Katharine Hepburn", "Meryl Streep", "Jack Nicholson"),
-  num_awards = c(4, 3, 3)
-)
-
-share_data <- tibble::tibble(
-  company = c("AMZN", "GOOG", "BKNG", "TSLA"),
-  price = c(3463.12, 2884.38, 2300.46, 732.39),
-  date = rep(as.Date("2021-09-02"), 4)
-)
\ No newline at end of file
diff --git a/r/content/unpublished/manipulate_data.Rmd b/r/content/unpublished/manipulate_data.Rmd
deleted file mode 100644
index 0d1c7fb..0000000
--- a/r/content/unpublished/manipulate_data.Rmd
+++ /dev/null
@@ -1,34 +0,0 @@
-# Manipulate Data
-
-
-
-## Manipulate and analyze Arrow data with dplyr verbs 
-## Using simple mathematical and statistical function 
-## Work with character data (stringr functions and Arrow functions)
-## Work with datetime data (lubridate functions)
-
-### Extracting date components
-
-If you want to extract individual components from a date, you can use the following functions that mimic the behaviour of the equivalent `lubridate` functions:
-
-* `year`
-* `isoyear`
-* `quarter`
-* `month`
-* `day`
-* `wday`
-* `yday`
-* `isoweek`
-* `hour`
-* `minute`
-* `second`
-
-```{r, extract_week}
-
-```
-
-## Call an Arrow compute function which doesn't yet have an R binding
-## Access and manipulate Arrow objects through low-level bindings to the C++ library
-
-
-
diff --git a/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd b/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd
deleted file mode 100644
index 0366e66..0000000
--- a/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd
+++ /dev/null
@@ -1,7 +0,0 @@
-# Work with Arrow in both Python and R
-
-## Install pyarrow (released version)
-
-## Install pyarrow (development version)
-
-## Share data between R and Python  (reticulate)
diff --git a/r/content/unpublished/work_with_data_in_different_formats.Rmd b/r/content/unpublished/work_with_data_in_different_formats.Rmd
deleted file mode 100644
index 365da43..0000000
--- a/r/content/unpublished/work_with_data_in_different_formats.Rmd
+++ /dev/null
@@ -1,29 +0,0 @@
-# Work with data in different formats
-
-
-## Read and write Feather or Arrow IPC files
-
-## Read and writing streaming IPC files
-
-
-
-## Read and write Parquet files
-## Read and write CSV (and other delimited files) and JSON files
-## Read and write multi-file, larger-than-memory datasets
-## Read and write memory-mapped files
-
-```{r}
-mmap_create("mmap.arrow", 100)
-```
-```{r}
-mmap_open("mmap.arrow", mode = "write")
-```
-
-## Send and receive data over a network using an Arrow Flight RPC server
-
-```{r, include = FALSE}
-# cleanup
-unlink("mtcars.parquet")
-unlink("mtcars.feather")
-```
-