Posted to commits@arrow.apache.org by we...@apache.org on 2021/07/28 14:55:28 UTC

[arrow-cookbook] branch main updated: Initial content for Arrow Cookbook for Python and R (#1)

This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new d93c637  Initial content for Arrow Cookbook for Python and R (#1)
d93c637 is described below

commit d93c637895ca40d6ec5371c6399757dac7a6f6ea
Author: Alessandro Molina <am...@turbogears.org>
AuthorDate: Wed Jul 28 16:38:20 2021 +0200

    Initial content for Arrow Cookbook for Python and R (#1)
    
    * Initial Import
    
    * R cookbook initial commit (#1)
    
    * R Cookbook skeleton and initial chapter
    
    * Move r test script to a separate directory
    
    * Add Apache 2 license
    
    * Add parquet section
    
    * Delete files used to demonstrate failing tests in CI
    
    * Licensing
    
    * Add content for different formats and rearrange headings
    
    * Small change to make the tests run on macOS
    
    * Completed the IO section and added intersphinx with PyArrow
    
    * Add workflow to deploy to GH pages
    
    * Update path
    
    * Rename chapters and fill in section titles
    
    * Commit whitespace to trigger build
    
    * Update bookdown job
    
    * try new job config
    
    * Install nightly Arrow
    
    * Evaluate all relevant bits!
    
    * Deploy to r dir
    
    * Try new workflow
    
    * update build path
    
    * Add email and update paths
    
    * Update job to build all cookbooks
    
    * Delete whitespace to trigger build
    
    * Swap order to see if this fixes build
    
    * Install system dependencies
    
    * Put it back on Mac so it's faster
    
    * Separate steps to diagnose issue
    
    * Brew not sudo
    
    * Switching to ubuntu as I don't understand why python 2
    
    * Don't put results in r directory
    
    * Capitalise 'C'
    
    * Update bookdown link so can click to fork/edit
    
    * Add CI stage that runs tests
    
    * Add examples of manually creating Arrow objects and writing to various formats
    
    * Add S3 parquet
    
    * Partitioned data
    
    * Partitioned Data from S3
    
    * Rename record_batch_create chunk
    
    * CSV recipe requires pandas
    
    * Filter parquet data on read
    
    * Reading/Writing feather files
    
    * remove duplicated chunk name
    
    * tweak create
    
    * Categorical data
    
    * Speed up compiling
    
    * Fix tests
    
    * tests pass
    
    * Data manipulation functions
    
    * Link to compute functions
    
    * Tweak naming
    
    * Add contribution file
    
    * landing page style tweak
    
    * Improve contribution documentation
    
    * Explicitly reference the contribution docs
    
    * ignore build directory
    
    * Change branch name
    
    * Update contents
    
    * Update CONTRIBUTING.md
    
    * Suggestions from Grammarly
    
    * Rename initial chapter
    
    * Update Makefile to allow Arrow version to be specified
    
    * Truncate license file to relevant part
    
    * typo
    
    * Apply suggestions from code review
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Add link to code of conduct
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Capitalise "Array"
    
    * Update r/CONTRIBUTING.md
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update r/content/reading_and_writing_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update r/content/creating_arrow_objects.Rmd
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Apply suggestions from code review
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Mention dependencies
    
    * Mention that this is not the documentation
    
    * rewording
    
    * Add -jauto by default and indent a print
    
    * The Apache Software Foundation
    
    * reword
    
    * Correct ambiguous and incorrect phrasing
    
    * Update r/content/reading_and_writing_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Update r/content/reading_and_writing_data.Rmd
    
    Co-authored-by: Weston Pace <we...@gmail.com>
    
    * Reorder sections
    
    * Update r/content/manipulating_data.Rmd
    
    Co-authored-by: Ian Cook <ia...@gmail.com>
    
    * Remove redundant code snippet
    
    * Update reading CSVs
    
    * Add in section on converting from/to Arrow Tables and tibbles
    
    * rephrase list of numbers
    
    * rephrase list of numbers
    
    * Add missing bracket
    
    * Rephrase about parquet containing multiple cols
    
    * rephrased
    
    * Adapt to Arrow 5.0 output
    
    Co-authored-by: Nic <th...@gmail.com>
    Co-authored-by: Jonathan Keane <jk...@gmail.com>
    Co-authored-by: Weston Pace <we...@gmail.com>
    Co-authored-by: Ian Cook <ia...@gmail.com>
---
 .github/.gitignore                                 |   1 +
 .github/workflows/deploy_cookbooks.yml             |  48 +++
 .gitignore                                         |   5 +
 CONTRIBUTING.md                                    |  23 ++
 LICENSE                                            | 202 ++++++++++
 Makefile                                           |  55 +++
 README.rst                                         |  60 +++
 build/arrow.png                                    | Bin 0 -> 21636 bytes
 build/index.html                                   |  49 +++
 python/CONTRIBUTING.rst                            |  75 ++++
 python/Makefile                                    |  20 +
 python/make.bat                                    |  35 ++
 python/requirements.txt                            |   3 +
 python/source/conf.py                              |  57 +++
 python/source/create.rst                           | 111 ++++++
 python/source/data.rst                             | 139 +++++++
 python/source/index.rst                            |  27 ++
 python/source/io.rst                               | 425 +++++++++++++++++++++
 r/.Rbuildignore                                    |   1 +
 r/CONTRIBUTING.md                                  |  53 +++
 r/content/_bookdown.yml                            |  11 +
 r/content/creating_arrow_objects.Rmd               |  85 +++++
 r/content/index.Rmd                                |  21 +
 r/content/manipulating_data.Rmd                    |  75 ++++
 r/content/reading_and_writing_data.Rmd             | 288 ++++++++++++++
 r/content/unpublished/configure_arrow.Rmd          |  53 +++
 .../unpublished/create_arrow_objects_from_r.Rmd    |   9 +
 r/content/unpublished/manipulate_data.Rmd          |  34 ++
 .../unpublished/specify_data_types_and_schemas.Rmd |  10 +
 .../work_with_arrow_in_both_python_and_r.Rmd       |   7 +
 .../work_with_compressed_or_partitioned_data.Rmd   |   5 +
 .../work_with_data_in_different_formats.Rmd        |  29 ++
 r/scripts/install_dependencies.R                   |  34 ++
 r/scripts/test.R                                   |  59 +++
 34 files changed, 2109 insertions(+)

diff --git a/.github/.gitignore b/.github/.gitignore
new file mode 100644
index 0000000..2d19fc7
--- /dev/null
+++ b/.github/.gitignore
@@ -0,0 +1 @@
+*.html
diff --git a/.github/workflows/deploy_cookbooks.yml b/.github/workflows/deploy_cookbooks.yml
new file mode 100644
index 0000000..55f5277
--- /dev/null
+++ b/.github/workflows/deploy_cookbooks.yml
@@ -0,0 +1,48 @@
+on:
+  push:
+     branches:
+       - main
+
+name: render_cookbooks
+
+jobs:
+  make_books:
+    name: Render-Book
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v1
+      - uses: r-lib/actions/setup-r@v1
+      - uses: r-lib/actions/setup-pandoc@v1
+      - name: Install dependencies
+        run: sudo apt install libcurl4-openssl-dev libssl-dev
+      - name: Run tests
+        run: make test
+      - name: Build and render books
+        run: make all
+      - uses: actions/upload-artifact@v1
+        with:
+          name: build_book
+          path: build/
+
+  checkout-and-deploy:
+   runs-on: ubuntu-latest
+   needs: make_books
+   steps:
+     - name: Checkout
+       uses: actions/checkout@master
+     - name: Download artifact
+       uses: actions/download-artifact@v1.0.0
+       with:
+         # Artifact name
+         name: build_book # optional
+         # Destination path
+         path: . # optional
+     - name: Deploy to GitHub Pages
+       uses: Cecilapp/GitHub-Pages-deploy@v3
+       env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+       with:
+          email: ${{ secrets.EMAIL }}
+          build_dir: .                   # optional
+          jekyll: no                     # optional
+
diff --git a/.gitignore b/.gitignore
index e69de29..5c0998b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,5 @@
+r/content/_book/**
+r/*.Rproj
+*.Rproj
+.Rproj.user
+python/build
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..e87e6ec
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,23 @@
+# How to contribute to Apache Arrow Cookbook
+
+## Did you find a bug?
+
+If you find a bug in the cookbook, please let us know by opening an issue on GitHub.
+
+## Do you want to contribute new recipes or improvements?
+
+We always welcome contributions of new recipes for the cookbook.  
+To make a contribution, please fork this repo and submit a pull request with your contribution.
+
+Any changes which add new code chunks or recipes must be tested when the `make test` command
+is run. Please refer to the language-specific cookbook contribution documentation for information on
+how to make your recipes testable.
+
+ * [Contributing to Python Cookbook](python/CONTRIBUTING.rst)
+ * [Contributing to R Cookbook](r/CONTRIBUTING.md)
+ 
+ ------------------------------------------------------------------------
+
+All participation in the Apache Arrow project is governed by the Apache
+Software Foundation’s [code of
+conduct](https://www.apache.org/foundation/policies/conduct.html).
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,202 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!)  The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
diff --git a/Makefile b/Makefile
new file mode 100644
index 0000000..ad2b473
--- /dev/null
+++ b/Makefile
@@ -0,0 +1,55 @@
+all: html
+
+
+html: py r
+	@echo "\n\n>>> Cookbooks Available in ./build <<<"
+
+
+test:   pytest rtest
+
+
+help:
+	@echo "make all         Build cookbook for all platforms in HTML, will be available in ./build"
+	@echo "make test        Test cookbook for all platforms."
+	@echo "make py          Build the Cookbook for Python only."
+	@echo "make r           Build the Cookbook for R only."
+	@echo "make pytest      Verify the cookbook for Python only."
+	@echo "make rtest       Verify the cookbook for R only."
+
+
+pydeps:
+	@echo ">>> Installing Python Dependencies <<<\n"
+	cd python && pip install -r requirements.txt
+
+
+py: pydeps
+	@echo ">>> Building Python Cookbook <<<\n"
+	cd python && make html
+	mkdir -p build/py
+	cp -r python/build/html/* build/py
+
+
+pytest: pydeps
+	@echo ">>> Testing Python Cookbook <<<\n"
+	cd python && make doctest
+
+rdeps:
+	@echo ">>> Installing R Dependencies <<<\n"
+ifdef arrow_r_version
+	cd ./r && Rscript ./scripts/install_dependencies.R $(arrow_r_version)
+else
+	cd ./r && Rscript ./scripts/install_dependencies.R
+endif
+
+
+r: rdeps
+	@echo ">>> Building R Cookbook <<<\n"
+	R -s -e 'bookdown::render_book("./r/content", output_format = "bookdown::gitbook")'
+	mkdir -p build/r
+	cp -r r/content/_book/* build/r
+
+rtest: rdeps
+	@echo ">>> Testing R Cookbook <<<\n"
+	cd ./r && Rscript ./scripts/test.R
+
+
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..193736c
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,60 @@
+Apache Arrow Cookbooks
+======================
+
+Cookbooks are a collection of recipes about common tasks
+that Arrow users might want to perform. The cookbook is actually
+composed of multiple cookbooks, one for each supported platform,
+each containing the recipes for that specific platform.
+
+All cookbooks are buildable to HTML and verifiable by running
+a set of tests that confirm that the recipes are still working
+as expected.
+
+Each cookbook is implemented using platform-specific tools.
+For this reason, a Makefile is provided which abstracts platform-specific
+concerns and makes it possible to build/test all cookbooks
+without any platform-specific knowledge (as long as dependencies
+are available on the target system).
+
+Building All Cookbooks
+----------------------
+
+``make all``
+
+Testing All Cookbooks
+---------------------
+
+``make test``
+
+Listing Available Commands
+--------------------------
+
+``make help``
+
+Building a Platform-Specific Cookbook
+-------------------------------------
+
+Refer to ``make help`` to learn the
+commands that build or test the cookbook for the platform you
+are targeting.
+
+Prerequisites
+=============
+
+Both the R and Python cookbooks will try to install the
+dependencies they need (including the latest pyarrow/arrow-R version).
+This means that as long as you have a working Python/R environment
+able to install dependencies through the respective package manager,
+you shouldn't need to install anything manually.
+
+Contributing to the Cookbook
+============================
+
+Please refer to the `CONTRIBUTING.md <CONTRIBUTING.md>`_ file
+for instructions about how to contribute to the Apache Arrow Cookbook.
+
+------------------------------------------------------------------------
+
+All participation in the Apache Arrow project is governed by the Apache
+Software Foundation’s 
+`code of conduct <https://www.apache.org/foundation/policies/conduct.html>`_.
diff --git a/build/arrow.png b/build/arrow.png
new file mode 100644
index 0000000..72104b0
Binary files /dev/null and b/build/arrow.png differ
diff --git a/build/index.html b/build/index.html
new file mode 100644
index 0000000..d5ee952
--- /dev/null
+++ b/build/index.html
@@ -0,0 +1,49 @@
+<!DOCTYPE html>
+
+<html>
+	<head>
+		<title>Apache Arrow Cookbook</title>
+		<style>
+			body {
+				color: rgb(51, 51, 51);
+				font-family: sans-serif;
+				line-height: 1.65;
+				padding: 25px;
+				max-width: 900px;
+				margin-left: auto;
+				margin-right: auto;
+			}
+
+			a {
+				color: rgb(0, 91, 129);
+			}
+
+			#logo {
+				width: 50%;
+				margin-left: auto;
+				margin-right: auto;
+			}
+
+			#logo > img {
+				width: 100%;
+			}
+		</style>
+	</head>
+	<body>
+		<div id="logo"><img src="arrow.png"/></div>
+		<h1>Apache Arrow Cookbook</h1>
+		<p>The cookbook is a collection of Apache Arrow recipes for
+		   the languages and platforms supported by Arrow.<br/>
+		   Most recipes will be common to all platforms, 
+		   but some are specific to the language and environment in use.
+		</p>
+		<ul>
+			<li><a href="py/index.html">Python Cookbook</a></li>
+			<li><a href="r/index.html">R Cookbook</a></li>
+		</ul>
+		<p>If you are looking for the Apache Arrow Documentation itself
+		   or the API reference, those are available at
+		   <a href="https://arrow.apache.org/docs/">https://arrow.apache.org/docs/</a>
+		</p>
+	</body>
+</html>
diff --git a/python/CONTRIBUTING.rst b/python/CONTRIBUTING.rst
new file mode 100644
index 0000000..4b737f3
--- /dev/null
+++ b/python/CONTRIBUTING.rst
@@ -0,0 +1,75 @@
+Building the Python Cookbook
+============================
+
+The Python cookbook uses the Sphinx documentation system.
+
+Running ``make py`` from the cookbook root directory (the one where
+the ``README.rst`` exists) will install all necessary dependencies
+and will compile the cookbook to HTML.
+
+You will see the compiled result inside the ``build/py`` directory.
+
+Testing Python Recipes
+======================
+
+All recipes in the cookbook must be tested. The cookbook uses
+``doctest`` to verify the recipes.
+
+Running ``make pytest`` from the cookbook root directory
+will verify that the code for all the recipes runs correctly
+and provides the expected output.
+
+Adding Python Recipes
+=====================
+
+The recipes are written in **reStructuredText** format using 
+the `Sphinx <https://www.sphinx-doc.org/>`_ documentation system.
+
+New recipes can be added to one of the existing ``.rst`` files if
+they suit that section, or you can create new sections by adding
+additional ``.rst`` files in the ``source`` directory. You just
+need to remember to add them to the ``toctree`` in the ``index.rst``
+file for them to become visible.
+
+The only requirement for recipes is that each code block in the recipe
+must be written using the ``.. testcode::`` directive,
+so that it can be tested.
+
+If the code block changes, alters or creates data, the recipe should
+``print`` the data to show how it changed and have a ``.. testoutput::``
+directive to confirm that the printed data matches the expected output.
+
+For example, a new recipe showing how to create an Arrow array
+might look like:
+
+.. code-block::
+
+    You can create a new :class:`pyarrow.Array` by providing the
+    data for the array through the :func:`pyarrow.array` factory function
+
+    .. testcode::
+
+        import pyarrow as pa
+        array = pa.array(range(5))
+        print(array)
+
+    .. testoutput::
+
+        [
+          0,
+          1,
+          2,
+          3,
+          4
+        ]
+
+If you refer to any ``pyarrow`` class, function or method using the
+``:class:``, ``:meth:`` or ``:func:`` directives, a link to its
+documentation in the pyarrow API reference will be automatically
+created.
+
+------------------------------------------------------------------------
+
+All participation in the Apache Arrow project is governed by the Apache
+Software Foundation’s 
+`code of conduct <https://www.apache.org/foundation/policies/conduct.html>`_.
diff --git a/python/Makefile b/python/Makefile
new file mode 100644
index 0000000..a023ec1
--- /dev/null
+++ b/python/Makefile
@@ -0,0 +1,20 @@
+# Minimal makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line, and also
+# from the environment for the first two.
+SPHINXOPTS    ?= -jauto
+SPHINXBUILD   ?= sphinx-build
+SOURCEDIR     = source
+BUILDDIR      = build
+
+# Put it first so that "make" without argument is like "make help".
+help:
+	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
+
+.PHONY: help Makefile
+
+# Catch-all target: route all unknown targets to Sphinx using the new
+# "make mode" option.  $(O) is meant as a shortcut for $(SPHINXOPTS).
+%: Makefile
+	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
diff --git a/python/make.bat b/python/make.bat
new file mode 100644
index 0000000..6247f7e
--- /dev/null
+++ b/python/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=source
+set BUILDDIR=build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/python/requirements.txt b/python/requirements.txt
new file mode 100644
index 0000000..4167b52
--- /dev/null
+++ b/python/requirements.txt
@@ -0,0 +1,3 @@
+Sphinx>=4.0.2
+pyarrow>=4.0.0
+pandas>=1.2.5
diff --git a/python/source/conf.py b/python/source/conf.py
new file mode 100644
index 0000000..7f7537a
--- /dev/null
+++ b/python/source/conf.py
@@ -0,0 +1,57 @@
+# Configuration file for the Sphinx documentation builder.
+#
+# This file only contains a selection of the most common options. For a full
+# list see the documentation:
+# https://www.sphinx-doc.org/en/master/usage/configuration.html
+
+# -- Path setup --------------------------------------------------------------
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+#
+# import os
+# import sys
+# sys.path.insert(0, os.path.abspath('.'))
+
+
+# -- Project information -----------------------------------------------------
+
+project = 'Apache Arrow Python Cookbook'
+copyright = '2021, Apache Software Foundation'
+author = 'The Apache Software Foundation'
+
+
+# -- General configuration ---------------------------------------------------
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    "sphinx.ext.doctest",
+    "sphinx.ext.intersphinx"
+]
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This pattern also affects html_static_path and html_extra_path.
+exclude_patterns = []
+
+
+# -- Options for HTML output -------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'alabaster'
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+
+intersphinx_mapping = {'pyarrow': ('https://arrow.apache.org/docs/', None)}
diff --git a/python/source/create.rst b/python/source/create.rst
new file mode 100644
index 0000000..e52c058
--- /dev/null
+++ b/python/source/create.rst
@@ -0,0 +1,111 @@
+======================
+Creating Arrow Objects
+======================
+
+Recipes related to the creation of Arrays, Tables,
+Tensors and all other Arrow entities.
+
+.. contents::
+
+Create Table from Plain Types
+=============================
+
+Arrow allows fast zero-copy creation of Arrow arrays
+from numpy and pandas arrays and series, but it's also
+possible to create Arrow Arrays and Tables from
+plain Python structures.
+
+The :func:`pyarrow.table` function allows creation of Tables
+from a variety of inputs, including plain Python objects
+
+.. testcode::
+
+    import pyarrow as pa
+
+    table = pa.table({
+        "col1": [1, 2, 3, 4, 5],
+        "col2": ["a", "b", "c", "d", "e"]
+    })
+
+    print(table)
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    col2: string
+
+.. note::
+
+    All values provided in the dictionary will be passed to
+    :func:`pyarrow.array` for conversion to Arrow arrays,
+    and will benefit from zero-copy behaviour when possible.
+
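+For example, a NumPy array of integers can typically be wrapped without
+copying the underlying buffer. A minimal sketch (the column name ``col3``
+is purely illustrative):
+
+.. code-block:: python
+
+    import numpy as np
+
+    # The NumPy buffer is reused directly when zero-copy conversion is possible
+    zero_copy_table = pa.table({"col3": np.arange(5)})
+    print(zero_copy_table["col3"])
+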
+Store Categorical Data
+======================
+
+Arrow provides the :class:`pyarrow.DictionaryArray` type
+to represent categorical data without the cost of
+storing and repeating the categories over and over.  This can reduce memory use
+when columns might have large values (such as text).
+
+If you have an array containing repeated categorical data,
+it is possible to convert it to a :class:`pyarrow.DictionaryArray`
+using :meth:`pyarrow.Array.dictionary_encode`
+
+.. testcode::
+
+    arr = pa.array(["red", "green", "blue", "blue", "green", "red"])
+
+    categorical = arr.dictionary_encode()
+    print(categorical)
+
+.. testoutput::
+
+    ...
+    -- dictionary:
+      [
+        "red",
+        "green",
+        "blue"
+      ]
+    -- indices:
+      [
+        0,
+        1,
+        2,
+        2,
+        1,
+        0
+      ]
+
+If you already know the categories and indices then you can skip the encode
+step and directly create the ``DictionaryArray`` using 
+:meth:`pyarrow.DictionaryArray.from_arrays`
+
+.. testcode::
+
+    categorical = pa.DictionaryArray.from_arrays(
+        indices=[0, 1, 2, 2, 1, 0],
+        dictionary=["red", "green", "blue"]
+    )
+    print(categorical)
+
+.. testoutput::
+
+    ...
+    -- dictionary:
+      [
+        "red",
+        "green",
+        "blue"
+      ]
+    -- indices:
+      [
+        0,
+        1,
+        2,
+        2,
+        1,
+        0
+      ]
diff --git a/python/source/data.rst b/python/source/data.rst
new file mode 100644
index 0000000..527181d
--- /dev/null
+++ b/python/source/data.rst
@@ -0,0 +1,139 @@
+=================
+Data Manipulation
+=================
+
+Recipes related to filtering or transforming data in
+arrays and tables.
+
+.. contents::
+
+See :ref:`compute` for a complete list of all available compute functions.
+
+Computing Mean/Min/Max values of an array
+=========================================
+
+Arrow provides compute functions that can be applied to arrays.
+Those compute functions are exposed through the :mod:`pyarrow.compute`
+module.
+
+.. testsetup::
+
+  import numpy as np
+  import pyarrow as pa
+
+  arr = pa.array(np.arange(100))
+
+Given an array with 100 numbers, from 0 to 99
+
+.. testcode::
+
+  print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+  0 .. 99
+
+We can compute the ``mean`` using the :func:`pyarrow.compute.mean`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  mean = pc.mean(arr)
+  print(mean)
+
+.. testoutput::
+
+  49.5
+
+And the ``min`` and ``max`` using the :func:`pyarrow.compute.min_max`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  min_max = pc.min_max(arr)
+  print(min_max)
+
+.. testoutput::
+
+  [('min', 0), ('max', 99)]
+
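+The result is a struct scalar; a minimal sketch of pulling out the individual
+values (assuming mapping-style access on the returned struct scalar):
+
+.. code-block:: python
+
+    # Access the "min" and "max" fields of the struct scalar returned above
+    print(min_max["min"].as_py(), min_max["max"].as_py())
+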
+Counting Occurrences of Elements
+================================
+
+Arrow provides compute functions that can be applied to arrays.
+Those compute functions are exposed through the :mod:`pyarrow.compute`
+module.
+
+.. testsetup::
+
+  import pyarrow as pa
+
+  nums_arr = pa.array(list(range(10))*10)
+
+Given an array with all numbers from 0 to 9 repeated 10 times
+
+.. testcode::
+
+  print(f"LEN: {len(nums_arr)}, MIN/MAX: {nums_arr[0]} .. {nums_arr[-1]}")
+
+.. testoutput::
+
+  LEN: 100, MIN/MAX: 0 .. 9
+
+We can count occurrences of all entries in the array using the
+:func:`pyarrow.compute.value_counts` function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  counts = pc.value_counts(nums_arr)
+  for pair in counts:
+      print(pair)
+
+.. testoutput::
+
+  [('values', 0), ('counts', 10)]
+  [('values', 1), ('counts', 10)]
+  [('values', 2), ('counts', 10)]
+  [('values', 3), ('counts', 10)]
+  [('values', 4), ('counts', 10)]
+  [('values', 5), ('counts', 10)]
+  [('values', 6), ('counts', 10)]
+  [('values', 7), ('counts', 10)]
+  [('values', 8), ('counts', 10)]
+  [('values', 9), ('counts', 10)]
+
+Applying arithmetic functions to arrays
+=========================================
+
+The compute functions in :mod:`pyarrow.compute` also include
+common transformations such as arithmetic functions.
+
+Given an array with 100 numbers, from 0 to 99
+
+.. testcode::
+
+  print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+  0 .. 99
+
+We can multiply all values by 2 using the :func:`pyarrow.compute.multiply`
+function
+
+.. testcode::
+
+  import pyarrow.compute as pc
+
+  doubles = pc.multiply(arr, 2)
+  print(f"{doubles[0]} .. {doubles[-1]}")
+
+.. testoutput::
+
+  0 .. 198
diff --git a/python/source/index.rst b/python/source/index.rst
new file mode 100644
index 0000000..ff4ae7c
--- /dev/null
+++ b/python/source/index.rst
@@ -0,0 +1,27 @@
+.. Apache Arrow Cookbook documentation master file, created by
+   sphinx-quickstart on Wed Jun 16 10:33:09 2021.
+   You can adapt this file completely to your liking, but it should at least
+   contain the root `toctree` directive.
+
+Apache Arrow Python Cookbook
+============================
+
+The Apache Arrow Cookbook is a collection of recipes which demonstrate
+how to solve many common tasks that users might need to perform
+when working with Arrow data.  The examples in this cookbook will also
+serve as robust and well-performing solutions to those tasks.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   io
+   create
+   data
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
diff --git a/python/source/io.rst b/python/source/io.rst
new file mode 100644
index 0000000..4edc444
--- /dev/null
+++ b/python/source/io.rst
@@ -0,0 +1,425 @@
+========================
+Reading and Writing Data
+========================
+
+Recipes related to reading and writing data from disk using
+Apache Arrow.
+
+.. contents::
+
+Write a Parquet file
+====================
+
+.. testsetup::
+
+    import numpy as np
+    import pyarrow as pa
+
+    arr = pa.array(np.arange(100))
+
+Given an array with 100 numbers, from 0 to 99
+
+.. testcode::
+
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+To write it to a Parquet file,
+as Parquet is a format that contains multiple named columns,
+we must first wrap it in a :class:`pyarrow.Table`,
+so that we get a table with a single column which can then be
+written to a Parquet file.
+
+.. testcode::
+
+    table = pa.Table.from_arrays([arr], names=["col1"])
+
+Once we have a table, it can be written to a Parquet file
+using the functions provided by the ``pyarrow.parquet`` module
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    pq.write_table(table, "example.parquet", compression=None)
+
+Reading a Parquet file
+======================
+
+Given a Parquet file, it can be read back to a :class:`pyarrow.Table`
+by using the :func:`pyarrow.parquet.read_table` function
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    table = pq.read_table("example.parquet")
+
+The resulting table will contain the same columns that existed in
+the Parquet file, each as a :class:`ChunkedArray`
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 99
+
+Reading a subset of Parquet data
+================================
+
+When reading a Parquet file with :func:`pyarrow.parquet.read_table`,
+it is possible to restrict which columns and rows will be read
+into memory by using the ``filters`` and ``columns`` arguments
+
+.. testcode::
+
+    import pyarrow.parquet as pq
+
+    table = pq.read_table("example.parquet", 
+                          columns=["col1"],
+                          filters=[
+                              ("col1", ">", 5),
+                              ("col1", "<", 10),
+                          ])
+
+The resulting table will contain only the projected columns
+and filtered rows. Refer to :func:`pyarrow.parquet.read_table`
+documentation for details about the syntax for filters.
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 6 .. 9
+    
+
+Saving Arrow Arrays to disk
+===========================
+
+Apart from using Arrow to read and save common file formats like Parquet,
+it is possible to dump data in the raw Arrow format, which allows
+direct memory mapping of data from disk. This format is called
+the Arrow IPC format.
+
+Given an array with 100 numbers, from 0 to 99
+
+.. testcode::
+
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+We can save the array by making a :class:`pyarrow.RecordBatch` out
+of it and writing the record batch to disk.
+
+.. testcode::
+
+    schema = pa.schema([
+        pa.field('nums', arr.type)
+    ])
+
+    with pa.OSFile('arraydata.arrow', 'wb') as sink:
+        with pa.ipc.new_file(sink, schema=schema) as writer:
+            batch = pa.record_batch([arr], schema=schema)
+            writer.write(batch)
+
+If we were to save multiple arrays into the same file,
+we would just have to adapt the ``schema`` accordingly and add
+them all to the ``record_batch`` call.
+
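+A minimal sketch of that, assuming a second, purely illustrative array
+``labels_arr`` of the same length as ``arr``:
+
+.. code-block:: python
+
+    # labels_arr is a hypothetical second array, same length as arr
+    labels_arr = pa.array(["even" if i % 2 == 0 else "odd" for i in range(100)])
+
+    schema = pa.schema([
+        pa.field('nums', arr.type),
+        pa.field('labels', labels_arr.type)
+    ])
+
+    with pa.OSFile('arraydata_multi.arrow', 'wb') as sink:
+        with pa.ipc.new_file(sink, schema=schema) as writer:
+            # Both arrays go into the same record batch, in schema order
+            batch = pa.record_batch([arr, labels_arr], schema=schema)
+            writer.write(batch)
+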
+Memory Mapping Arrow Arrays from disk
+=====================================
+
+Arrow arrays that have been written to disk in the Arrow IPC
+format can be memory mapped back directly from the disk.
+
+.. testcode::
+
+    with pa.memory_map('arraydata.arrow', 'r') as source:
+        loaded_arrays = pa.ipc.open_file(source).read_all()
+
+.. testcode::
+
+    arr = loaded_arrays[0]
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+Writing CSV files
+=================
+
+It is currently possible to write an Arrow :class:`pyarrow.Table` to
+CSV by going through pandas, as Arrow doesn't yet provide an optimized
+code path for writing to CSV.
+
+.. testcode::
+
+    table = pa.Table.from_arrays([arr], names=["col1"])
+    table.to_pandas().to_csv("table.csv", index=False)
+
+Reading CSV files
+=================
+
+Arrow can read :class:`pyarrow.Table` entities from CSV using an
+optimized code path that can leverage multiple threads.
+
+.. testcode::
+
+    import pyarrow.csv
+
+    table = pa.csv.read_csv("table.csv")
+
+Arrow will do its best to infer data types.  Further options can be
+provided to :func:`pyarrow.csv.read_csv` to control the conversion
+through :class:`pyarrow.csv.ConvertOptions` (see the sketch at the
+end of this section).
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 99
+
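+As mentioned above, type inference can be overridden through
+:class:`pyarrow.csv.ConvertOptions`.  A minimal sketch (reusing the
+``table.csv`` file written earlier) that forces ``col1`` to be read
+as ``int32`` might look like:
+
+.. code-block:: python
+
+    import pyarrow as pa
+    import pyarrow.csv
+
+    convert_options = pa.csv.ConvertOptions(column_types={"col1": pa.int32()})
+    typed_table = pa.csv.read_csv("table.csv", convert_options=convert_options)
+    # typed_table["col1"] is now int32 instead of the inferred int64
+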
+Reading Partitioned data
+========================
+
+In some cases, your dataset might be composed of multiple separate
+files, each containing a piece of the data.
+
+.. testsetup::
+
+    import pathlib
+    import pyarrow.parquet as pq
+
+    examples = pathlib.Path("examples")
+    examples.mkdir(exist_ok=True)
+
+    pq.write_table(pa.table({"col1": range(10)}), 
+                   examples / "dataset1.parquet", compression=None)
+    pq.write_table(pa.table({"col1": range(10, 20)}), 
+                   examples / "dataset2.parquet", compression=None)
+    pq.write_table(pa.table({"col1": range(20, 30)}), 
+                   examples / "dataset3.parquet", compression=None)
+
+In this case the :func:`pyarrow.dataset.dataset` function provides
+an interface to discover and read all those files as a single big dataset.
+
+For example if we have a structure like:
+
+.. code-block::
+
+    examples/
+    ├── dataset1.parquet
+    ├── dataset2.parquet
+    └── dataset3.parquet
+
+Then, pointing the :func:`pyarrow.dataset.dataset` function to the ``examples`` directory
+will discover those parquet files and will expose them all as a single
+:class:`pyarrow.dataset.Dataset`:
+
+.. testcode::
+
+    import pyarrow.dataset as ds
+
+    dataset = ds.dataset("./examples", format="parquet")
+    print(dataset.files)
+
+.. testoutput::
+
+    ['./examples/dataset1.parquet', './examples/dataset2.parquet', './examples/dataset3.parquet']
+
+The whole dataset can be viewed as a single big table using
+:meth:`pyarrow.dataset.Dataset.to_table`. While each parquet file
+contains only 10 rows, converting the dataset to a table will
+expose them as a single Table.
+
+.. testcode::
+
+    table = dataset.to_table()
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 29
+
+Notice that converting to a table will force all data to be loaded
+into memory.  For big datasets, this is usually not what you want.
+
+For this reason, it might be better to rely on the
+:meth:`pyarrow.dataset.Dataset.to_batches` method, which will
+iteratively load the dataset one chunk of data at a time, returning a
+:class:`pyarrow.RecordBatch` for each chunk.
+
+.. testcode::
+
+    for record_batch in dataset.to_batches():
+        col1 = record_batch.column("col1")
+        print(f"{col1._name} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    col1 = 0 .. 9
+    col1 = 10 .. 19
+    col1 = 20 .. 29
+
+Reading Partitioned Data from S3
+================================
+
+The :class:`pyarrow.dataset.Dataset` is also able to abstract
+partitioned data coming from remote sources like S3 or HDFS.
+
+.. testcode::
+
+    from pyarrow import fs
+
+    # List content of s3://ursa-labs-taxi-data/2011
+    s3 = fs.SubTreeFileSystem("ursa-labs-taxi-data", fs.S3FileSystem(region="us-east-2"))
+    for entry in s3.get_file_info(fs.FileSelector("2011", recursive=True)):
+        if entry.type == fs.FileType.File:
+            print(entry.path)
+
+.. testoutput::
+
+    2011/01/data.parquet
+    2011/02/data.parquet
+    2011/03/data.parquet
+    2011/04/data.parquet
+    2011/05/data.parquet
+    2011/06/data.parquet
+    2011/07/data.parquet
+    2011/08/data.parquet
+    2011/09/data.parquet
+    2011/10/data.parquet
+    2011/11/data.parquet
+    2011/12/data.parquet
+
+The data in the bucket can be loaded as a single big dataset partitioned
+by ``month`` using
+
+.. testcode::
+
+    dataset = ds.dataset("s3://ursa-labs-taxi-data/2011", 
+                         partitioning=["month"])
+    for f in dataset.files[:10]:
+        print(f)
+    print("...")
+
+.. testoutput::
+
+    ursa-labs-taxi-data/2011/01/data.parquet
+    ursa-labs-taxi-data/2011/02/data.parquet
+    ursa-labs-taxi-data/2011/03/data.parquet
+    ursa-labs-taxi-data/2011/04/data.parquet
+    ursa-labs-taxi-data/2011/05/data.parquet
+    ursa-labs-taxi-data/2011/06/data.parquet
+    ursa-labs-taxi-data/2011/07/data.parquet
+    ursa-labs-taxi-data/2011/08/data.parquet
+    ursa-labs-taxi-data/2011/09/data.parquet
+    ursa-labs-taxi-data/2011/10/data.parquet
+    ...
+
+The dataset can then be used with :meth:`pyarrow.dataset.Dataset.to_table`
+or :meth:`pyarrow.dataset.Dataset.to_batches` like you would for a local one.
+
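+A minimal sketch of that (assuming the public ``ursa-labs-taxi-data``
+bucket is reachable and that the illustrative column names exist in
+its files) which streams the data batch by batch:
+
+.. code-block:: python
+
+    # Stream the remote dataset without loading it all into memory
+    for batch in dataset.to_batches(columns=["passenger_count", "total_amount"]):
+        print(batch.num_rows)
+        break  # only peek at the first batch here
+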
+.. note::
+
+    It is also possible to load partitioned data in the Arrow IPC
+    format or in the Feather format.
+
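+For instance, a local directory of Feather/IPC files could be loaded with a
+sketch like the following (the ``./feather_examples`` path is hypothetical):
+
+.. code-block:: python
+
+    # format="feather" also covers files written in the Arrow IPC file format
+    feather_dataset = ds.dataset("./feather_examples", format="feather")
+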
+Write a Feather file
+====================
+
+.. testsetup::
+
+    import numpy as np
+    import pyarrow as pa
+
+    arr = pa.array(np.arange(100))
+
+Given an array with 100 numbers, from 0 to 99
+
+.. testcode::
+
+    print(f"{arr[0]} .. {arr[-1]}")
+
+.. testoutput::
+
+    0 .. 99
+
+To write it to a Feather file, as Feather stores multiple named columns,
+we must first wrap it in a :class:`pyarrow.Table`,
+so that we get a table with a single column which can then be
+written to a Feather file.
+
+.. testcode::
+
+    table = pa.Table.from_arrays([arr], names=["col1"])
+
+Once we have a table, it can be written to a Feather file
+using the functions provided by the ``pyarrow.feather`` module
+
+.. testcode::
+
+    import pyarrow.feather as ft
+    
+    ft.write_feather(table, 'example.feather')
+
+Reading a Feather file
+======================
+
+Given a Feather file, it can be read back to a :class:`pyarrow.Table`
+by using the :func:`pyarrow.feather.read_table` function
+
+.. testcode::
+
+    import pyarrow.feather as ft
+
+    table = ft.read_table("example.feather")
+
+The resulting table will contain the same columns that existed in
+the Feather file, each as a :class:`ChunkedArray`
+
+.. testcode::
+
+    print(table)
+
+    col1 = table["col1"]
+    print(f"{type(col1).__name__} = {col1[0]} .. {col1[-1]}")
+
+.. testoutput::
+
+    pyarrow.Table
+    col1: int64
+    ChunkedArray = 0 .. 99
diff --git a/r/.Rbuildignore b/r/.Rbuildignore
new file mode 100644
index 0000000..c503c4f
--- /dev/null
+++ b/r/.Rbuildignore
@@ -0,0 +1 @@
+^\.github$
diff --git a/r/CONTRIBUTING.md b/r/CONTRIBUTING.md
new file mode 100644
index 0000000..dde26ef
--- /dev/null
+++ b/r/CONTRIBUTING.md
@@ -0,0 +1,53 @@
+# Contributing
+
+We actively welcome contributions to the Arrow R cookbook!  If you want to make a contribution, please fork this repo and make a pull request with your changes.  If you see any errors or have suggestions for recipes you'd like to see but do not know how to create, please open a GitHub issue.
+
+# Adding R Recipes
+
+The recipes are written in RMarkdown format using `bookdown`.
+
+You can add new recipes to one of the existing ``.Rmd`` files, or you can create new sections by adding additional ``.Rmd`` files in the `content` directory.  If you add a new file, you should add it to the `rmd_files` list in `content/_bookdown.yml` for it to be visible in the rendered cookbook.
+
+After each code chunk in the recipe, you should add a test chunk that tests that the code chunk's output is as expected.  Using a test chunk will allow your recipe to be tested against the latest version of arrow, and make it easier to detect if any changes made to arrow result in your recipe becoming out-of-date.
+
+Each significant code chunk must be given a descriptive label, and be immediately followed by a unit test of its output.  This test should be labelled "test_" followed by the name of the chunk that it is testing.  The test chunk should also have the `opts.label` attribute set to "test" - this will ensure that the test is not rendered as part of the cookbook.
+
+Here's an example of a recipe and a test:
+
+~~~
+```{r, write_parquet}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+
+# Write to Parquet
+write_parquet(my_table, "my_table.parquet")
+```
+
+```{r, test_write_parquet, opts.label = "test"}
+test_that("write_parquet chunk works as expected", {
+  expect_true(file.exists("my_table.parquet"))
+})
+```
+~~~
+
+# Testing R Recipes
+
+All recipes in the cookbook must be tested. The cookbook uses `testthat` to verify the recipes.
+
+Running ``make rtest`` from the cookbook root directory will verify that the code for all of the R recipes runs correctly and provides the expected output.
+
+# Building the Arrow R Cookbook
+
+The Arrow R cookbook has been written using `bookdown`.
+
+Running ``make r`` from the cookbook root directory (the one where the ``Makefile`` exists) will install all necessary dependencies (including the latest nightly build of the Arrow R package) and compile the cookbook to HTML.
+
+You can see the compiled result inside the ``build/r`` directory.
+
+If you add a new recipe to the cookbook, you do not need to commit changes to `build/r` to the repo, as this step is run automatically by our CI when building the latest version of the cookbook on the main branch.
+
+------------------------------------------------------------------------
+
+All participation in the Apache Arrow project is governed by the Apache
+Software Foundation’s [code of
+conduct](https://www.apache.org/foundation/policies/conduct.html).
diff --git a/r/content/_bookdown.yml b/r/content/_bookdown.yml
new file mode 100644
index 0000000..f2eb993
--- /dev/null
+++ b/r/content/_bookdown.yml
@@ -0,0 +1,11 @@
+delete_merged_file: TRUE
+# need this option to run all chunks in 1 session
+new_session: FALSE
+clean: ["_book/*"]
+output_dir: _book
+edit: https://github.com/ursacomputing/arrow-cookbook/edit/master/r/content/%s
+rmd_files: ["index.Rmd", "reading_and_writing_data.Rmd", "creating_arrow_objects.Rmd", "manipulating_data.Rmd"]
+
+# This is the full list
+# rmd_files: ["index.Rmd", "configure_arrow.Rmd", "work_with_data_in_different_formats.Rmd",
+# "work_with_compressed_or_partitioned_data.Rmd", "create_arrow_objects_from_r.Rmd", "specify_data_types_and_schemas.Rmd", "manipulate_data.Rmd", "work_with_arrow_in_both_python_and_r.Rmd"]
diff --git a/r/content/creating_arrow_objects.Rmd b/r/content/creating_arrow_objects.Rmd
new file mode 100644
index 0000000..a1e6c8c
--- /dev/null
+++ b/r/content/creating_arrow_objects.Rmd
@@ -0,0 +1,85 @@
+# Creating Arrow Objects
+
+## Build an Arrow Table from native language types
+
+### Manually create a Table from an R object
+
+You may want to convert an existing data frame in R to an Arrow Table object.
+
+```{r, table_create}
+# Create an example data frame
+my_tibble <- tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99))
+# Convert to Arrow Table
+my_table <- Table$create(my_tibble)
+# View table
+my_table
+```
+```{r, test_table_create, opts.label = "test"}
+test_that("table_create works as expected", {
+  expect_s3_class(my_table, "Table")
+  expect_identical(dplyr::collect(my_table), my_tibble)
+})
+```
+#### View the contents of an Arrow Table
+
+You can view the contents of an Arrow Table using `dplyr::collect()`
+
+```{r, table_collect}
+# View Table
+dplyr::collect(my_table)
+```
+```{r, test_table_collect, opts.label = "test"}
+test_that("table_collect works as expected", {
+  expect_identical(dplyr::collect(my_table), my_tibble)
+})
+```
+
+### Manually create a RecordBatch
+
+You may want to convert an existing data frame in R to an Arrow RecordBatch object.
+
+```{r, record_batch_create}
+# Create an example data frame
+my_tibble <- tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99))
+# Convert to Arrow RecordBatch
+my_record_batch <- record_batch(my_tibble)
+# View RecordBatch
+my_record_batch
+```
+```{r, test_record_batch_create, opts.label = "test"}
+test_that("record_batch_create works as expected", {
+  expect_s3_class(my_record_batch, "RecordBatch")
+  expect_identical(dplyr::collect(my_record_batch), my_tibble)
+})
+```
+#### View the contents of a RecordBatch
+
+You can view the contents of a RecordBatch using `dplyr::collect()`
+
+```{r, rb_collect}
+# View RecordBatch
+dplyr::collect(my_record_batch)
+```
+```{r, test_rb_collect, opts.label = "test"}
+test_that("rb_collect works as expected", {
+  expect_identical(dplyr::collect(my_record_batch), my_tibble)
+})
+```
+
+## Storing Categorical Data in Arrow
+
+An Arrow Dictionary is similar to a factor in R: it stores categorical data efficiently by mapping between integer indices and the distinct values, which reduces the amount of storage required.  If an R data frame contains factor columns, converting it to an Arrow object will automatically encode those columns as dictionaries.
+
+```{r}
+class(iris$Species)
+```
+
+```{r, dictionary}
+iris_rb <- record_batch(iris)
+iris_rb
+```
+```{r, test_dictionary, opts.label = "test"}
+test_that("dictionary works as expected", {
+  expect_s3_class(iris_rb$Species, "DictionaryArray")
+})
+```
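+
+You can also create a dictionary-encoded Array directly from an R factor.  A minimal sketch:
+
+```{r}
+# Creating an Array from a factor produces a dictionary-encoded Array
+species_array <- Array$create(iris$Species)
+species_array
+```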
diff --git a/r/content/index.Rmd b/r/content/index.Rmd
new file mode 100644
index 0000000..db3b5c8
--- /dev/null
+++ b/r/content/index.Rmd
@@ -0,0 +1,21 @@
+---
+title: "Arrow Cookbook"
+params:
+  inline_test_output: FALSE
+---
+
+```{r setup, include = FALSE}
+testrmd::init()
+library(arrow)
+library(testthat)
+library(dplyr)
+# Define the chunk options template used by test chunks (opts.label = "test")
+knitr::opts_template$set(test = list(
+  test = TRUE,
+  eval = params$inline_test_output
+))
+```
+
+# Preface
+
+This cookbook aims to provide a number of recipes showing how to perform common tasks using `arrow`.
diff --git a/r/content/manipulating_data.Rmd b/r/content/manipulating_data.Rmd
new file mode 100644
index 0000000..fa9e440
--- /dev/null
+++ b/r/content/manipulating_data.Rmd
@@ -0,0 +1,75 @@
+# Manipulating Data
+
+## Computing Mean/Min/Max, etc. values of an Array
+
+Many base R generic functions such as `mean()`, `min()`, and `max()` have been mapped to their Arrow equivalents, and so can be called on Arrow Array objects in the same way. They will return Arrow objects themselves.
+
+```{r, array_mean_na}
+my_values <- Array$create(c(1:5, NA))
+mean(my_values, na.rm = TRUE)
+```
+```{r, test_array_mean_na, opts.label = "test"}
+test_that("array_mean_na works as expected", {
+  expect_equal(mean(my_values, na.rm = TRUE), Scalar$create(3))
+})
+```
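+
+The `min()` and `max()` generics work in the same way; a brief sketch, reusing `my_values` from above:
+
+```{r}
+# Both return Arrow Scalar objects rather than R vectors
+min(my_values, na.rm = TRUE)
+max(my_values, na.rm = TRUE)
+```
+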
+If you want to use an R function which does not have an Arrow mapping, you can use `as.vector()` to convert Arrow objects to base R vectors.
+
+```{r, fivenum}
+fivenum(as.vector(my_values))
+```
+```{r, test_fivenum, opts.label = "test"}
+test_that("fivenum works as expected", {
+  expect_equal(fivenum(as.vector(my_values)), 1:5)
+})
+```
+
+## Counting occurrences of elements in an Array
+
+Some functions in the Arrow R package do not have base R equivalents. In other cases, the base R equivalents are not generic functions so they cannot be called directly on Arrow Array objects.
+
+For example, the `value_count()` function in the Arrow R package is loosely equivalent to the base R function `table()`, which is not a generic function. To count the elements in an R vector, you can use `table()`; to count the elements in an Arrow Array, you can use `value_count()`.
+
+```{r, value_counts}
+repeated_vals <- Array$create(c(1, 1, 2, 3, 3, 3, 3, 3))
+value_counts(repeated_vals)
+```
+
+```{r, test_value_counts, opts.label = "test"}
+test_that("value_counts works as expected", {
+  expect_equal(
+    as.vector(value_counts(repeated_vals)),
+    tibble(
+      values = as.numeric(names(table(as.vector(repeated_vals)))),
+      counts = as.vector(table(as.vector(repeated_vals)))
+    )
+  )
+})
+```
+
+## Applying arithmetic operators to Arrays
+
+You can use the various arithmetic operators on Array objects.
+
+```{r, add_array}
+num_array <- Array$create(1:10)
+num_array + 10
+```
+```{r, test_add_array, opts.label = "test"}
+test_that("add_array works as expected", {
+  # need to specify expected array as 1:10 + 10 instead of 11:20 so is double not integer
+  expect_equal(num_array + 10, Array$create(1:10 + 10))
+})
+```
+
+You will get the same result if you pass in the value you're adding as an Arrow object.
+
+```{r, add_array_scalar}
+num_array + Scalar$create(10)
+```
+```{r, test_add_array_scalar, opts.label = "test"}
+test_that("add_array_scalar works as expected", {
+  # need to specify expected array as 1:10 + 10 instead of 11:20 so is double not integer
+  expect_equal(num_array + Scalar$create(10), Array$create(1:10 + 10))
+})
+```
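+
+Other arithmetic operators should behave the same way; a quick sketch, assuming `*` is among the mapped operators:
+
+```{r}
+# Multiplication also returns an Arrow Array
+num_array * 2
+```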
diff --git a/r/content/reading_and_writing_data.Rmd b/r/content/reading_and_writing_data.Rmd
new file mode 100644
index 0000000..66aa6a8
--- /dev/null
+++ b/r/content/reading_and_writing_data.Rmd
@@ -0,0 +1,288 @@
+# Reading and Writing Data
+
+This chapter contains recipes related to reading and writing data using Apache Arrow.  When reading data with Apache Arrow, you can choose to read it in as either:
+
+1. a `tibble`
+2. an Arrow Table
+
+There are a number of circumstances in which you may want to read the data in as an Arrow Table:
+
+* your dataset is large, and loading it all into memory may cause performance issues
+* you want faster performance from your `dplyr` queries (see the sketch below)
+* you want to be able to take advantage of Arrow's compute functions
+
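+For example, here is a minimal sketch of running a `dplyr` query directly against an Arrow Table (built from the `airquality` data frame), so that only the result is pulled back into R:
+
+```{r}
+library(dplyr)
+
+# Filter and select on an Arrow Table, collecting only the result into R
+Table$create(airquality) %>%
+  filter(Temp > 90) %>%
+  select(Month, Day, Temp) %>%
+  collect()
+```
+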
+## Converting from a tibble to an Arrow Table
+
+You can convert an existing `tibble` or `data.frame` into an Arrow Table.
+
+```{r, convert_tibble_to_table}
+air_table <- Table$create(airquality)
+air_table
+```
+```{r, test_convert_tibble_to_table, opts.label = "test"}
+test_that("convert_tibble_to_table chunk works as expected", {
+  expect_s3_class(air_table, "Table")
+})
+```
+
+## Converting data from an Arrow Table to a tibble
+
+You may want to convert an Arrow Table to a tibble to view the data or work with it in your usual analytics pipeline.  You can use either `dplyr::collect()` or `as.data.frame()` to do this.
+
+```{r, collect_table}
+air_tibble <- dplyr::collect(air_table)
+air_tibble
+```
+```{r, test_collect_table, opts.label = "test"}
+test_that("collect_table chunk works as expected", {
+  expect_identical(air_tibble, airquality) 
+})
+```
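+
+`as.data.frame()` gives an equivalent result; a quick sketch:
+
+```{r}
+# as.data.frame() also converts an Arrow Table back into a data frame
+head(as.data.frame(air_table))
+```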
+
+## Reading and Writing Parquet Files
+
+### Writing a Parquet file
+
+You can write Parquet files to disk using `arrow::write_parquet()`.
+```{r, write_parquet}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+# Write to Parquet
+write_parquet(my_table, "my_table.parquet")
+```
+```{r, test_write_parquet, opts.label = "test"}
+test_that("write_parquet chunk works as expected", {
+  expect_true(file.exists("my_table.parquet"))
+})
+```
+ 
+### Reading a Parquet file
+
+Given a Parquet file, it can be read back in by using `arrow::read_parquet()`.
+
+```{r, read_parquet}
+parquet_tbl <- read_parquet("my_table.parquet")
+head(parquet_tbl)
+```
+```{r, test_read_parquet, opts.label = "test"}
+test_that("read_parquet works as expected", {
+  expect_equivalent(dplyr::collect(parquet_tbl), tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+})
+```
+
+As the argument `as_data_frame` was left set to its default value of `TRUE`, the file was read in as a `data.frame` object.
+
+```{r, read_parquet_2}
+class(parquet_tbl)
+```
+```{r, test_read_parquet_2, opts.label = "test"}
+test_that("read_parquet_2 works as expected", {
+  expect_s3_class(parquet_tbl, "data.frame")
+})
+```
+If you set `as_data_frame` to `FALSE`, the file will be read in as an Arrow Table.
+
+```{r, read_parquet_table}
+my_table_arrow_table <- read_parquet("my_table.parquet", as_data_frame = FALSE)
+head(my_table_arrow_table)
+```
+
+```{r, read_parquet_table_class}
+class(my_table_arrow_table)
+```
+```{r, test_read_parquet_table_class, opts.label = "test"}
+test_that("read_parquet_table_class works as expected", {
+  expect_s3_class(my_table_arrow_table, "Table")
+})
+```
+
+### How to read a Parquet file from S3 
+
+You can open a Parquet file saved on S3 by calling `read_parquet()` and passing the relevant URI as the `file` argument.
+
+```{r, read_parquet_s3, eval = FALSE}
+df <- read_parquet(file = "s3://ursa-labs-taxi-data/2019/06/data.parquet")
+```
+For more in-depth instructions, including how to work with S3 buckets which require authentication, you can find a guide to reading and writing to/from S3 buckets here: https://arrow.apache.org/docs/r/articles/fs.html.
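+
+If you prefer to work with the bucket as a filesystem object (for example, to reuse it across several reads), a rough sketch using `s3_bucket()` might look like this:
+
+```{r, read_parquet_s3_bucket, eval = FALSE}
+bucket <- s3_bucket("ursa-labs-taxi-data")
+df <- read_parquet(bucket$path("2019/06/data.parquet"))
+```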
+
+### How to filter columns while reading a Parquet file 
+
+When reading in a Parquet file, you can specify which columns to read in via the `col_select` argument.
+
+```{r, read_parquet_filter}
+# Create table to read back in 
+dist_time <- Table$create(tibble::tibble(distance = c(12.2, 15.7, 14.2), time = c(43, 44, 40)))
+# Write to Parquet
+write_parquet(dist_time, "dist_time.parquet")
+
+# Read in only the "time" column
+time_only <- read_parquet("dist_time.parquet", col_select = "time")
+head(time_only)
+```
+```{r, test_read_parquet_filter, opts.label = "test"}
+test_that("read_parquet_filter works as expected", {
+  expect_identical(time_only, tibble::tibble(time = c(43, 44, 40)))
+})
+```
+
+## Reading and Writing Feather files 
+
+### Write an IPC/Feather V2 file
+
+The Arrow IPC file format is identical to the Feather version 2 format.  If you call `write_arrow()`, you will get a warning telling you to use `write_feather()` instead.
+
+```{r, write_arrow}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+write_arrow(my_table, "my_table.arrow")
+```
+```{r, test_write_arrow, opts.label = "test"}
+test_that("write_arrow chunk works as expected", {
+  expect_true(file.exists("my_table.arrow"))
+  expect_warning(
+    write_arrow(iris, "my_table.arrow"),
+    regexp = "Use 'write_ipc_stream' or 'write_feather' instead."
+  )
+})
+```
+
+Instead, you can use `write_feather()`.
+
+```{r, write_feather}
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+write_feather(my_table, "my_table.arrow")
+```
+```{r, test_write_feather, opts.label = "test"}
+test_that("write_feather chunk works as expected", {
+  expect_true(file.exists("my_table.arrow"))
+})
+```
+### Write a Feather (version 1) file
+
+For legacy support, you can write data in the original Feather format by setting the `version` parameter to `1`.
+
+```{r, write_feather1}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+# Write to Feather format V1
+write_feather(my_table, "my_table.feather", version = 1)
+```
+```{r, test_write_feather1, opts.label = "test"}
+test_that("write_feather1 chunk works as expected", {
+  expect_true(file.exists("my_table.feather"))
+})
+```
+
+### Read a Feather file
+
+You can read Feather files in via `read_feather()`.
+
+```{r, read_feather}
+my_feather_tbl <- read_feather("my_table.arrow")
+```
+```{r, test_read_feather, opts.label = "test"}
+test_that("read_feather chunk works as expected", {
+  expect_identical(dplyr::collect(my_feather_tbl), tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+})
+```
+
+## Reading and Writing Streaming IPC Files
+
+You can write to the IPC stream format using `write_ipc_stream()`.
+
+```{r, write_ipc_stream}
+# Create table
+my_table <- Table$create(tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+# Write to IPC stream format
+write_ipc_stream(my_table, "my_table.arrows")
+```
+```{r, test_write_ipc_stream, opts.label = "test"}
+test_that("write_ipc_stream chunk works as expected", {
+  expect_true(file.exists("my_table.arrows"))
+})
+```
+You can read from IPC stream format using `read_ipc_stream()`.
+
+```{r, read_ipc_stream}
+my_ipc_stream <- arrow::read_ipc_stream("my_table.arrows")
+```
+```{r, test_read_ipc_stream, opts.label = "test"}
+test_that("read_ipc_stream chunk works as expected", {
+  expect_equal(my_ipc_stream, tibble::tibble(group = c("A", "B", "C"), score = c(99, 97, 99)))
+})
+```
+
+## Reading and Writing CSV files 
+
+You can use `write_csv_arrow()` to save an Arrow Table to disk as a CSV.
+
+```{r, write_csv_arrow}
+write_csv_arrow(cars, "cars.csv")
+```
+```{r, test_write_csv_arrow, opts.label = "test"}
+test_that("write_csv_arrow chunk works as expected", {
+  expect_true(file.exists("cars.csv"))
+})
+```
+
+You can use `read_csv_arrow()` to read in a CSV file as an Arrow Table.
+
+```{r, read_csv_arrow}
+my_csv <- read_csv_arrow("cars.csv", as_data_frame = FALSE)
+```
+
+```{r, test_read_csv_arrow, opts.label = "test"}
+test_that("read_csv_arrow chunk works as expected", {
+  expect_equivalent(dplyr::collect(my_csv), cars)
+})
+```
+
+## Reading and Writing Partitioned Data 
+
+### Writing Partitioned Data
+
+You can use `write_dataset()` to save data to disk in partitions based on columns in the data.
+
+```{r, write_dataset}
+write_dataset(airquality, "airquality_partitioned", partitioning = c("Month", "Day"))
+list.files("airquality_partitioned")
+```
+```{r, test_write_dataset, opts.label = "test"}
+test_that("write_dataset chunk works as expected", {
+  # Partition by month
+  expect_identical(list.files("airquality_partitioned"), c("Month=5", "Month=6", "Month=7", "Month=8", "Month=9"))
+  # We have enough files
+  expect_equal(length(list.files("airquality_partitioned", recursive = TRUE)), 153)
+})
+```
+As you can see, this has created folders based on the first partition variable supplied, `Month`.
+
+If you take a look in one of these folders, you will see that the data is then partitioned by the second partition variable, `Day`.
+
+```{r}
+list.files("airquality_partitioned/Month=5")
+```
+
+Each of these folders contains one or more Parquet files holding the relevant partition of the data.
+
+```{r}
+list.files("airquality_partitioned/Month=5/Day=10")
+```
+
+### Reading Partitioned Data
+
+You can use `open_dataset()` to read partitioned data.
+
+```{r, open_dataset}
+# Read data from directory
+air_data <- open_dataset("airquality_partitioned")
+
+# View data
+air_data
+```
+```{r, test_open_dataset, opts.label = "test"}
+test_that("open_dataset chunk works as expected", {
+  expect_equal(nrow(air_data), 153)
+  expect_equal(arrange(collect(air_data), Month, Day), arrange(airquality, Month, Day), ignore_attr = TRUE)
+})
+```
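+
+Once opened, the dataset can be queried with `dplyr` verbs before bringing the result into R; a brief sketch, filtering on one of the partition columns:
+
+```{r}
+# Only read the partitions for May before collecting into R
+air_data %>%
+  filter(Month == 5) %>%
+  collect() %>%
+  head()
+```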
+
+
diff --git a/r/content/unpublished/configure_arrow.Rmd b/r/content/unpublished/configure_arrow.Rmd
new file mode 100644
index 0000000..348e2d6
--- /dev/null
+++ b/r/content/unpublished/configure_arrow.Rmd
@@ -0,0 +1,53 @@
+# Configure Arrow
+
+## Get config information and check which components are available
+
+```{r, arrow_info}
+arrow_info()
+```
+
+
+## Control how many CPUs are being used
+
+```{r, cpu_count}
+cpu_count()
+```
+```{r, set_cpu_count, eval = FALSE}
+set_cpu_count(4)
+```
+
+## Control IO Thread count
+
+```{r, io_thread_count}
+io_thread_count()
+```
+
+```{r, set_io_thread_count, eval = FALSE}
+set_io_thread_count(2)
+```
+
+## Switch from the CRAN version to the development version of arrow
+
+```{r, cran_to_dev, eval = FALSE}
+install_arrow(nightly = TRUE)
+```
+
+
+## Switch from the development version to CRAN version of arrow
+
+```{r, dev_to_cran, eval = FALSE}
+install_arrow()
+```
+
+## Install compression libraries
+
+```{r}
+codec_is_available("lzo")
+```
+
+
+## Install the Arrow R package using the system Arrow installation
+
+```{r, install_system, eval = FALSE}
+install_arrow(use_system = TRUE)
+```
diff --git a/r/content/unpublished/create_arrow_objects_from_r.Rmd b/r/content/unpublished/create_arrow_objects_from_r.Rmd
new file mode 100644
index 0000000..dacd9c3
--- /dev/null
+++ b/r/content/unpublished/create_arrow_objects_from_r.Rmd
@@ -0,0 +1,9 @@
+## Create an Arrow table from an R object
+
+## Arrays
+
+## ChunkedArrays
+
+## Scalars
+
+## RecordBatches
diff --git a/r/content/unpublished/manipulate_data.Rmd b/r/content/unpublished/manipulate_data.Rmd
new file mode 100644
index 0000000..0d1c7fb
--- /dev/null
+++ b/r/content/unpublished/manipulate_data.Rmd
@@ -0,0 +1,34 @@
+# Manipulate Data
+
+
+
+## Manipulate and analyze Arrow data with dplyr verbs 
+## Using simple mathematical and statistical function 
+## Work with character data (stringr functions and Arrow functions)
+## Work with datetime data (lubridate functions)
+
+### Extracting date components
+
+If you want to extract individual components from a date, you can use the following functions that mimic the behaviour of the equivalent `lubridate` functions:
+
+* `year`
+* `isoyear`
+* `quarter`
+* `month`
+* `day`
+* `wday`
+* `yday`
+* `isoweek`
+* `hour`
+* `minute`
+* `second`
+
+```{r, extract_week}
+# A rough sketch (assuming the `isoweek` binding listed above is available):
+# extract the ISO week from a date column while the data is held by Arrow
+dates <- Table$create(tibble::tibble(date = as.Date(c("2021-07-01", "2021-07-28"))))
+dates %>%
+  dplyr::mutate(week = isoweek(date)) %>%
+  dplyr::collect()
+```
+
+## Call an Arrow compute function which doesn't yet have an R binding
+## Access and manipulate Arrow objects through low-level bindings to the C++ library
+
+
+
diff --git a/r/content/unpublished/specify_data_types_and_schemas.Rmd b/r/content/unpublished/specify_data_types_and_schemas.Rmd
new file mode 100644
index 0000000..82ca577
--- /dev/null
+++ b/r/content/unpublished/specify_data_types_and_schemas.Rmd
@@ -0,0 +1,10 @@
+# Specify data types and schemas 
+(intro - why this is important, i.e. Exercise fine control over column types for seamless interoperability with databases and data warehouse systems)
+
+## Data types
+
+## Create a schema
+
+## Read a schema
+
+## Combine and harmonize schemas
diff --git a/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd b/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd
new file mode 100644
index 0000000..0366e66
--- /dev/null
+++ b/r/content/unpublished/work_with_arrow_in_both_python_and_r.Rmd
@@ -0,0 +1,7 @@
+# Work with Arrow in both Python and R
+
+## Install pyarrow (released version)
+
+## Install pyarrow (development version)
+
+## Share data between R and Python  (reticulate)
diff --git a/r/content/unpublished/work_with_compressed_or_partitioned_data.Rmd b/r/content/unpublished/work_with_compressed_or_partitioned_data.Rmd
new file mode 100644
index 0000000..b94c9bb
--- /dev/null
+++ b/r/content/unpublished/work_with_compressed_or_partitioned_data.Rmd
@@ -0,0 +1,5 @@
+# Work with Compressed or Partitioned Data
+
+## Read and write compressed data
+
+## Read and write partitioned data
diff --git a/r/content/unpublished/work_with_data_in_different_formats.Rmd b/r/content/unpublished/work_with_data_in_different_formats.Rmd
new file mode 100644
index 0000000..365da43
--- /dev/null
+++ b/r/content/unpublished/work_with_data_in_different_formats.Rmd
@@ -0,0 +1,29 @@
+# Work with data in different formats
+
+
+## Read and write Feather or Arrow IPC files
+
+## Read and write streaming IPC files
+
+
+
+## Read and write Parquet files
+## Read and write CSV (and other delimited files) and JSON files
+## Read and write multi-file, larger-than-memory datasets
+## Read and write memory-mapped files
+
+```{r}
+mmap_create("mmap.arrow", 100)
+```
+```{r}
+mmap_open("mmap.arrow", mode = "write")
+```
+
+## Send and receive data over a network using an Arrow Flight RPC server
+
+```{r, include = FALSE}
+# cleanup
+unlink("mtcars.parquet")
+unlink("mtcars.feather")
+```
+
diff --git a/r/scripts/install_dependencies.R b/r/scripts/install_dependencies.R
new file mode 100644
index 0000000..a6224cf
--- /dev/null
+++ b/r/scripts/install_dependencies.R
@@ -0,0 +1,34 @@
+args <- commandArgs(trailingOnly = TRUE)
+
+# get arguments used to run this script
+if (length(args) == 0) {
+  build_version <- "latest"
+} else {
+  build_version <- package_version(args[1])
+}
+
+# get installed version of a package
+get_installed_version <- function(pkg){
+  tryCatch(
+    packageVersion(pkg),
+    error = function(e) {
+      return(structure(list(c(0L, 0L, 0L)), class = c("package_version", "numeric_version")))
+    }
+  )
+}
+
+# install dependencies if not installed
+if (!require("pacman")) install.packages("pacman")
+pacman::p_load("testthat", "bookdown", "xfun", "knitr", "purrr", "remotes", "dplyr")
+pacman::p_load_gh("rmflight/testrmd")
+
+# check version of Arrow installed, and install correct one
+if (!inherits(build_version, "package_version") && build_version == "latest") {
+  install.packages("arrow", repos = c("https://arrow-r-nightly.s3.amazonaws.com", getOption("repos")))
+} else {
+  installed_version <- get_installed_version("arrow")
+  if (installed_version != build_version) {
+    pkg_url <- paste0("https://cran.r-project.org/src/contrib/Archive/arrow/arrow_", build_version, ".tar.gz")
+    install.packages(pkg_url, repos = NULL, type = "source")
+  }
+}
diff --git a/r/scripts/test.R b/r/scripts/test.R
new file mode 100644
index 0000000..a0f4725
--- /dev/null
+++ b/r/scripts/test.R
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#' Extract R code from .Rmd files
+#'
+#' Extracts all the R code and tests from .Rmd files and puts them in separate files
+#' within directory `dir`.  In order to preserve line numbering in testthat tests,
+#' all markdown chunks are also extracted as comments.
+#'
+#' @param file Path to .Rmd file
+#' @param dir Directory in which to put extracted R files
+extract_r_code <- function(file, dir){
+  bn <- basename(file)
+  # strip the .Rmd extension to get the base file name
+  fn <- sub("\\.Rmd$", "", bn)
+  # prefix the index file with "setup" so it is run first, as it may contain dependencies
+  if (startsWith(bn, "index")) {
+    prefix <- "setup"
+  } else {
+    prefix <- "test"
+  }
+
+  outpath <- file.path(dir, paste0(prefix, "-", fn, ".R"))
+  knitr::purl(
+    input = file,
+    output = outpath,
+    # If we output text chunks as comments, the line number where the error is
+    # reported in the tests should match with the correct line in the Rmd
+    documentation = 2L,
+    quiet = TRUE
+  )
+
+}
+
+# get all files
+files <- list.files("./content", full.names = TRUE, pattern = "*.Rmd")
+
+# set up a temporary directory to work with
+td <- tempfile()
+on.exit(unlink(td))
+dir.create(td)
+
+# Extract R code from files
+purrr::walk(files, extract_r_code, dir = td)
+
+# Run tests
+testthat::test_dir(path = td)