Posted to commits@arrow.apache.org by we...@apache.org on 2017/04/24 19:58:24 UTC
arrow git commit: ARROW-862: [Python] Simplify README landing
documentation to direct users and developers toward the documentation
Repository: arrow
Updated Branches:
refs/heads/master 76d56d3aa -> 6239abd1a
ARROW-862: [Python] Simplify README landing documentation to direct users and developers toward the documentation
Also migrates DEVELOPMENT.md to the Sphinx docs
Author: Wes McKinney <we...@twosigma.com>
Closes #584 from wesm/ARROW-862 and squashes the following commits:
50049dd [Wes McKinney] Revise python/README.md. Move DEVELOPMENT.md to Sphinx docs. Other cleaning
2187c1c [Wes McKinney] Migrate DEVELOPMENT.md to sphinx docs
Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/6239abd1
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/6239abd1
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/6239abd1
Branch: refs/heads/master
Commit: 6239abd1a61fc254818548a7b6ee3f8a88777a7f
Parents: 76d56d3
Author: Wes McKinney <we...@twosigma.com>
Authored: Mon Apr 24 15:58:19 2017 -0400
Committer: Wes McKinney <we...@twosigma.com>
Committed: Mon Apr 24 15:58:19 2017 -0400
----------------------------------------------------------------------
python/DEVELOPMENT.md | 207 -------------------------------
python/README.md | 71 ++---------
python/doc/source/development.rst | 215 +++++++++++++++++++++++++++++++++
python/doc/source/index.rst | 1 +
python/doc/source/install.rst | 117 ++----------------
5 files changed, 236 insertions(+), 375 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/DEVELOPMENT.md
----------------------------------------------------------------------
diff --git a/python/DEVELOPMENT.md b/python/DEVELOPMENT.md
deleted file mode 100644
index 7f08169..0000000
--- a/python/DEVELOPMENT.md
+++ /dev/null
@@ -1,207 +0,0 @@
-<!---
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
--->
-
-## Developer guide for conda users
-
-### Linux and macOS
-
-#### System Requirements
-
-On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
-sufficient.
-
-On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
-higher. You can check your version by running
-
-```shell
-$ gcc --version
-```
-
-On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
-
-```shell
-$ sudo apt-get install g++-4.9
-```
-
-Finally, set gcc 4.9 as the active compiler using:
-
-```shell
-export CC=gcc-4.9
-export CXX=g++-4.9
-```
-
-#### Environment Setup and Build
-
-First, let's create a conda environment with all the C++ build and Python
-dependencies from conda-forge:
-
-```shell
-conda create -y -q -n pyarrow-dev \
- python=3.6 numpy six setuptools cython pandas pytest \
- cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
- brotli jemalloc -c conda-forge
-source activate pyarrow-dev
-```
-
-Now, let's clone the Arrow and Parquet git repositories:
-
-```shell
-mkdir repos
-cd repos
-git clone https://github.com/apache/arrow.git
-git clone https://github.com/apache/parquet-cpp.git
-```
-
-You should now see
-
-```shell
-$ ls -l
-total 8
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
-```
-
-We need to set a number of environment variables to let Arrow's build system
-know about our build toolchain:
-
-```
-export ARROW_BUILD_TYPE=release
-
-export BOOST_ROOT=$CONDA_PREFIX
-export BOOST_LIBRARYDIR=$CONDA_PREFIX/lib
-
-export FLATBUFFERS_HOME=$CONDA_PREFIX
-export RAPIDJSON_HOME=$CONDA_PREFIX
-export THRIFT_HOME=$CONDA_PREFIX
-export ZLIB_HOME=$CONDA_PREFIX
-export SNAPPY_HOME=$CONDA_PREFIX
-export BROTLI_HOME=$CONDA_PREFIX
-export JEMALLOC_HOME=$CONDA_PREFIX
-export ARROW_HOME=$CONDA_PREFIX
-export PARQUET_HOME=$CONDA_PREFIX
-```
-
-Now build and install the Arrow C++ libraries:
-
-```shell
-mkdir arrow/cpp/build
-pushd arrow/cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
- -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
- -DARROW_PYTHON=on \
- -DARROW_BUILD_TESTS=OFF \
- ..
-make -j4
-make install
-popd
-```
-
-Now build and install the Apache Parquet libraries in your toolchain:
-
-```shell
-mkdir parquet-cpp/build
-pushd parquet-cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
- -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
- -DPARQUET_BUILD_BENCHMARKS=off \
- -DPARQUET_BUILD_EXECUTABLES=off \
- -DPARQUET_ZLIB_VENDORED=off \
- -DPARQUET_BUILD_TESTS=off \
- ..
-
-make -j4
-make install
-popd
-```
-
-Now, build pyarrow:
-
-```shell
-cd arrow/python
-python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace
-```
-
-You should be able to run the unit tests with:
-
-```shell
-$ py.test pyarrow
-================================ test session starts ================================
-platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
-rootdir: /home/wesm/arrow-clone/python, inifile:
-collected 198 items
-
-pyarrow/tests/test_array.py ...........
-pyarrow/tests/test_convert_builtin.py .....................
-pyarrow/tests/test_convert_pandas.py .............................
-pyarrow/tests/test_feather.py ..........................
-pyarrow/tests/test_hdfs.py sssssssssssssss
-pyarrow/tests/test_io.py ..................
-pyarrow/tests/test_ipc.py ........
-pyarrow/tests/test_jemalloc.py ss
-pyarrow/tests/test_parquet.py ....................
-pyarrow/tests/test_scalars.py ..........
-pyarrow/tests/test_schema.py .........
-pyarrow/tests/test_table.py .............
-pyarrow/tests/test_tensor.py ................
-
-====================== 181 passed, 17 skipped in 0.98 seconds =======================
-```
-
-### Windows
-
-First, make sure you can [build the C++ library][1].
-
-Now, we need to build and install the C++ libraries someplace.
-
-```shell
-mkdir cpp\build
-cd cpp\build
-set ARROW_HOME=C:\thirdparty
-cmake -G "Visual Studio 14 2015 Win64" ^
- -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
- -DCMAKE_BUILD_TYPE=Release ^
- -DARROW_BUILD_TESTS=off ^
- -DARROW_PYTHON=on ..
-cmake --build . --target INSTALL --config Release
-cd ..\..
-```
-
-After that, we must put the install directory's bin path in our `%PATH%`:
-
-```shell
-set PATH=%ARROW_HOME%\bin;%PATH%
-```
-
-Now, we can build pyarrow:
-
-```shell
-cd python
-python setup.py build_ext --inplace
-```
-
-#### Running C++ unit tests with Python
-
-Getting `python-test.exe` to run is a bit tricky because your `%PYTHONPATH%`
-must be configured given the active conda environment:
-
-```shell
-set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
-set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
-```
-
-Now `python-test.exe` or simply `ctest` (to run all tests) should work.
-
-[1]: https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/README.md
----------------------------------------------------------------------
diff --git a/python/README.md b/python/README.md
index ed008ea..816fbf0 100644
--- a/python/README.md
+++ b/python/README.md
@@ -18,78 +18,31 @@ This library provides a Pythonic API wrapper for the reference Arrow C++
implementation, along with tools for interoperability with pandas, NumPy, and
other traditional Python scientific computing packages.
-### Development details
-
-This project is layered in two pieces:
-
-* arrow_python, a library part of the main Arrow C++ project for Python,
- pandas, and NumPy interoperability
-* Cython extensions and pure Python code under pyarrow/ which expose Arrow C++
- and pyarrow to pure Python users
+## Installing
-#### PyArrow Dependencies:
-
-To build pyarrow, first build and install Arrow C++ with the Python component
-enabled using `-DARROW_PYTHON=on`, see
-(https://github.com/apache/arrow/blob/master/cpp/README.md) . These components
-must be installed either in the default system location (e.g. `/usr/local`) or
-in a custom `$ARROW_HOME` location.
+Across platforms, you can install a recent version of pyarrow with the conda
+package manager:
```shell
-mkdir cpp/build
-pushd cpp/build
-cmake -DARROW_PYTHON=on -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-make -j4
-make install
-```
-
-If you build with a custom `CMAKE_INSTALL_PREFIX`, during development, you must
-set `ARROW_HOME` as an environment variable and add it to your
-`LD_LIBRARY_PATH` on Linux and OS X:
-
-```bash
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_HOME/lib
-```
-
-5. **Python dependencies: numpy, pandas, cython, pytest**
-
-#### Build pyarrow and run the unit tests
-
-```bash
-python setup.py build_ext --inplace
-py.test pyarrow
-```
-
-To change the build type, use the `--build-type` option or set
-`$PYARROW_BUILD_TYPE`:
-
-```bash
-python setup.py build_ext --build-type=release --inplace
+conda install pyarrow -c conda-forge
```
-To pass through other build options to CMake, set the environment variable
-`$PYARROW_CMAKE_OPTIONS`.
-
-#### Build the pyarrow Parquet file extension
+On Linux, you can also install binary wheels from PyPI with pip:
-To build the integration with [parquet-cpp][1], pass `--with-parquet` to
-the `build_ext` option in setup.py:
-
-```
-python setup.py build_ext --with-parquet install
+```shell
+pip install pyarrow
```
-Alternately, add `-DPYARROW_BUILD_PARQUET=on` to the general CMake options.
+### Development details
-```
-export PYARROW_CMAKE_OPTIONS=-DPYARROW_BUILD_PARQUET=on
-```
+See the [Development][2] page in the documentation.
-#### Build the documentation
+### Building the documentation
```bash
pip install -r doc/requirements.txt
python setup.py build_sphinx -s doc/source
```
-[1]: https://github.com/apache/parquet-cpp
\ No newline at end of file
+[1]: https://github.com/apache/parquet-cpp
+[2]: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/development.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/development.rst b/python/doc/source/development.rst
new file mode 100644
index 0000000..01add11
--- /dev/null
+++ b/python/doc/source/development.rst
@@ -0,0 +1,215 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+.. _development:
+
+***********
+Development
+***********
+
+Developing with conda
+=====================
+
+Linux and macOS
+---------------
+
+System Requirements
+~~~~~~~~~~~~~~~~~~~
+
+On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
+sufficient.
+
+On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
+higher. You can check your version by running
+
+.. code-block:: shell
+
+ $ gcc --version
+
+On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
+
+.. code-block:: shell
+
+ $ sudo apt-get install g++-4.9
+
+Finally, set gcc 4.9 as the active compiler using:
+
+.. code-block:: shell
+
+ export CC=gcc-4.9
+ export CXX=g++-4.9
+
+Environment Setup and Build
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, let's create a conda environment with all the C++ build and Python
+dependencies from conda-forge:
+
+.. code-block:: shell
+
+ conda create -y -q -n pyarrow-dev \
+ python=3.6 numpy six setuptools cython pandas pytest \
+ cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
+ brotli jemalloc -c conda-forge
+ source activate pyarrow-dev
+
+Now, let's clone the Arrow and Parquet git repositories:
+
+.. code-block:: shell
+
+ mkdir repos
+ cd repos
+ git clone https://github.com/apache/arrow.git
+ git clone https://github.com/apache/parquet-cpp.git
+
+You should now see
+
+
+.. code-block:: shell
+
+ $ ls -l
+ total 8
+ drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
+ drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
+
+We need to set some environment variables to let Arrow's build system know
+about our build toolchain:
+
+.. code-block:: shell
+
+ export ARROW_BUILD_TYPE=release
+ export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+ export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX
+
+Now build and install the Arrow C++ libraries:
+
+.. code-block:: shell
+
+ mkdir arrow/cpp/build
+ pushd arrow/cpp/build
+
+ cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+ -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+ -DARROW_PYTHON=on \
+ -DARROW_BUILD_TESTS=OFF \
+ ..
+ make -j4
+ make install
+ popd
+
+Now, optionally build and install the Apache Parquet libraries in your
+toolchain:
+
+.. code-block:: shell
+
+ mkdir parquet-cpp/build
+ pushd parquet-cpp/build
+
+ cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+ -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+ -DPARQUET_BUILD_BENCHMARKS=off \
+ -DPARQUET_BUILD_EXECUTABLES=off \
+ -DPARQUET_ZLIB_VENDORED=off \
+ -DPARQUET_BUILD_TESTS=off \
+ ..
+
+ make -j4
+ make install
+ popd
+
+Now, build pyarrow:
+
+.. code-block:: shell
+
+ cd arrow/python
+ python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
+ --with-parquet --with-jemalloc --inplace
+
+If you did not build parquet-cpp, you can omit ``--with-parquet``.
+
+You should be able to run the unit tests with:
+
+.. code-block:: shell
+
+ $ py.test pyarrow
+ ================================ test session starts ====================
+ platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
+ rootdir: /home/wesm/arrow-clone/python, inifile:
+ collected 198 items
+
+ pyarrow/tests/test_array.py ...........
+ pyarrow/tests/test_convert_builtin.py .....................
+ pyarrow/tests/test_convert_pandas.py .............................
+ pyarrow/tests/test_feather.py ..........................
+ pyarrow/tests/test_hdfs.py sssssssssssssss
+ pyarrow/tests/test_io.py ..................
+ pyarrow/tests/test_ipc.py ........
+ pyarrow/tests/test_jemalloc.py ss
+ pyarrow/tests/test_parquet.py ....................
+ pyarrow/tests/test_scalars.py ..........
+ pyarrow/tests/test_schema.py .........
+ pyarrow/tests/test_table.py .............
+ pyarrow/tests/test_tensor.py ................
+
+ ====================== 181 passed, 17 skipped in 0.98 seconds ===========
+
+Windows
+=======
+
+First, make sure you can `build the C++ library <https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md>`_.
+
+Now, we need to build and install the C++ libraries someplace.
+
+.. code-block:: shell
+
+ mkdir cpp\build
+ cd cpp\build
+ set ARROW_HOME=C:\thirdparty
+ cmake -G "Visual Studio 14 2015 Win64" ^
+ -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
+ -DCMAKE_BUILD_TYPE=Release ^
+ -DARROW_BUILD_TESTS=off ^
+ -DARROW_PYTHON=on ..
+ cmake --build . --target INSTALL --config Release
+ cd ..\..
+
+After that, we must put the install directory's bin path in our ``%PATH%``:
+
+.. code-block:: shell
+
+ set PATH=%ARROW_HOME%\bin;%PATH%
+
+Now, we can build pyarrow:
+
+.. code-block:: shell
+
+ cd python
+ python setup.py build_ext --inplace
+
+Running C++ unit tests with Python
+----------------------------------
+
+Getting ``python-test.exe`` to run is a bit tricky because your
+``%PYTHONPATH%`` must be configured given the active conda environment:
+
+.. code-block:: shell
+
+ set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
+ set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
+
+Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.
http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/index.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index ecb8e8f..55b4efc 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -35,6 +35,7 @@ structures.
:caption: Getting Started
install
+ development
pandas
filesystems
parquet
http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/install.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/install.rst b/python/doc/source/install.rst
index 278b466..a2a6520 100644
--- a/python/doc/source/install.rst
+++ b/python/doc/source/install.rst
@@ -37,115 +37,14 @@ Install the latest version from PyPI:
pip install pyarrow
.. note::
- Currently there are only binary artifcats available for Linux and MacOS.
- Otherwise this will only pull the python sources and assumes an existing
- installation of the C++ part of Arrow.
- To retrieve the binary artifacts, you'll need a recent ``pip`` version that
- supports features like the ``manylinux1`` tag.
-
-Building from source
---------------------
-
-First, clone the master git repository:
-
-.. code-block:: bash
-
- git clone https://github.com/apache/arrow.git arrow
-
-System requirements
-~~~~~~~~~~~~~~~~~~~
-
-Building pyarrow requires:
-
-* A C++11 compiler
-
- * Linux: gcc >= 4.8 or clang >= 3.5
- * OS X: XCode 6.4 or higher preferred
-
-* `CMake <https://cmake.org/>`_
-
-Python requirements
-~~~~~~~~~~~~~~~~~~~
-
-You will need Python (CPython) 2.7, 3.4, or 3.5 installed. Earlier releases and
-are not being targeted.
-
-.. note::
- This library targets CPython only due to an emphasis on interoperability with
- pandas and NumPy, which are only available for CPython.
-
-The build requires NumPy, Cython, and a few other Python dependencies:
-
-.. code-block:: bash
-
- pip install cython
- cd arrow/python
- pip install -r requirements.txt
-
-Installing Arrow C++ library
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-First, you should choose an installation location for Arrow C++. In the future
-using the default system install location will work, but for now we are being
-explicit:
-
-.. code-block:: bash
-
- export ARROW_HOME=$HOME/local
-
-Now, we build Arrow:
-
-.. code-block:: bash
-
- cd arrow/cpp
-
- mkdir dev-build
- cd dev-build
-
- cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-
- make
-
- # Use sudo here if $ARROW_HOME requires it
- make install
-
-To get the optional Parquet support, you should also build and install
-`parquet-cpp <https://github.com/apache/parquet-cpp/blob/master/README.md>`_.
-Install `pyarrow`
-~~~~~~~~~~~~~~~~~
-
-
-.. code-block:: bash
-
- cd arrow/python
-
- # --with-parquet enables the Apache Parquet support in PyArrow
- # --with-jemalloc enables the jemalloc allocator support in PyArrow
- # --build-type=release disables debugging information and turns on
- # compiler optimizations for native code
- python setup.py build_ext --with-parquet --with-jemalloc --build-type=release install
- python setup.py install
-
-.. warning::
- On XCode 6 and prior there are some known OS X `@rpath` issues. If you are
- unable to import pyarrow, upgrading XCode may be the solution.
-
-.. note::
- In development installations, you will also need to set a correct
- ``LD_LIBRARY_PATH``. This is most probably done with
- ``export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH``.
-
-
-.. code-block:: python
+ Currently there are only binary artifacts available for Linux and MacOS.
+ Otherwise this will only pull the python sources and assumes an existing
+ installation of the C++ part of Arrow. To retrieve the binary artifacts,
+ you'll need a recent ``pip`` version that supports features like the
+ ``manylinux1`` tag.
- In [1]: import pyarrow
+Installing from source
+----------------------
- In [2]: pyarrow.array([1,2,3])
- Out[2]:
- <pyarrow.array.Int64Array object at 0x7f899f3e60e8>
- [
- 1,
- 2,
- 3
- ]
+See :ref:`development`.