Posted to commits@arrow.apache.org by we...@apache.org on 2017/04/24 19:58:24 UTC

arrow git commit: ARROW-862: [Python] Simplify README landing documentation to direct users and developers toward the documentation

Repository: arrow
Updated Branches:
  refs/heads/master 76d56d3aa -> 6239abd1a


ARROW-862: [Python] Simplify README landing documentation to direct users and developers toward the documentation

Also migrates DEVELOPMENT.md to the Sphinx docs

Author: Wes McKinney <we...@twosigma.com>

Closes #584 from wesm/ARROW-862 and squashes the following commits:

50049dd [Wes McKinney] Revise python/README.md. Move DEVELOPMENT.md to Sphinx docs. Other cleaning
2187c1c [Wes McKinney] Migrate DEVELOPMENT.md to sphinx docs


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/6239abd1
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/6239abd1
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/6239abd1

Branch: refs/heads/master
Commit: 6239abd1a61fc254818548a7b6ee3f8a88777a7f
Parents: 76d56d3
Author: Wes McKinney <we...@twosigma.com>
Authored: Mon Apr 24 15:58:19 2017 -0400
Committer: Wes McKinney <we...@twosigma.com>
Committed: Mon Apr 24 15:58:19 2017 -0400

----------------------------------------------------------------------
 python/DEVELOPMENT.md             | 207 -------------------------------
 python/README.md                  |  71 ++---------
 python/doc/source/development.rst | 215 +++++++++++++++++++++++++++++++++
 python/doc/source/index.rst       |   1 +
 python/doc/source/install.rst     | 117 ++----------------
 5 files changed, 236 insertions(+), 375 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/DEVELOPMENT.md
----------------------------------------------------------------------
diff --git a/python/DEVELOPMENT.md b/python/DEVELOPMENT.md
deleted file mode 100644
index 7f08169..0000000
--- a/python/DEVELOPMENT.md
+++ /dev/null
@@ -1,207 +0,0 @@
-<!---
-  Licensed under the Apache License, Version 2.0 (the "License");
-  you may not use this file except in compliance with the License.
-  You may obtain a copy of the License at
-
-   http://www.apache.org/licenses/LICENSE-2.0
-
-  Unless required by applicable law or agreed to in writing, software
-  distributed under the License is distributed on an "AS IS" BASIS,
-  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-  See the License for the specific language governing permissions and
-  limitations under the License. See accompanying LICENSE file.
--->
-
-## Developer guide for conda users
-
-### Linux and macOS
-
-#### System Requirements
-
-On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
-sufficient.
-
-On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
-higher. You can check your version by running
-
-```shell
-$ gcc --version
-```
-
-On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
-
-```shell
-$ sudo apt-get install g++-4.9
-```
-
-Finally, set gcc 4.9 as the active compiler using:
-
-```shell
-export CC=gcc-4.9
-export CXX=g++-4.9
-```
-
-#### Environment Setup and Build
-
-First, let's create a conda environment with all the C++ build and Python
-dependencies from conda-forge:
-
-```shell
-conda create -y -q -n pyarrow-dev \
-      python=3.6 numpy six setuptools cython pandas pytest \
-      cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
-      brotli jemalloc -c conda-forge
-source activate pyarrow-dev
-```
-
-Now, let's clone the Arrow and Parquet git repositories:
-
-```shell
-mkdir repos
-cd repos
-git clone https://github.com/apache/arrow.git
-git clone https://github.com/apache/parquet-cpp.git
-```
-
-You should now see
-
-```shell
-$ ls -l
-total 8
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
-drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
-```
-
-We need to set a number of environment variables to let Arrow's build system
-know about our build toolchain:
-
-```
-export ARROW_BUILD_TYPE=release
-
-export BOOST_ROOT=$CONDA_PREFIX
-export BOOST_LIBRARYDIR=$CONDA_PREFIX/lib
-
-export FLATBUFFERS_HOME=$CONDA_PREFIX
-export RAPIDJSON_HOME=$CONDA_PREFIX
-export THRIFT_HOME=$CONDA_PREFIX
-export ZLIB_HOME=$CONDA_PREFIX
-export SNAPPY_HOME=$CONDA_PREFIX
-export BROTLI_HOME=$CONDA_PREFIX
-export JEMALLOC_HOME=$CONDA_PREFIX
-export ARROW_HOME=$CONDA_PREFIX
-export PARQUET_HOME=$CONDA_PREFIX
-```
-
-Now build and install the Arrow C++ libraries:
-
-```shell
-mkdir arrow/cpp/build
-pushd arrow/cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
-      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-      -DARROW_PYTHON=on \
-      -DARROW_BUILD_TESTS=OFF \
-      ..
-make -j4
-make install
-popd
-```
-
-Now build and install the Apache Parquet libraries in your toolchain:
-
-```shell
-mkdir parquet-cpp/build
-pushd parquet-cpp/build
-
-cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
-      -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
-      -DPARQUET_BUILD_BENCHMARKS=off \
-      -DPARQUET_BUILD_EXECUTABLES=off \
-      -DPARQUET_ZLIB_VENDORED=off \
-      -DPARQUET_BUILD_TESTS=off \
-      ..
-
-make -j4
-make install
-popd
-```
-
-Now, build pyarrow:
-
-```shell
-cd arrow/python
-python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --inplace
-```
-
-You should be able to run the unit tests with:
-
-```shell
-$ py.test pyarrow
-================================ test session starts ================================
-platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
-rootdir: /home/wesm/arrow-clone/python, inifile:
-collected 198 items
-
-pyarrow/tests/test_array.py ...........
-pyarrow/tests/test_convert_builtin.py .....................
-pyarrow/tests/test_convert_pandas.py .............................
-pyarrow/tests/test_feather.py ..........................
-pyarrow/tests/test_hdfs.py sssssssssssssss
-pyarrow/tests/test_io.py ..................
-pyarrow/tests/test_ipc.py ........
-pyarrow/tests/test_jemalloc.py ss
-pyarrow/tests/test_parquet.py ....................
-pyarrow/tests/test_scalars.py ..........
-pyarrow/tests/test_schema.py .........
-pyarrow/tests/test_table.py .............
-pyarrow/tests/test_tensor.py ................
-
-====================== 181 passed, 17 skipped in 0.98 seconds =======================
-```
-
-### Windows
-
-First, make sure you can [build the C++ library][1].
-
-Now, we need to build and install the C++ libraries someplace.
-
-```shell
-mkdir cpp\build
-cd cpp\build
-set ARROW_HOME=C:\thirdparty
-cmake -G "Visual Studio 14 2015 Win64" ^
-      -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
-      -DCMAKE_BUILD_TYPE=Release ^
-      -DARROW_BUILD_TESTS=off ^
-      -DARROW_PYTHON=on ..
-cmake --build . --target INSTALL --config Release
-cd ..\..
-```
-
-After that, we must put the install directory's bin path in our `%PATH%`:
-
-```shell
-set PATH=%ARROW_HOME%\bin;%PATH%
-```
-
-Now, we can build pyarrow:
-
-```shell
-cd python
-python setup.py build_ext --inplace
-```
-
-#### Running C++ unit tests with Python
-
-Getting `python-test.exe` to run is a bit tricky because your `%PYTHONPATH%`
-must be configured given the active conda environment:
-
-```shell
-set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
-set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
-```
-
-Now `python-test.exe` or simply `ctest` (to run all tests) should work.
-
-[1]: https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/README.md
----------------------------------------------------------------------
diff --git a/python/README.md b/python/README.md
index ed008ea..816fbf0 100644
--- a/python/README.md
+++ b/python/README.md
@@ -18,78 +18,31 @@ This library provides a Pythonic API wrapper for the reference Arrow C++
 implementation, along with tools for interoperability with pandas, NumPy, and
 other traditional Python scientific computing packages.
 
-### Development details
-
-This project is layered in two pieces:
-
-* arrow_python, a library part of the main Arrow C++ project for Python,
-  pandas, and NumPy interoperability
-* Cython extensions and pure Python code under pyarrow/ which expose Arrow C++
-  and pyarrow to pure Python users
+## Installing
 
-#### PyArrow Dependencies:
-
-To build pyarrow, first build and install Arrow C++ with the Python component
-enabled using `-DARROW_PYTHON=on`, see
-(https://github.com/apache/arrow/blob/master/cpp/README.md) . These components
-must be installed either in the default system location (e.g. `/usr/local`) or
-in a custom `$ARROW_HOME` location.
+Across platforms, you can install a recent version of pyarrow with the conda
+package manager:
 
 ```shell
-mkdir cpp/build
-pushd cpp/build
-cmake -DARROW_PYTHON=on -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-make -j4
-make install
-```
-
-If you build with a custom `CMAKE_INSTALL_PREFIX`, during development, you must
-set `ARROW_HOME` as an environment variable and add it to your
-`LD_LIBRARY_PATH` on Linux and OS X:
-
-```bash
-export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_HOME/lib
-```
-
-5. **Python dependencies: numpy, pandas, cython, pytest**
-
-#### Build pyarrow and run the unit tests
-
-```bash
-python setup.py build_ext --inplace
-py.test pyarrow
-```
-
-To change the build type, use the `--build-type` option or set
-`$PYARROW_BUILD_TYPE`:
-
-```bash
-python setup.py build_ext --build-type=release --inplace
+conda install pyarrow -c conda-forge
 ```
 
-To pass through other build options to CMake, set the environment variable
-`$PYARROW_CMAKE_OPTIONS`.
-
-#### Build the pyarrow Parquet file extension
+On Linux, you can also install binary wheels from PyPI with pip:
 
-To build the integration with [parquet-cpp][1], pass `--with-parquet` to
-the `build_ext` option in setup.py:
-
-```
-python setup.py build_ext --with-parquet install
+```shell
+pip install pyarrow
 ```
 
-Alternately, add `-DPYARROW_BUILD_PARQUET=on` to the general CMake options.
+### Development details
 
-```
-export PYARROW_CMAKE_OPTIONS=-DPYARROW_BUILD_PARQUET=on
-```
+See the [Development][2] page in the documentation.
 
-#### Build the documentation
+### Building the documentation
 
 ```bash
 pip install -r doc/requirements.txt
 python setup.py build_sphinx -s doc/source
 ```
 
-[1]: https://github.com/apache/parquet-cpp
\ No newline at end of file
+[1]: https://github.com/apache/parquet-cpp
+[2]: https://github.com/apache/arrow/blob/master/python/doc/source/development.rst
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/development.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/development.rst b/python/doc/source/development.rst
new file mode 100644
index 0000000..01add11
--- /dev/null
+++ b/python/doc/source/development.rst
@@ -0,0 +1,215 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. currentmodule:: pyarrow
+.. _development:
+
+***********
+Development
+***********
+
+Developing with conda
+=====================
+
+Linux and macOS
+---------------
+
+System Requirements
+~~~~~~~~~~~~~~~~~~~
+
+On macOS, any modern XCode (6.4 or higher; the current version is 8.3.1) is
+sufficient.
+
+On Linux, for this guide, we recommend using gcc 4.8 or 4.9, or clang 3.7 or
+higher. You can check your version by running
+
+.. code-block:: shell
+
+   $ gcc --version
+
+On Ubuntu 16.04 and higher, you can obtain gcc 4.9 with:
+
+.. code-block:: shell
+
+   $ sudo apt-get install g++-4.9
+
+Finally, set gcc 4.9 as the active compiler using:
+
+.. code-block:: shell
+
+   export CC=gcc-4.9
+   export CXX=g++-4.9
+
+Environment Setup and Build
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+First, let's create a conda environment with all the C++ build and Python
+dependencies from conda-forge:
+
+.. code-block:: shell
+
+   conda create -y -q -n pyarrow-dev \
+         python=3.6 numpy six setuptools cython pandas pytest \
+         cmake flatbuffers rapidjson boost-cpp thrift-cpp snappy zlib \
+         brotli jemalloc -c conda-forge
+   source activate pyarrow-dev
+
+Now, let's clone the Arrow and Parquet git repositories:
+
+.. code-block:: shell
+
+   mkdir repos
+   cd repos
+   git clone https://github.com/apache/arrow.git
+   git clone https://github.com/apache/parquet-cpp.git
+
+You should now see
+
+
+.. code-block:: shell
+
+   $ ls -l
+   total 8
+   drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 arrow/
+   drwxrwxr-x 12 wesm wesm 4096 Apr 15 19:19 parquet-cpp/
+
+We need to set some environment variables to let Arrow's build system know
+about our build toolchain:
+
+.. code-block:: shell
+
+   export ARROW_BUILD_TYPE=release
+   export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
+   export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX
+
+Now build and install the Arrow C++ libraries:
+
+.. code-block:: shell
+
+   mkdir arrow/cpp/build
+   pushd arrow/cpp/build
+
+   cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+         -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+         -DARROW_PYTHON=on \
+         -DARROW_BUILD_TESTS=OFF \
+         ..
+   make -j4
+   make install
+   popd
+
+Now, optionally build and install the Apache Parquet libraries in your
+toolchain:
+
+.. code-block:: shell
+
+   mkdir parquet-cpp/build
+   pushd parquet-cpp/build
+
+   cmake -DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE \
+         -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX \
+         -DPARQUET_BUILD_BENCHMARKS=off \
+         -DPARQUET_BUILD_EXECUTABLES=off \
+         -DPARQUET_ZLIB_VENDORED=off \
+         -DPARQUET_BUILD_TESTS=off \
+         ..
+
+   make -j4
+   make install
+   popd
+
+Now, build pyarrow:
+
+.. code-block:: shell
+
+   cd arrow/python
+   python setup.py build_ext --build-type=$ARROW_BUILD_TYPE \
+          --with-parquet --with-jemalloc --inplace
+
+If you did not build parquet-cpp, you can omit ``--with-parquet``.
+
+You should be able to run the unit tests with:
+
+.. code-block:: shell
+
+   $ py.test pyarrow
+   ================================ test session starts ====================
+   platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
+   rootdir: /home/wesm/arrow-clone/python, inifile:
+   collected 198 items
+
+   pyarrow/tests/test_array.py ...........
+   pyarrow/tests/test_convert_builtin.py .....................
+   pyarrow/tests/test_convert_pandas.py .............................
+   pyarrow/tests/test_feather.py ..........................
+   pyarrow/tests/test_hdfs.py sssssssssssssss
+   pyarrow/tests/test_io.py ..................
+   pyarrow/tests/test_ipc.py ........
+   pyarrow/tests/test_jemalloc.py ss
+   pyarrow/tests/test_parquet.py ....................
+   pyarrow/tests/test_scalars.py ..........
+   pyarrow/tests/test_schema.py .........
+   pyarrow/tests/test_table.py .............
+   pyarrow/tests/test_tensor.py ................
+
+   ====================== 181 passed, 17 skipped in 0.98 seconds ===========
+
+Windows
+=======
+
+First, make sure you can `build the C++ library <https://github.com/apache/arrow/blob/master/cpp/doc/Windows.md>`_.
+
+Now, we need to build and install the C++ libraries someplace.
+
+.. code-block:: shell
+
+   mkdir cpp\build
+   cd cpp\build
+   set ARROW_HOME=C:\thirdparty
+   cmake -G "Visual Studio 14 2015 Win64" ^
+         -DCMAKE_INSTALL_PREFIX=%ARROW_HOME% ^
+         -DCMAKE_BUILD_TYPE=Release ^
+         -DARROW_BUILD_TESTS=off ^
+         -DARROW_PYTHON=on ..
+   cmake --build . --target INSTALL --config Release
+   cd ..\..
+
+After that, we must put the install directory's bin path in our ``%PATH%``:
+
+.. code-block:: shell
+
+   set PATH=%ARROW_HOME%\bin;%PATH%
+
+Now, we can build pyarrow:
+
+.. code-block:: shell
+
+   cd python
+   python setup.py build_ext --inplace
+
+Running C++ unit tests with Python
+----------------------------------
+
+Getting ``python-test.exe`` to run is a bit tricky because your
+``%PYTHONPATH%`` must be configured given the active conda environment:
+
+.. code-block:: shell
+
+   set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
+   set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
+
+Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/index.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index ecb8e8f..55b4efc 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -35,6 +35,7 @@ structures.
    :caption: Getting Started
 
    install
+   development
    pandas
    filesystems
    parquet

http://git-wip-us.apache.org/repos/asf/arrow/blob/6239abd1/python/doc/source/install.rst
----------------------------------------------------------------------
diff --git a/python/doc/source/install.rst b/python/doc/source/install.rst
index 278b466..a2a6520 100644
--- a/python/doc/source/install.rst
+++ b/python/doc/source/install.rst
@@ -37,115 +37,14 @@ Install the latest version from PyPI:
     pip install pyarrow
 
 .. note::
-    Currently there are only binary artifcats available for Linux and MacOS.
-    Otherwise this will only pull the python sources and assumes an existing
-    installation of the C++ part of Arrow.
-    To retrieve the binary artifacts, you'll need a recent ``pip`` version that
-    supports features like the ``manylinux1`` tag.
-
-Building from source
---------------------
-
-First, clone the master git repository:
-
-.. code-block:: bash
-
-    git clone https://github.com/apache/arrow.git arrow
-
-System requirements
-~~~~~~~~~~~~~~~~~~~
-
-Building pyarrow requires:
-
-* A C++11 compiler
-
-  * Linux: gcc >= 4.8 or clang >= 3.5
-  * OS X: XCode 6.4 or higher preferred
-
-* `CMake <https://cmake.org/>`_
-
-Python requirements
-~~~~~~~~~~~~~~~~~~~
-
-You will need Python (CPython) 2.7, 3.4, or 3.5 installed. Earlier releases and
-are not being targeted.
-
-.. note::
-    This library targets CPython only due to an emphasis on interoperability with
-    pandas and NumPy, which are only available for CPython.
-
-The build requires NumPy, Cython, and a few other Python dependencies:
-
-.. code-block:: bash
-
-    pip install cython
-    cd arrow/python
-    pip install -r requirements.txt
-
-Installing Arrow C++ library
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-First, you should choose an installation location for Arrow C++. In the future
-using the default system install location will work, but for now we are being
-explicit:
-
-.. code-block:: bash
-
-    export ARROW_HOME=$HOME/local
-
-Now, we build Arrow:
-
-.. code-block:: bash
-
-    cd arrow/cpp
-
-    mkdir dev-build
-    cd dev-build
-
-    cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME ..
-
-    make
-
-    # Use sudo here if $ARROW_HOME requires it
-    make install
-
-To get the optional Parquet support, you should also build and install
-`parquet-cpp <https://github.com/apache/parquet-cpp/blob/master/README.md>`_.
 
-Install `pyarrow`
-~~~~~~~~~~~~~~~~~
-
-
-.. code-block:: bash
-
-    cd arrow/python
-
-    # --with-parquet enables the Apache Parquet support in PyArrow
-    # --with-jemalloc enables the jemalloc allocator support in PyArrow
-    # --build-type=release disables debugging information and turns on
-    #       compiler optimizations for native code
-    python setup.py build_ext --with-parquet --with-jemalloc --build-type=release install
-    python setup.py install
-
-.. warning::
-    On XCode 6 and prior there are some known OS X `@rpath` issues. If you are
-    unable to import pyarrow, upgrading XCode may be the solution.
-
-.. note::
-    In development installations, you will also need to set a correct
-    ``LD_LIBRARY_PATH``. This is most probably done with
-    ``export LD_LIBRARY_PATH=$ARROW_HOME/lib:$LD_LIBRARY_PATH``.
-
-
-.. code-block:: python
+    Currently there are only binary artifacts available for Linux and MacOS.
+    Otherwise this will only pull the python sources and assumes an existing
+    installation of the C++ part of Arrow.  To retrieve the binary artifacts,
+    you'll need a recent ``pip`` version that supports features like the
+    ``manylinux1`` tag.
 
-    In [1]: import pyarrow
+Installing from source
+----------------------
 
-    In [2]: pyarrow.array([1,2,3])
-    Out[2]:
-    <pyarrow.array.Int64Array object at 0x7f899f3e60e8>
-    [
-      1,
-      2,
-      3
-    ]
+See :ref:`development`.