Posted to commits@arrow.apache.org by we...@apache.org on 2018/08/14 03:37:56 UTC
[arrow] branch master updated: ARROW-3047: [C++/Python] Better build instructions with ORC
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new 7031a86 ARROW-3047: [C++/Python] Better build instructions with ORC
7031a86 is described below
commit 7031a8682e9e5a791f8d622c4188ae73b27028bb
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Mon Aug 13 23:37:51 2018 -0400
ARROW-3047: [C++/Python] Better build instructions with ORC
Also fix some markup and update a couple points.
Author: Antoine Pitrou <an...@python.org>
Closes #2427 from pitrou/ARROW-3047-orc-build-instructions and squashes the following commits:
c4d77493 <Antoine Pitrou> ARROW-3047: Better build instructions with ORC
---
python/doc/source/development.rst | 77 ++++++++++++++++++++++-----------------
1 file changed, 43 insertions(+), 34 deletions(-)
diff --git a/python/doc/source/development.rst b/python/doc/source/development.rst
index 3e1e72b..0d6ac00 100644
--- a/python/doc/source/development.rst
+++ b/python/doc/source/development.rst
@@ -65,7 +65,6 @@ First, let's clone the Arrow and Parquet git repositories:
You should now see
-
.. code-block:: shell
$ ls -l
@@ -87,7 +86,6 @@ from conda-forge:
gflags brotli jemalloc lz4-c zstd -c conda-forge
source activate pyarrow-dev
-
We need to set some environment variables to let Arrow's build system know
about our build toolchain:
@@ -141,8 +139,8 @@ folder as the repositories and a target installation folder:
# development
mkdir dist
-If your cmake version is too old on Linux, you could get a newer one via ``pip
-install cmake``.
+If your cmake version is too old on Linux, you could get a newer one via
+``pip install cmake``.
We need to set some environment variables to let Arrow's build system know
about our build toolchain:
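[Editor's note: the toolchain variables themselves are elided from this hunk. As a sketch, assuming the conda environment and ``dist`` install folder created in the steps above, they typically look like the following; the exact variable names and values here are assumptions based on common Arrow build setups and may need adjusting for your platform:]

```shell
# Point Arrow's build at the target install folder created earlier
# (assumed layout: a ./dist folder next to the repositories).
export ARROW_BUILD_TYPE=release
export ARROW_HOME=$(pwd)/dist
export PARQUET_HOME=$(pwd)/dist

# Make the freshly installed shared libraries discoverable at runtime.
export LD_LIBRARY_PATH=$(pwd)/dist/lib:$LD_LIBRARY_PATH
```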
@@ -178,9 +176,6 @@ Now build and install the Arrow C++ libraries:
If you don't want to build and install the Plasma in-memory object store,
you can omit the ``-DARROW_PLASMA=on`` flag.
-To add support for the experimental Apache ORC integration, include
-``-DARROW_ORC=on`` in these flags.
-
Now, optionally build and install the Apache Parquet libraries in your
toolchain:
@@ -211,9 +206,6 @@ Now, build pyarrow:
If you did not build parquet-cpp, you can omit ``--with-parquet`` and if
you did not build with plasma, you can omit ``--with-plasma``.
-If you built with the experimental Apache ORC integration, include
-``--with-orc`` in these flags.
-
You should be able to run the unit tests with:
.. code-block:: shell
@@ -222,24 +214,19 @@ You should be able to run the unit tests with:
================================ test session starts ====================
platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/wesm/arrow-clone/python, inifile:
- collected 198 items
- pyarrow/tests/test_array.py ...........
- pyarrow/tests/test_convert_builtin.py .....................
- pyarrow/tests/test_convert_pandas.py .............................
- pyarrow/tests/test_feather.py ..........................
- pyarrow/tests/test_hdfs.py sssssssssssssss
- pyarrow/tests/test_io.py ..................
- pyarrow/tests/test_ipc.py ........
- pyarrow/tests/test_parquet.py ....................
- pyarrow/tests/test_scalars.py ..........
- pyarrow/tests/test_schema.py .........
- pyarrow/tests/test_table.py .............
- pyarrow/tests/test_tensor.py ................
+ collected 1061 items / 1 skipped
+
+ [... test output not shown here ...]
- ====================== 181 passed, 17 skipped in 0.98 seconds ===========
+ ============================== warnings summary ===============================
-To build a self-contained wheel (include Arrow C++ and Parquet C++), one can set `--bundle-arrow-cpp`:
+ [... many warnings not shown here ...]
+
+ ====== 1000 passed, 56 skipped, 6 xfailed, 19 warnings in 26.52 seconds =======
+
+To build a self-contained wheel (including Arrow C++ and Parquet C++), one
+can set ``--bundle-arrow-cpp``:
.. code-block:: shell
@@ -249,6 +236,30 @@ To build a self-contained wheel (include Arrow C++ and Parquet C++), one can set
Again, if you did not build parquet-cpp, you should omit ``--with-parquet`` and
if you did not build with plasma, you should omit ``--with-plasma``.
+Building with optional ORC integration
+--------------------------------------
+
+To build Arrow with support for the `Apache ORC file format <https://orc.apache.org/>`_,
+we recommend the following:
+
+#. Install the ORC C++ libraries and tools using ``conda``:
+
+ .. code-block:: shell
+
+ conda install -c conda-forge orc
+
+#. Set ``ORC_HOME`` and ``PROTOBUF_HOME`` to the location of the installed
+ ORC and protobuf C++ libraries, respectively (otherwise Arrow will try
+ to download source versions of those libraries and recompile them):
+
+ .. code-block:: shell
+
+ export ORC_HOME=$CONDA_PREFIX
+ export PROTOBUF_HOME=$CONDA_PREFIX
+
+#. Add ``-DARROW_ORC=on`` to the CMake flags.
+#. Add ``--with-orc`` to the ``setup.py`` flags.
+
Known issues
------------
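[Editor's note: taken together, the ORC instructions added in the hunk above amount to the following sequence. This is a sketch, not part of the commit: the ``cmake`` and ``setup.py`` invocations are abbreviated placeholders for the full build commands described elsewhere in the document, with only the ORC-related flags shown.]

```shell
# 1. Install the ORC C++ libraries and tools from conda-forge.
conda install -c conda-forge orc

# 2. Point Arrow's build at the installed ORC and protobuf libraries,
#    so it does not download and recompile them from source.
export ORC_HOME=$CONDA_PREFIX
export PROTOBUF_HOME=$CONDA_PREFIX

# 3. Enable ORC in the C++ build (add to your existing CMake flags).
cmake -DARROW_ORC=on ..

# 4. Enable ORC in the Python build (add to your existing setup.py flags).
python setup.py build_ext --with-orc
```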
@@ -338,21 +349,20 @@ Then run the unit tests with:
py.test pyarrow -v
-Running C++ unit tests with Python
-----------------------------------
+Running C++ unit tests for Python integration
+---------------------------------------------
Getting ``python-test.exe`` to run is a bit tricky because your
-``%PYTHONPATH%`` must be configured given the active conda environment:
+``%PYTHONHOME%`` must be configured to point to the active conda environment:
.. code-block:: shell
- set CONDA_ENV=C:\Users\wesm\Miniconda\envs\arrow-test
- set PYTHONPATH=%CONDA_ENV%\Lib;%CONDA_ENV%\Lib\site-packages;%CONDA_ENV%\python35.zip;%CONDA_ENV%\DLLs;%CONDA_ENV%
+ set PYTHONHOME=%CONDA_PREFIX%
Now ``python-test.exe`` or simply ``ctest`` (to run all tests) should work.
-Nightly Builds of `arrow-cpp`, `parquet-cpp`, and `pyarrow` for Linux
----------------------------------------------------------------------
+Nightly Builds of ``arrow-cpp``, ``parquet-cpp``, and ``pyarrow`` for Linux
+---------------------------------------------------------------------------
Nightly builds of Linux conda packages for ``arrow-cpp``, ``parquet-cpp``, and
``pyarrow`` can be automated using an open source tool called `scourge
@@ -370,8 +380,7 @@ To setup your own nightly builds:
#. Run that script as a cronjob once per day
First, clone and install scourge (you also need to `install docker
-<https://docs.docker.com/engine/installation>`):
-
+<https://docs.docker.com/engine/installation>`_):
.. code:: sh