Posted to commits@arrow.apache.org by we...@apache.org on 2018/08/05 20:10:04 UTC

[arrow] branch master updated: ARROW-2869: [Python] Add documentation for Array.to_numpy

This is an automated email from the ASF dual-hosted git repository.

wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new f8ba33d  ARROW-2869: [Python] Add documentation for Array.to_numpy
f8ba33d is described below

commit f8ba33d6711b2d995d7438ede0cd384c6bcb9494
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Sun Aug 5 16:09:58 2018 -0400

    ARROW-2869: [Python] Add documentation for Array.to_numpy
    
    Author: Antoine Pitrou <an...@python.org>
    
    Closes #2351 from pitrou/ARROW-2869-document-numpy and squashes the following commits:
    
    2792dc84 <Antoine Pitrou> Fix renamed reference
    8cb89989 <Antoine Pitrou> Revert "Capitalize Pandas"
    34d8c36e <Antoine Pitrou> Capitalize Pandas
    395231e0 <Antoine Pitrou> Address review comments
    347ca4e7 <Antoine Pitrou> ARROW-2869:  Add documentation for Array.to_numpy
---
 python/doc/Makefile             |  2 +-
 python/doc/source/api.rst       |  4 +--
 python/doc/source/data.rst      |  4 +--
 python/doc/source/extending.rst |  2 +-
 python/doc/source/index.rst     |  5 +--
 python/doc/source/numpy.rst     | 75 +++++++++++++++++++++++++++++++++++++++++
 python/doc/source/pandas.rst    | 16 ++++++---
 python/doc/source/plasma.rst    |  2 +-
 python/pyarrow/array.pxi        | 17 ++++++----
 9 files changed, 106 insertions(+), 21 deletions(-)

diff --git a/python/doc/Makefile b/python/doc/Makefile
index eacb124..5798f27 100644
--- a/python/doc/Makefile
+++ b/python/doc/Makefile
@@ -20,7 +20,7 @@
 #
 
 # You can set these variables from the command line.
-SPHINXOPTS    = -j4
+SPHINXOPTS    = -j8 -W
 SPHINXBUILD   = sphinx-build
 PAPER         =
 BUILDDIR      = _build
diff --git a/python/doc/source/api.rst b/python/doc/source/api.rst
index cb99933..23eae92 100644
--- a/python/doc/source/api.rst
+++ b/python/doc/source/api.rst
@@ -139,7 +139,7 @@ Scalar Value Types
 
 .. _api.array:
 
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
 
 Array Types
 -----------
@@ -299,7 +299,7 @@ Memory Pools
 
 .. _api.type_classes:
 
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
 
 Type Classes
 ------------
diff --git a/python/doc/source/data.rst b/python/doc/source/data.rst
index 3f4169c..f54cba1 100644
--- a/python/doc/source/data.rst
+++ b/python/doc/source/data.rst
@@ -401,8 +401,8 @@ for one or more arrays of the same type.
    c.data.num_chunks
    c.data.chunk(0)
 
-As you'll see in the :ref:`pandas section <pandas>`, we can convert these
-objects to contiguous NumPy arrays for use in pandas:
+As you'll see in the :ref:`pandas section <pandas_interop>`, we can convert
+these objects to contiguous NumPy arrays for use in pandas:
 
 .. ipython:: python
 
diff --git a/python/doc/source/extending.rst b/python/doc/source/extending.rst
index a471fb3..e3d8707 100644
--- a/python/doc/source/extending.rst
+++ b/python/doc/source/extending.rst
@@ -15,7 +15,7 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
 .. _extending:
 
 Using pyarrow from C++ and Cython Code
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index c35f20b..8af795d 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -15,8 +15,8 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-Apache Arrow (Python)
-=====================
+Python bindings for Apache Arrow
+================================
 
 Apache Arrow is a cross-language development platform for in-memory data. It
 specifies a standardized language-independent columnar memory format for flat
@@ -45,6 +45,7 @@ structures.
    ipc
    filesystems
    plasma
+   numpy
    pandas
    parquet
    extending
diff --git a/python/doc/source/numpy.rst b/python/doc/source/numpy.rst
new file mode 100644
index 0000000..303e182
--- /dev/null
+++ b/python/doc/source/numpy.rst
@@ -0,0 +1,75 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _numpy_interop:
+
+Using PyArrow with NumPy
+========================
+
+PyArrow supports conversion in both directions between
+`NumPy <https://www.numpy.org/>`_ arrays and Arrow :ref:`Arrays <data.array>`.
+
+NumPy to Arrow
+--------------
+
+To convert a NumPy array to Arrow, one can simply call the :func:`pyarrow.array`
+factory function.
+
+.. code-block:: pycon
+
+   >>> import numpy as np
+   >>> import pyarrow as pa
+   >>> data = np.arange(10, dtype='int16')
+   >>> arr = pa.array(data)
+   >>> arr
+   <pyarrow.lib.Int16Array object at 0x7fb1d1e6ae58>
+   [
+     0,
+     1,
+     2,
+     3,
+     4,
+     5,
+     6,
+     7,
+     8,
+     9
+   ]
+
+Conversion from NumPy supports a wide range of input dtypes, including
+structured dtypes and strings.
+
+Arrow to NumPy
+--------------
+
+In the reverse direction, it is possible to produce a view of an Arrow Array
+for use with NumPy using the :meth:`~pyarrow.Array.to_numpy` method.
+This is limited to primitive types for which NumPy has the same physical
+representation as Arrow, and requires that the Arrow data contain no nulls.
+
+.. code-block:: pycon
+
+   >>> import numpy as np
+   >>> import pyarrow as pa
+   >>> arr = pa.array([4, 5, 6], type=pa.int32())
+   >>> view = arr.to_numpy()
+   >>> view
+   array([4, 5, 6], dtype=int32)
+
+For more complex data types, you have to use the :meth:`~pyarrow.Array.to_pandas`
+method (which will construct a NumPy array with pandas semantics for, e.g.,
+representation of null values).
diff --git a/python/doc/source/pandas.rst b/python/doc/source/pandas.rst
index 7699b13..be11b5b 100644
--- a/python/doc/source/pandas.rst
+++ b/python/doc/source/pandas.rst
@@ -15,24 +15,30 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
-.. _pandas:
+.. _pandas_interop:
 
 Using PyArrow with pandas
 =========================
 
-To interface with pandas, PyArrow provides various conversion routines to
-consume pandas structures and convert back to them.
+To interface with `pandas <https://pandas.pydata.org/>`_, PyArrow provides
+various conversion routines to consume pandas structures and convert back
+to them.
+
+.. note::
+   While pandas uses NumPy as a backend, it has enough peculiarities
+   (such as a different type system, and support for null values) that this
+   is a separate topic from :ref:`numpy_interop`.
 
 DataFrames
 ----------
 
-The equivalent to a pandas DataFrame in Arrow is a :class:`pyarrow.table.Table`.
+The equivalent to a pandas DataFrame in Arrow is a :ref:`Table <data.table>`.
 Both consist of a set of named columns of equal length. While pandas only
 supports flat columns, the Table also provides nested columns, thus it can
 represent more data than a DataFrame, so a full conversion is not always possible.
 
 Conversion from a Table to a DataFrame is done by calling
-:meth:`pyarrow.table.Table.to_pandas`. The inverse is then achieved by using
+:meth:`pyarrow.Table.to_pandas`. The inverse is then achieved by using
 :meth:`pyarrow.Table.from_pandas`.
 
 .. code-block:: python
diff --git a/python/doc/source/plasma.rst b/python/doc/source/plasma.rst
index b64b4c2..6adc470 100644
--- a/python/doc/source/plasma.rst
+++ b/python/doc/source/plasma.rst
@@ -291,7 +291,7 @@ process of storing an object in the Plasma store, however one cannot directly
 write the ``DataFrame`` to Plasma with Pandas alone. Plasma also needs to know
 the size of the ``DataFrame`` to allocate a buffer for.
 
-See :ref:`pandas` for more information on using Arrow with Pandas.
+See :ref:`pandas_interop` for more information on using Arrow with Pandas.
 
 You can create the pyarrow equivalent of a Pandas ``DataFrame`` by using
 ``pyarrow.from_pandas`` to convert it to a ``RecordBatch``.
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 513fa86..5906965 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -620,7 +620,7 @@ cdef class Array:
                   c_bool zero_copy_only=False,
                   c_bool integer_object_nulls=False):
         """
-        Convert to an array object suitable for use in pandas
+        Convert to a NumPy array object suitable for use in pandas.
 
         Parameters
         ----------
@@ -659,14 +659,13 @@ cdef class Array:
 
     def to_numpy(self):
         """
-        EXPERIMENTAL: Construct a NumPy view of this array. Only supports
-        primitive arrays with the same memory layout as NumPy (i.e. integers,
-        floating point) without any nulls.
+        Experimental: return a NumPy view of this array. Only primitive
+        arrays with the same memory layout as NumPy (i.e. integers,
+        floating point), without any nulls, are supported.
 
         Returns
         -------
-        arr : numpy.ndarray
-
+        array : numpy.ndarray
         """
         if self.null_count:
             raise NotImplementedError('NumPy array view is only supported '
@@ -681,7 +680,11 @@ cdef class Array:
 
     def to_pylist(self):
         """
-        Convert to an list of native Python objects.
+        Convert to a list of native Python objects.
+
+        Returns
+        -------
+        lst : list
         """
         return [x.as_py() for x in self]