Posted to commits@arrow.apache.org by we...@apache.org on 2018/08/05 20:10:04 UTC
[arrow] branch master updated: ARROW-2869: [Python] Add documentation for Array.to_numpy
This is an automated email from the ASF dual-hosted git repository.
wesm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new f8ba33d ARROW-2869: [Python] Add documentation for Array.to_numpy
f8ba33d is described below
commit f8ba33d6711b2d995d7438ede0cd384c6bcb9494
Author: Antoine Pitrou <an...@python.org>
AuthorDate: Sun Aug 5 16:09:58 2018 -0400
ARROW-2869: [Python] Add documentation for Array.to_numpy
Author: Antoine Pitrou <an...@python.org>
Closes #2351 from pitrou/ARROW-2869-document-numpy and squashes the following commits:
2792dc84 <Antoine Pitrou> Fix renamed reference
8cb89989 <Antoine Pitrou> Revert "Capitalize Pandas"
34d8c36e <Antoine Pitrou> Capitalize Pandas
395231e0 <Antoine Pitrou> Address review comments
347ca4e7 <Antoine Pitrou> ARROW-2869: Add documentation for Array.to_numpy
---
python/doc/Makefile | 2 +-
python/doc/source/api.rst | 4 +--
python/doc/source/data.rst | 4 +--
python/doc/source/extending.rst | 2 +-
python/doc/source/index.rst | 5 +--
python/doc/source/numpy.rst | 75 +++++++++++++++++++++++++++++++++++++++++
python/doc/source/pandas.rst | 16 ++++++---
python/doc/source/plasma.rst | 2 +-
python/pyarrow/array.pxi | 17 ++++++----
9 files changed, 106 insertions(+), 21 deletions(-)
diff --git a/python/doc/Makefile b/python/doc/Makefile
index eacb124..5798f27 100644
--- a/python/doc/Makefile
+++ b/python/doc/Makefile
@@ -20,7 +20,7 @@
#
# You can set these variables from the command line.
-SPHINXOPTS = -j4
+SPHINXOPTS = -j8 -W
SPHINXBUILD = sphinx-build
PAPER =
BUILDDIR = _build
diff --git a/python/doc/source/api.rst b/python/doc/source/api.rst
index cb99933..23eae92 100644
--- a/python/doc/source/api.rst
+++ b/python/doc/source/api.rst
@@ -139,7 +139,7 @@ Scalar Value Types
.. _api.array:
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
Array Types
-----------
@@ -299,7 +299,7 @@ Memory Pools
.. _api.type_classes:
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
Type Classes
------------
diff --git a/python/doc/source/data.rst b/python/doc/source/data.rst
index 3f4169c..f54cba1 100644
--- a/python/doc/source/data.rst
+++ b/python/doc/source/data.rst
@@ -401,8 +401,8 @@ for one or more arrays of the same type.
c.data.num_chunks
c.data.chunk(0)
-As you'll see in the :ref:`pandas section <pandas>`, we can convert these
-objects to contiguous NumPy arrays for use in pandas:
+As you'll see in the :ref:`pandas section <pandas_interop>`, we can convert
+these objects to contiguous NumPy arrays for use in pandas:
.. ipython:: python
diff --git a/python/doc/source/extending.rst b/python/doc/source/extending.rst
index a471fb3..e3d8707 100644
--- a/python/doc/source/extending.rst
+++ b/python/doc/source/extending.rst
@@ -15,7 +15,7 @@
.. specific language governing permissions and limitations
.. under the License.
-.. currentmodule:: pyarrow.lib
+.. currentmodule:: pyarrow
.. _extending:
Using pyarrow from C++ and Cython Code
diff --git a/python/doc/source/index.rst b/python/doc/source/index.rst
index c35f20b..8af795d 100644
--- a/python/doc/source/index.rst
+++ b/python/doc/source/index.rst
@@ -15,8 +15,8 @@
.. specific language governing permissions and limitations
.. under the License.
-Apache Arrow (Python)
-=====================
+Python bindings for Apache Arrow
+================================
Apache Arrow is a cross-language development platform for in-memory data. It
specifies a standardized language-independent columnar memory format for flat
@@ -45,6 +45,7 @@ structures.
ipc
filesystems
plasma
+ numpy
pandas
parquet
extending
diff --git a/python/doc/source/numpy.rst b/python/doc/source/numpy.rst
new file mode 100644
index 0000000..303e182
--- /dev/null
+++ b/python/doc/source/numpy.rst
@@ -0,0 +1,75 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+.. _numpy_interop:
+
+Using PyArrow with NumPy
+========================
+
+PyArrow allows converting back and forth between
+`NumPy <https://www.numpy.org/>`_ arrays and Arrow :ref:`Arrays <data.array>`.
+
+NumPy to Arrow
+--------------
+
+To convert a NumPy array to Arrow, one can simply call the :func:`pyarrow.array`
+factory function.
+
+.. code-block:: pycon
+
+ >>> import numpy as np
+ >>> import pyarrow as pa
+ >>> data = np.arange(10, dtype='int16')
+ >>> arr = pa.array(data)
+ >>> arr
+ <pyarrow.lib.Int16Array object at 0x7fb1d1e6ae58>
+ [
+ 0,
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9
+ ]
+
+Converting from NumPy supports a wide range of input dtypes, including
+structured dtypes or strings.
+
+Arrow to NumPy
+--------------
+
+In the reverse direction, it is possible to produce a view of an Arrow Array
+for use with NumPy using the :meth:`~pyarrow.Array.to_numpy` method.
+This is limited to primitive types for which NumPy has the same physical
+representation as Arrow, and to Arrow data that contains no nulls.
+
+.. code-block:: pycon
+
+ >>> import numpy as np
+ >>> import pyarrow as pa
+ >>> arr = pa.array([4, 5, 6], type=pa.int32())
+ >>> view = arr.to_numpy()
+ >>> view
+ array([4, 5, 6], dtype=int32)
+
+For more complex data types, you have to use the :meth:`~pyarrow.Array.to_pandas`
+method (which will construct a NumPy array with pandas semantics for, e.g.,
+representation of null values).
diff --git a/python/doc/source/pandas.rst b/python/doc/source/pandas.rst
index 7699b13..be11b5b 100644
--- a/python/doc/source/pandas.rst
+++ b/python/doc/source/pandas.rst
@@ -15,24 +15,30 @@
.. specific language governing permissions and limitations
.. under the License.
-.. _pandas:
+.. _pandas_interop:
Using PyArrow with pandas
=========================
-To interface with pandas, PyArrow provides various conversion routines to
-consume pandas structures and convert back to them.
+To interface with `pandas <https://pandas.pydata.org/>`_, PyArrow provides
+various conversion routines to consume pandas structures and convert back
+to them.
+
+.. note::
+ While pandas uses NumPy as a backend, it has enough peculiarities
+ (such as a different type system, and support for null values) that this
+ is a separate topic from :ref:`numpy_interop`.
DataFrames
----------
-The equivalent to a pandas DataFrame in Arrow is a :class:`pyarrow.table.Table`.
+The equivalent to a pandas DataFrame in Arrow is a :ref:`Table <data.table>`.
Both consist of a set of named columns of equal length. While pandas only
supports flat columns, the Table also provides nested columns, thus it can
represent more data than a DataFrame, so a full conversion is not always possible.
Conversion from a Table to a DataFrame is done by calling
-:meth:`pyarrow.table.Table.to_pandas`. The inverse is then achieved by using
+:meth:`pyarrow.Table.to_pandas`. The inverse is then achieved by using
:meth:`pyarrow.Table.from_pandas`.
.. code-block:: python
diff --git a/python/doc/source/plasma.rst b/python/doc/source/plasma.rst
index b64b4c2..6adc470 100644
--- a/python/doc/source/plasma.rst
+++ b/python/doc/source/plasma.rst
@@ -291,7 +291,7 @@ process of storing an object in the Plasma store, however one cannot directly
write the ``DataFrame`` to Plasma with Pandas alone. Plasma also needs to know
the size of the ``DataFrame`` to allocate a buffer for.
-See :ref:`pandas` for more information on using Arrow with Pandas.
+See :ref:`pandas_interop` for more information on using Arrow with Pandas.
You can create the pyarrow equivalent of a Pandas ``DataFrame`` by using
``pyarrow.from_pandas`` to convert it to a ``RecordBatch``.
diff --git a/python/pyarrow/array.pxi b/python/pyarrow/array.pxi
index 513fa86..5906965 100644
--- a/python/pyarrow/array.pxi
+++ b/python/pyarrow/array.pxi
@@ -620,7 +620,7 @@ cdef class Array:
c_bool zero_copy_only=False,
c_bool integer_object_nulls=False):
"""
- Convert to an array object suitable for use in pandas
+ Convert to a NumPy array object suitable for use in pandas.
Parameters
----------
@@ -659,14 +659,13 @@ cdef class Array:
def to_numpy(self):
"""
- EXPERIMENTAL: Construct a NumPy view of this array. Only supports
- primitive arrays with the same memory layout as NumPy (i.e. integers,
- floating point) without any nulls.
+ Experimental: return a NumPy view of this array. Only primitive
+ arrays with the same memory layout as NumPy (i.e. integers,
+ floating point), without any nulls, are supported.
Returns
-------
- arr : numpy.ndarray
-
+ array : numpy.ndarray
"""
if self.null_count:
raise NotImplementedError('NumPy array view is only supported '
@@ -681,7 +680,11 @@ cdef class Array:
def to_pylist(self):
"""
- Convert to an list of native Python objects.
+ Convert to a list of native Python objects.
+
+ Returns
+ -------
+ lst : list
"""
return [x.as_py() for x in self]