Posted to commits@arrow.apache.org by we...@apache.org on 2017/04/09 17:54:15 UTC
arrow git commit: ARROW-762: [Python] Start docs page about files and
filesystems, adapt C++ docs about HDFS
Repository: arrow
Updated Branches:
refs/heads/master b0e3122b9 -> 739ed8202
ARROW-762: [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS
Author: Wes McKinney <we...@twosigma.com>
Closes #511 from wesm/ARROW-762 and squashes the following commits:
273142e [Wes McKinney] Add initial docs about configuring environment to use pyarrow.HdfsClient
Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/739ed820
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/739ed820
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/739ed820
Branch: refs/heads/master
Commit: 739ed82028e9efae43f00f4e19b39737adb8a348
Parents: b0e3122
Author: Wes McKinney <we...@twosigma.com>
Authored: Sun Apr 9 13:54:10 2017 -0400
Committer: Wes McKinney <we...@twosigma.com>
Committed: Sun Apr 9 13:54:10 2017 -0400
----------------------------------------------------------------------
python/doc/filesystems.rst | 58 +++++++++++++++++++++++++++++++++++++++++
python/doc/index.rst | 12 ++++-----
2 files changed, 64 insertions(+), 6 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/arrow/blob/739ed820/python/doc/filesystems.rst
----------------------------------------------------------------------
diff --git a/python/doc/filesystems.rst b/python/doc/filesystems.rst
new file mode 100644
index 0000000..9e00ddd
--- /dev/null
+++ b/python/doc/filesystems.rst
@@ -0,0 +1,58 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements. See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership. The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License. You may obtain a copy of the License at
+
+.. http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied. See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+File interfaces and Memory Maps
+===============================
+
+PyArrow features a number of file-like interfaces.
+
+Hadoop File System (HDFS)
+-------------------------
+
+PyArrow comes with bindings to a C++-based interface to the Hadoop File
+System. You connect like so:
+
+.. code-block:: python
+
+ import pyarrow as pa
+ hdfs = pa.HdfsClient(host, port, user=user, kerb_ticket=ticket_cache_path)
+
+By default, ``pyarrow.HdfsClient`` uses libhdfs, a JNI-based interface to the
+Java Hadoop client. This library is loaded **at runtime** (rather than at link
+or library load time, since it may not be in your ``LD_LIBRARY_PATH``), and
+relies on the following environment variables:
+
+* ``HADOOP_HOME``: the root of your installed Hadoop distribution. It often
+ contains ``lib/native/libhdfs.so``.
+
+* ``JAVA_HOME``: the location of your Java SDK installation.
+
+* ``ARROW_LIBHDFS_DIR`` (optional): explicit location of ``libhdfs.so`` if it is
+ installed somewhere other than ``$HADOOP_HOME/lib/native``.
+
+* ``CLASSPATH``: must contain the Hadoop jars. You can set it using:
+
+.. code-block:: shell
+
+ export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`
+
+You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:
+
+.. code-block:: python
+
+ hdfs3 = pa.HdfsClient(host, port, user=user, kerb_ticket=ticket_cache_path,
+ driver='libhdfs3')
http://git-wip-us.apache.org/repos/asf/arrow/blob/739ed820/python/doc/index.rst
----------------------------------------------------------------------
diff --git a/python/doc/index.rst b/python/doc/index.rst
index d64354b..608fff5 100644
--- a/python/doc/index.rst
+++ b/python/doc/index.rst
@@ -34,15 +34,15 @@ structures.
:maxdepth: 2
:caption: Getting Started
- Installing pyarrow <install.rst>
- Pandas <pandas.rst>
- Module Reference <modules.rst>
- Getting Involved <getting_involved.rst>
+ install
+ pandas
+ filesystems
+ parquet
+ modules
+ getting_involved
.. toctree::
:maxdepth: 2
:caption: Additional Features
- Parquet format <parquet.rst>
jemalloc MemoryPool <jemalloc.rst>
-
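Not part of the commit: the environment setup described in filesystems.rst above can be sanity-checked before connecting. The sketch below is only an illustration under stated assumptions. The helper name find_libhdfs_problems is hypothetical, and the lookup order (ARROW_LIBHDFS_DIR first, then $HADOOP_HOME/lib/native) simply mirrors the wording of the docs page rather than pyarrow's actual loading code. The libhdfs.so file name assumes Linux.

```python
import os


def find_libhdfs_problems(env):
    """Return a list of problems found in the libhdfs-related variables.

    `env` is any mapping of variable names to values, e.g. os.environ.
    This is a hypothetical helper, not a pyarrow API.
    """
    problems = []

    # The docs page lists these three variables without marking them optional.
    for name in ("HADOOP_HOME", "JAVA_HOME", "CLASSPATH"):
        if not env.get(name):
            problems.append(name + " is not set")

    # Assumption: ARROW_LIBHDFS_DIR, when set, takes precedence over the
    # default location $HADOOP_HOME/lib/native described in the docs.
    lib_dir = env.get("ARROW_LIBHDFS_DIR")
    if not lib_dir and env.get("HADOOP_HOME"):
        lib_dir = os.path.join(env["HADOOP_HOME"], "lib", "native")

    # Only check for the library when there is a directory to look in.
    if lib_dir and not os.path.isfile(os.path.join(lib_dir, "libhdfs.so")):
        problems.append("libhdfs.so not found in " + lib_dir)

    return problems


if __name__ == "__main__":
    for problem in find_libhdfs_problems(os.environ):
        print("warning:", problem)
```

Passing the environment in as a mapping, instead of reading os.environ inside the function, keeps the check easy to exercise with made-up values.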