Posted to commits@arrow.apache.org by we...@apache.org on 2017/04/09 17:54:15 UTC

arrow git commit: ARROW-762: [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS

Repository: arrow
Updated Branches:
  refs/heads/master b0e3122b9 -> 739ed8202


ARROW-762: [Python] Start docs page about files and filesystems, adapt C++ docs about HDFS

Author: Wes McKinney <we...@twosigma.com>

Closes #511 from wesm/ARROW-762 and squashes the following commits:

273142e [Wes McKinney] Add initial docs about configuring environment to use pyarrow.HdfsClient


Project: http://git-wip-us.apache.org/repos/asf/arrow/repo
Commit: http://git-wip-us.apache.org/repos/asf/arrow/commit/739ed820
Tree: http://git-wip-us.apache.org/repos/asf/arrow/tree/739ed820
Diff: http://git-wip-us.apache.org/repos/asf/arrow/diff/739ed820

Branch: refs/heads/master
Commit: 739ed82028e9efae43f00f4e19b39737adb8a348
Parents: b0e3122
Author: Wes McKinney <we...@twosigma.com>
Authored: Sun Apr 9 13:54:10 2017 -0400
Committer: Wes McKinney <we...@twosigma.com>
Committed: Sun Apr 9 13:54:10 2017 -0400

----------------------------------------------------------------------
 python/doc/filesystems.rst | 58 +++++++++++++++++++++++++++++++++++++++++
 python/doc/index.rst       | 12 ++++-----
 2 files changed, 64 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/arrow/blob/739ed820/python/doc/filesystems.rst
----------------------------------------------------------------------
diff --git a/python/doc/filesystems.rst b/python/doc/filesystems.rst
new file mode 100644
index 0000000..9e00ddd
--- /dev/null
+++ b/python/doc/filesystems.rst
@@ -0,0 +1,58 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+
+..   http://www.apache.org/licenses/LICENSE-2.0
+
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+File interfaces and Memory Maps
+===============================
+
+PyArrow features a number of file-like interfaces.
+
+Hadoop File System (HDFS)
+-------------------------
+
+PyArrow comes with bindings to a C++-based interface to the Hadoop File
+System. You connect like so:
+
+.. code-block:: python
+
+   import pyarrow as pa
+   hdfs = pa.HdfsClient(host, port, user=user, kerb_ticket=ticket_cache_path)
+
+By default, ``pyarrow.HdfsClient`` uses libhdfs, a JNI-based interface to the
+Java Hadoop client. This library is loaded **at runtime** (rather than at link
+/ library load time, since the library may not be in your LD_LIBRARY_PATH), and
+relies on some environment variables.
+
+* ``HADOOP_HOME``: the root of your installed Hadoop distribution. Often has
+  ``lib/native/libhdfs.so``.
+
+* ``JAVA_HOME``: the location of your Java SDK installation.
+
+* ``ARROW_LIBHDFS_DIR`` (optional): explicit location of ``libhdfs.so`` if it is
+  installed somewhere other than ``$HADOOP_HOME/lib/native``.
+
+* ``CLASSPATH``: must contain the Hadoop jars. You can set it using:
+
+.. code-block:: shell
+
+    export CLASSPATH=`$HADOOP_HOME/bin/hdfs classpath --glob`
+
+You can also use libhdfs3, a third-party C++ library for HDFS from Pivotal Labs:
+
+.. code-block:: python
+
+   hdfs3 = pa.HdfsClient(host, port, user=user, kerb_ticket=ticket_cache_path,
+                         driver='libhdfs3')
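
Note (not part of the commit): the environment variables described in filesystems.rst above can be sanity-checked before calling ``pa.HdfsClient``. The following is a minimal sketch using only the Python standard library. The helper names (``resolve_libhdfs_path``, ``missing_hdfs_variables``, ``classpath_command``) are hypothetical and are not pyarrow APIs, and the choice to let ``ARROW_LIBHDFS_DIR`` take precedence over ``$HADOOP_HOME/lib/native`` is an assumption made for illustration; the actual search order inside the library may differ.

```python
import os


def resolve_libhdfs_path(env=None):
    """Guess where libhdfs.so is expected to live (illustrative helper only).

    Assumption: ARROW_LIBHDFS_DIR, when set, wins over
    $HADOOP_HOME/lib/native. The real lookup order may differ.
    """
    env = os.environ if env is None else env
    explicit_dir = env.get("ARROW_LIBHDFS_DIR")
    if explicit_dir:
        return os.path.join(explicit_dir, "libhdfs.so")
    hadoop_home = env.get("HADOOP_HOME")
    if hadoop_home:
        return os.path.join(hadoop_home, "lib", "native", "libhdfs.so")
    return None  # neither variable is set, so there is nothing to guess from


def missing_hdfs_variables(env=None):
    """List the non-optional variables from the docs that are unset or empty."""
    env = os.environ if env is None else env
    required = ["HADOOP_HOME", "JAVA_HOME", "CLASSPATH"]
    return [name for name in required if not env.get(name)]


def classpath_command(hadoop_home):
    """Build the argument list for `$HADOOP_HOME/bin/hdfs classpath --glob`.

    The command is only constructed here, not executed.
    """
    return [os.path.join(hadoop_home, "bin", "hdfs"), "classpath", "--glob"]
```

A caller could print ``missing_hdfs_variables()`` at startup to get a readable message about an incomplete environment, and pass the result of ``classpath_command(os.environ["HADOOP_HOME"])`` to ``subprocess.check_output`` to obtain the value for ``CLASSPATH``.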

http://git-wip-us.apache.org/repos/asf/arrow/blob/739ed820/python/doc/index.rst
----------------------------------------------------------------------
diff --git a/python/doc/index.rst b/python/doc/index.rst
index d64354b..608fff5 100644
--- a/python/doc/index.rst
+++ b/python/doc/index.rst
@@ -34,15 +34,15 @@ structures.
    :maxdepth: 2
    :caption: Getting Started
 
-   Installing pyarrow <install.rst>
-   Pandas <pandas.rst>
-   Module Reference <modules.rst>
-   Getting Involved <getting_involved.rst>
+   install
+   pandas
+   filesystems
+   parquet
+   modules
+   getting_involved
 
 .. toctree::
    :maxdepth: 2
    :caption: Additional Features
 
-   Parquet format <parquet.rst>
    jemalloc MemoryPool <jemalloc.rst>
-