Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/08/02 15:06:00 UTC

[jira] [Commented] (ARROW-1316) hdfs connector stand-alone

    [ https://issues.apache.org/jira/browse/ARROW-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111075#comment-16111075 ] 

Wes McKinney commented on ARROW-1316:
-------------------------------------

I am not sure this is possible. To use libhdfs to access an HDFS cluster, you need:

* A JVM installation
* The Hadoop client libraries in your classpath
* A file-system-like API wrapping the libhdfs library

These are provided, respectively, by the JDK install, the Hadoop install, and the Arrow libraries. The Arrow interface to HDFS exposes the same API as Arrow's other file interfaces (https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h). This is the same approach used in TensorFlow (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h) and other projects.
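To make the three requirements above concrete, here is a minimal sketch of the environment libhdfs typically needs before a connection can be made. The paths and directory layout are assumptions for illustration only; real JDK and Hadoop installs vary.

```python
from pathlib import Path

def libhdfs_env(java_home, hadoop_home):
    """Sketch: assemble the environment variables libhdfs relies on.

    Both path arguments are hypothetical; adjust for your install.
    """
    # libhdfs finds the Hadoop client jars through CLASSPATH;
    # the glob below assumes a standard Hadoop distribution layout.
    jars = sorted(str(p) for p in Path(hadoop_home).glob("share/hadoop/**/*.jar"))
    return {
        "JAVA_HOME": java_home,       # the JVM installation
        "HADOOP_HOME": hadoop_home,   # the Hadoop install
        "CLASSPATH": ":".join(jars),  # Hadoop client libraries on the classpath
    }
```

In practice these variables must be exported before the process loads libhdfs, which is why a stand-alone connector is hard: the JVM and the Hadoop jars have to be present no matter how thin the client library itself is.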
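The "consistent API" point can be illustrated with a small sketch of the filesystem-abstraction pattern that Arrow and TensorFlow both use. The class names here are illustrative, not Arrow's actual classes: a common interface is defined once, and each backend (local disk, HDFS via libhdfs, etc.) implements it.

```python
from abc import ABC, abstractmethod

class FileSystem(ABC):
    """Illustrative filesystem abstraction (names are hypothetical)."""

    @abstractmethod
    def open_input(self, path):
        """Return a readable binary file-like object for path."""

class LocalFileSystem(FileSystem):
    """Backend for the local disk."""
    def open_input(self, path):
        return open(path, "rb")

class HadoopFileSystem(FileSystem):
    """Would delegate to libhdfs (hdfsOpenFile, hdfsRead, ...); stubbed here
    because it needs a JVM, the Hadoop jars, and a running cluster."""
    def open_input(self, path):
        raise NotImplementedError("requires libhdfs + JVM + Hadoop classpath")
```

Code written against `FileSystem` works unchanged with either backend, which is what makes the HDFS support feel like any other file in Arrow's I/O layer.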

> hdfs connector stand-alone
> --------------------------
>
>                 Key: ARROW-1316
>                 URL: https://issues.apache.org/jira/browse/ARROW-1316
>             Project: Apache Arrow
>          Issue Type: Wish
>            Reporter: Martin Durant
>
> Currently, access to HDFS via libhdfs requires the whole of Arrow, a Java installation, and a Hadoop installation. This setup is indeed common, e.g. on "cluster edge-nodes".
> This issue is posted in the hope that HDFS file-system access could be done without needing the full set of installations above.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)