Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2017/08/02 15:06:00 UTC
[jira] [Commented] (ARROW-1316) hdfs connector stand-alone
[ https://issues.apache.org/jira/browse/ARROW-1316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16111075#comment-16111075 ]
Wes McKinney commented on ARROW-1316:
-------------------------------------
I am not sure this is possible. To use libhdfs to access an HDFS cluster, you need:
* A JVM installation
* The Hadoop client libraries in your classpath
* A file-system-like API wrapping the libhdfs library
These are provided respectively by the JDK install, the Hadoop install, and the Arrow libraries. Arrow's interface to HDFS exposes an API consistent with its other file interfaces (https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.h). This is the same approach used in TensorFlow (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h) and other projects.
> hdfs connector stand-alone
> --------------------------
>
> Key: ARROW-1316
> URL: https://issues.apache.org/jira/browse/ARROW-1316
> Project: Apache Arrow
> Issue Type: Wish
> Reporter: Martin Durant
>
> Currently, access to HDFS via libhdfs requires the whole of Arrow, a Java installation, and a Hadoop installation. This setup is indeed common, for example on "cluster edge-nodes".
> This issue is posted with the wish that HDFS file-system access could be done without needing the full set of installations above.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)