Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/10 15:34:00 UTC

[jira] [Commented] (ARROW-3957) [Python] pyarrow.hdfs.connect fails silently

    [ https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16714931#comment-16714931 ] 

Wes McKinney commented on ARROW-3957:
-------------------------------------

Well, this is a bit problematic because libhdfs is supposed to return NULL when the connection fails:

https://github.com/apache/arrow/blob/master/cpp/thirdparty/hadoop/include/hdfs.h#L232

You can see that we check for NULL here:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/io/hdfs.cc#L346

I'm not sure what we can do here if libhdfs is failing to work as advertised. Can you open an issue with Apache Hadoop about this?
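
In the meantime, a caller-side workaround might be to probe the connection right after it is created, so that a bad port, user, or ticket surfaces immediately rather than on first use. The sketch below is only an illustration under stated assumptions: `connect_checked` is a hypothetical helper (not part of pyarrow), it assumes the returned object has an `ls` method that raises on failure (as the "list directory failed" error in the report suggests), and it assumes the probe path `/` is listable by the connecting user.

```python
def connect_checked(connect, probe_path="/", **kwargs):
    """Call connect(**kwargs), then list probe_path to verify the connection.

    Hypothetical helper, not a pyarrow API. `connect` is any callable that
    returns a filesystem-like object with an `ls(path)` method.
    """
    fs = connect(**kwargs)
    try:
        # Use a directory listing as the probe: per the report, `exists`
        # can swallow errors and return False, while listing raises.
        fs.ls(probe_path)
    except Exception as exc:
        raise ConnectionError(
            "connection was created but listing %r failed: %s" % (probe_path, exc)
        ) from exc
    return fs


# Possible usage with the call from the report (assumes pyarrow is installed
# and that this version still ships pyarrow.hdfs):
#
#   import pyarrow.hdfs
#   fs = connect_checked(pyarrow.hdfs.connect, host='MYHOST', port=42424,
#                        user='ME', kerb_ticket="/tmp/krb5cc_498970")
```

This does not explain why the connection failed; it only moves the failure to connect time, where it is easier to notice.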

> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs
>
> I'm trying to connect to HDFS using libhdfs and Kerberos.
> I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.
> My connect call looks like:
> {{import pyarrow.hdfs}}
> {{c = pyarrow.hdfs.connect(host='MYHOST', port=42424,}}
> {{                         user='ME', kerb_ticket="/tmp/krb5cc_498970")}}
> This doesn't error, but the resulting connection can't do anything. Operations either error like this:
> {{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}
> Or swallow errors (e.g. {{exists}} returning {{False}}).
> Note that {{connect}} errors if the host is wrong but doesn't error if the port, user, or kerb_ticket are wrong. I have no idea how to debug this, because there are no useful errors.
> Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)
> Any help would be appreciated greatly.


