Posted to dev@arrow.apache.org by "Jim Fulton (JIRA)" <ji...@apache.org> on 2018/12/07 22:43:00 UTC
[jira] [Created] (ARROW-3957) pyarrow.hdfs.connect fails silently
Jim Fulton created ARROW-3957:
---------------------------------
Summary: pyarrow.hdfs.connect fails silently
Key: ARROW-3957
URL: https://issues.apache.org/jira/browse/ARROW-3957
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.11.1
Environment: centos 7
Reporter: Jim Fulton
I'm trying to connect to HDFS using libhdfs and Kerberos.
I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.
My connect call looks like:
{code}
import pyarrow.hdfs

c = pyarrow.hdfs.connect(
    host='MYHOST', port=42424, user='ME',
    kerb_ticket="/tmp/krb5cc_498970")
{code}
This doesn't error, but the resulting connection can't do anything. Calls on it either fail like this:
{{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255) }}
Or swallow errors (e.g. {{exists}} returning {{False}}).
Note that {{connect}} errors if the host is wrong, but doesn't error if the port, user, or kerb_ticket is wrong. I have no idea how to debug this, because there are no useful error messages.
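Since {{connect}} gives no useful errors, one thing worth ruling out first is the environment itself. A minimal sketch of a pre-flight check; {{check_hdfs_env}} is a made-up helper, not part of pyarrow, and it only validates the variables and paths that libhdfs relies on:

{code}
import os

def check_hdfs_env(kerb_ticket=None):
    """Return a list of problems with the environment that
    pyarrow.hdfs.connect (via libhdfs) depends on.

    Hypothetical helper for debugging only: checks JAVA_HOME and
    HADOOP_HOME, and optionally that the Kerberos ticket cache file
    actually exists on disk.
    """
    problems = []
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        path = os.environ.get(var)
        if not path:
            problems.append("%s is not set" % var)
        elif not os.path.isdir(path):
            problems.append("%s=%s is not a directory" % (var, path))
    if kerb_ticket and not os.path.exists(kerb_ticket):
        problems.append("kerb_ticket file does not exist: %s" % kerb_ticket)
    return problems
{code}

Running this before {{connect}} at least confirms that a silent failure isn't caused by a missing ticket cache file or a bad HADOOP_HOME.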
Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)
Any help would be appreciated greatly.
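In the meantime, a sketch of one way to surface the failure earlier, using only the public API: force a round trip to the cluster immediately after connecting, so a bad port, user, or ticket fails at connect time instead of silently later. {{probe}} is a made-up helper name:

{code}
def probe(fs, path="/"):
    """Force an RPC to the NameNode so a silently-broken connection
    fails here rather than later.

    `fs` is any object with an ``ls`` method, e.g. the handle
    returned by pyarrow.hdfs.connect.
    """
    try:
        fs.ls(path)  # any call that must actually reach the cluster
    except Exception as exc:
        raise RuntimeError("HDFS connection is not usable: %s" % exc)
    return fs
{code}

Usage would be {{c = probe(pyarrow.hdfs.connect(host='MYHOST', ...))}}, which at least turns the "swallowed" failures into an immediate exception.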
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)