Posted to issues@arrow.apache.org by "Jim Fulton (JIRA)" <ji...@apache.org> on 2018/12/10 21:40:00 UTC

[jira] [Comment Edited] (ARROW-3957) [Python] pyarrow.hdfs.connect fails silently

    [ https://issues.apache.org/jira/browse/ARROW-3957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16715591#comment-16715591 ] 

Jim Fulton edited comment on ARROW-3957 at 12/10/18 9:39 PM:
-------------------------------------------------------------

A contributing factor was that I was using a Jupyter notebook, which hid some output.

 

When I ran outside of a notebook, I could see a Java traceback featuring:

{{java.io.IOException: Failed on local exception: org.apache.hadoop.ipc.RpcException: RPC response exceeds maximum data length}}

 

I also tried the hdfs command-line tool and saw the same error, so I knew I was screwing up consistently. ;)

 

Eventually, I realized I was using the wrong protocol.
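For anyone hitting the same symptom: "RPC response exceeds maximum data length" typically means the client is speaking to the wrong port or protocol (for example a WebHDFS/HTTP port instead of the NameNode RPC port). A minimal sanity-check sketch, using only plain sockets (no Hadoop required) — `probe_port` is a hypothetical helper for illustration, not part of pyarrow:

```python
import socket

def probe_port(host, port, timeout=5.0):
    # Return True if a plain TCP connection to (host, port) succeeds.
    # This only proves the port is reachable; it cannot distinguish
    # RPC from HTTP, but it surfaces unreachable-host/port mistakes
    # before pyarrow.hdfs.connect can swallow them (e.g. inside a
    # Jupyter notebook that hides output).
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If the probe succeeds but the connection still misbehaves, double-check that the port is the NameNode RPC port (the one in fs.defaultFS), not an HTTP one.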

> [Python] pyarrow.hdfs.connect fails silently
> --------------------------------------------
>
>                 Key: ARROW-3957
>                 URL: https://issues.apache.org/jira/browse/ARROW-3957
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.11.1
>         Environment: centos 7
>            Reporter: Jim Fulton
>            Priority: Major
>              Labels: hdfs
>
> I'm trying to connect to HDFS using libhdfs and Kerberos.
> I have JAVA_HOME and HADOOP_HOME set and {{pyarrow.hdfs.connect}} sets CLASSPATH correctly.
> My connect call looks like:
> {{import pyarrow.hdfs}}
> {{c = pyarrow.hdfs.connect(host='MYHOST', port=42424,}}
> {{                         user='ME', kerb_ticket="/tmp/krb5cc_498970")}}
> This doesn't error but the resulting connection can't do anything. They either error like this:
> {{ArrowIOError: HDFS list directory failed, errno: 255 (Unknown error 255)}}
> Or swallow errors (e.g. {{exists}} returning {{False}}).
> Note that {{connect}} errors if the host is wrong but doesn't error if the port, user, or kerb_ticket are wrong. I have no idea how to debug this, because there are no useful errors.
> Note that I _can_ connect using the hdfs Python package. (Of course, that doesn't provide the API I need to read Parquet files.)
> Any help would be appreciated greatly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)