Posted to dev@arrow.apache.org by "Saurabh Bajaj (JIRA)" <ji...@apache.org> on 2019/07/12 12:27:00 UTC

[jira] [Created] (ARROW-5922) Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API

Saurabh Bajaj created ARROW-5922:
------------------------------------

             Summary: Unable to connect to HDFS from a worker/data node on a Kerberized cluster using pyarrow's hdfs API
                 Key: ARROW-5922
                 URL: https://issues.apache.org/jira/browse/ARROW-5922
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.14.0
         Environment: Unix
            Reporter: Saurabh Bajaj
             Fix For: 0.14.0


Here's what I'm trying:

```
import pyarrow as pa

conf = {"hadoop.security.authentication": "kerberos"}

fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
```

However, when I submit this job to the cluster using {{Dask-YARN}}, I get the following error:

```
  File "test/run.py", line 3
    fs = pa.hdfs.connect(kerb_ticket="/tmp/krb5cc_44444", extra_conf=conf)
  File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 211, in connect
  File "/opt/hadoop/data/10/hadoop/yarn/local/usercache/hdfsf6/appcache/application_1560931326013_183242/container_e47_1560931326013_183242_01_000003/environment/lib/python3.7/site-packages/pyarrow/hdfs.py", line 38, in __init__
  File "pyarrow/io-hdfs.pxi", line 105, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status
pyarrow.lib.ArrowIOError: HDFS connection failed
```

I also tried setting {{host}} (to a name node) and {{port}} (to 8020), but I run into the same error. Since the error is not descriptive, I'm not sure which setting needs to be changed. Does anyone have any clues?
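
Because the error message does not say which setting is at fault, one way to narrow it down is to print the relevant settings from inside the YARN container just before calling {{pa.hdfs.connect}}. The following is a minimal sketch, not part of the original report. It assumes the libhdfs driver depends on the {{HADOOP_HOME}}, {{JAVA_HOME}}, {{CLASSPATH}} and {{ARROW_LIBHDFS_DIR}} environment variables, and that {{KRB5CCNAME}} may point at the Kerberos ticket cache. The helper name {{hdfs_env_report}} is hypothetical. It uses only the standard library and does not contact HDFS.

```python
import os

# Assumption: these are the environment variables the libhdfs driver and
# Kerberos rely on; adjust the list for the cluster in question.
HDFS_ENV_VARS = (
    "HADOOP_HOME",
    "JAVA_HOME",
    "CLASSPATH",
    "ARROW_LIBHDFS_DIR",
    "KRB5CCNAME",
)


def hdfs_env_report(kerb_ticket, environ=None):
    """Collect settings worth logging on the worker before connecting.

    Hypothetical helper: returns a dict with the value of each variable in
    HDFS_ENV_VARS (None if unset) and whether the ticket cache file exists
    and is readable by the current user on this node.
    """
    environ = os.environ if environ is None else environ
    report = {name: environ.get(name) for name in HDFS_ENV_VARS}
    report["kerb_ticket"] = kerb_ticket
    report["kerb_ticket_exists"] = os.path.isfile(kerb_ticket)
    report["kerb_ticket_readable"] = os.access(kerb_ticket, os.R_OK)
    return report
```

Calling {{hdfs_env_report("/tmp/krb5cc_44444")}} on the worker and logging the result would show, for example, whether the ticket cache that exists on the submitting machine is actually present on the data node where the container runs.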


