Posted to jira@arrow.apache.org by "Sukesh Pabolu (Jira)" <ji...@apache.org> on 2021/04/15 12:43:00 UTC

[jira] [Created] (ARROW-12399) Unable to load libhdfs

Sukesh Pabolu created ARROW-12399:
-------------------------------------

             Summary: Unable to load libhdfs
                 Key: ARROW-12399
                 URL: https://issues.apache.org/jira/browse/ARROW-12399
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 3.0.0
            Reporter: Sukesh Pabolu
             Fix For: 3.0.0


I am using pyarrow 3.0.0 with Python 3.7 and PySpark 3.1.1, and I am facing the following error: I am not able to save a dataframe to HDFS. When I used PySpark 3.0.0, saving the dataframe to HDFS worked.

Please help:

import pyarrow as pa
fs = pa.hdfs.connect(host='localhost', port=9001)
__main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
 extra_conf=extra_conf
 File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
 extra_conf=extra_conf)
 File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in __init__
 self._connect(host, port, user, kerb_ticket, extra_conf)
 File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
 File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: The specified module could not be found.
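This error usually means pyarrow cannot locate the native libhdfs library (hdfs.dll on Windows). pyarrow resolves it via the ARROW_LIBHDFS_DIR, HADOOP_HOME, and JAVA_HOME environment variables, so a minimal sketch of a workaround is to set those before connecting, and to use the non-deprecated pyarrow.fs.HadoopFileSystem API. The paths below are hypothetical and must be adjusted to the local Hadoop/Java installation:

```python
import os

# Hypothetical installation paths -- adjust to your environment.
os.environ["HADOOP_HOME"] = r"C:\hadoop"
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_281"
# Directory containing the native library (hdfs.dll on Windows,
# libhdfs.so on Linux):
os.environ["ARROW_LIBHDFS_DIR"] = os.path.join(
    os.environ["HADOOP_HOME"], "bin")

# With the environment in place, the replacement for the deprecated
# pa.hdfs.connect call is:
#     from pyarrow import fs
#     hdfs = fs.HadoopFileSystem(host="localhost", port=9001)
```

Note that Hadoop binary distributions for Windows do not always ship hdfs.dll, in which case the library has to be built or obtained separately before pyarrow can load it.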

--
This message was sent by Atlassian Jira
(v8.3.4#803005)