Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/04/15 14:18:00 UTC
[jira] [Commented] (ARROW-12399) Unable to load libhdfs
[ https://issues.apache.org/jira/browse/ARROW-12399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17322209#comment-17322209 ]
Joris Van den Bossche commented on ARROW-12399:
-----------------------------------------------
Could you try {{pyarrow.fs.HadoopFileSystem(host='localhost', port=9001)}} instead? The {{hdfs.connect()}} method is deprecated in favor of {{pyarrow.fs.HadoopFileSystem}}, which is also backed by a somewhat different implementation.
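A minimal sketch of that suggestion, for illustration only. The helper names {{connect_hdfs}} and {{hdfs_env_report}} are hypothetical and not part of pyarrow. The environment variables listed are the ones the pyarrow documentation describes for locating libhdfs and the JVM; whether a missing one is the cause of this particular error is an assumption, not something confirmed in this issue.

```python
import os


def hdfs_env_report():
    """Return the environment variables that loading libhdfs typically depends on.

    Assumption: these are the variables described in the pyarrow HDFS docs;
    a value of None means the variable is not set in this process.
    """
    names = ("HADOOP_HOME", "JAVA_HOME", "ARROW_LIBHDFS_DIR", "CLASSPATH")
    return {name: os.environ.get(name) for name in names}


def connect_hdfs(host="localhost", port=9001):
    """Connect through the newer pyarrow.fs API instead of pyarrow.hdfs.connect().

    pyarrow is imported lazily so this helper can be defined even where
    pyarrow is not installed. The call is expected to raise OSError if
    libhdfs still cannot be loaded.
    """
    from pyarrow import fs

    return fs.HadoopFileSystem(host=host, port=port)
```

Calling {{hdfs_env_report()}} first shows which of these variables are unset before attempting {{connect_hdfs(host='localhost', port=9001)}} with the host and port from the report.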
> Unable to load libhdfs
> ----------------------
>
> Key: ARROW-12399
> URL: https://issues.apache.org/jira/browse/ARROW-12399
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 3.0.0
> Reporter: Sukesh Pabolu
> Priority: Major
> Fix For: 3.0.0
>
>
> I am using pyarrow 3.0.0 with Python 3.7 and Hadoop 2.10.1 on 64-bit Windows 10, and I am getting the error below.
> I am using pyspark 3.1.1 and cannot save a dataframe to HDFS. With pyspark 3.0.0 I was able to save a dataframe to HDFS.
> *Please help:*
> *import pyarrow as pa*
> *fs = pa.hdfs.connect(host='localhost', port=9001)*
> __main__:1: DeprecationWarning: pyarrow.hdfs.connect is deprecated as of 2.0.0, please use pyarrow.fs.HadoopFileSystem instead.
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 219, in connect
> extra_conf=extra_conf
> File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 229, in _connect
> extra_conf=extra_conf)
> File "C:\Users\1570513\Anaconda3\envs\on-premise-latest\lib\site-packages\pyarrow\hdfs.py", line 45, in __init__
> self._connect(host, port, user, kerb_ticket, extra_conf)
> File "pyarrow\io-hdfs.pxi", line 75, in pyarrow.lib.HadoopFileSystem._connect
> File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
> OSError: Unable to load libhdfs: The specified module could not be found.
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)