Posted to dev@arrow.apache.org by "Eric Henry (Jira)" <ji...@apache.org> on 2020/03/18 20:44:00 UTC

[jira] [Created] (ARROW-8154) HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release

Eric Henry created ARROW-8154:
---------------------------------

             Summary: HDFS Filesystem does not set environment variables in pyarrow 0.16.0 release
                 Key: ARROW-8154
                 URL: https://issues.apache.org/jira/browse/ARROW-8154
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.16.0
            Reporter: Eric Henry


In pyarrow 0.15.x, HDFS filesystem works as follows:

If you set HADOOP_HOME env var, it looks for libhdfs.so in $HADOOP_HOME/lib/native.

In pyarrow 0.16.x, if you set HADOOP_HOME, it looks for libhdfs.so in $HADOOP_HOME, which is incorrect behaviour on all systems I am using.
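In other words, the observed change in lookup location amounts to the following (a sketch of the behaviour described above, not pyarrow's actual lookup code; the example HADOOP_HOME value is illustrative):

```python
import os

hadoop_home = os.environ.get("HADOOP_HOME", "/usr/lib/hadoop")

# pyarrow 0.15.x effectively looked for the library here:
path_015 = os.path.join(hadoop_home, "lib", "native", "libhdfs.so")

# pyarrow 0.16.0 now looks here instead:
path_016 = os.path.join(hadoop_home, "libhdfs.so")
```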

Also, CLASSPATH no longer gets set automatically, which is inconvenient. The issue here is that I need to keep HADOOP_HOME set correctly to be able to use other libraries, but have to change it to connect with Apache Arrow, e.g.:

os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
# ...do stuff here...

# ...then connect to Arrow...
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop/lib/native"
hdfs = pyarrow.hdfs.connect(host, port)

# ...then reset my HADOOP_HOME...
os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"

etc.
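Until this is fixed, the back-and-forth above can be made less error-prone with a small context manager that restores the variable automatically (a workaround sketch, not part of pyarrow; `temp_env` is a hypothetical helper, and the paths are the ones from this report):

```python
import os
from contextlib import contextmanager

@contextmanager
def temp_env(key, value):
    """Temporarily set an environment variable, restoring the old value on exit."""
    old = os.environ.get(key)
    os.environ[key] = value
    try:
        yield
    finally:
        if old is None:
            os.environ.pop(key, None)
        else:
            os.environ[key] = old

# Hypothetical usage mirroring the workaround above:
# with temp_env("HADOOP_HOME", "/usr/lib/hadoop/lib/native"):
#     hdfs = pyarrow.hdfs.connect(host, port)
```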


Example:

>>> os.environ["HADOOP_HOME"] = "/usr/lib/hadoop"
>>> hdfs = pyarrow.hdfs.connect(host, port)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 215, in connect
    extra_conf=extra_conf)
  File "/home/user/.conda/envs/retroscoring/lib/python3.6/site-packages/pyarrow/hdfs.py", line 40, in __init__
    self._connect(host, port, user, kerb_ticket, driver, extra_conf)
  File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Unable to load libhdfs: /usr/lib/hadoop/libhdfs.so: cannot open shared object file: No such file or directory




--
This message was sent by Atlassian Jira
(v8.3.4#803005)