Posted to dev@arrow.apache.org by "Pavel Dourugyan (Jira)" <ji...@apache.org> on 2020/05/31 12:42:00 UTC

[jira] [Created] (ARROW-8988) Help! After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS doesn't work with libhdfs JNI

Pavel Dourugyan created ARROW-8988:
--------------------------------------

             Summary: Help! After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS doesn't work with libhdfs JNI
                 Key: ARROW-8988
                 URL: https://issues.apache.org/jira/browse/ARROW-8988
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.17.1
            Reporter: Pavel Dourugyan
         Attachments: 1.txt, 2.txt

h2. Problem

After upgrading pyarrow from 0.15 to 0.17, I ran into some trouble. I understand that libhdfs3 is no longer supported; however, in my case libhdfs does not work either. See below.

My experience with the Hadoop ecosystem is limited, so I may have done something wrong. I installed Hortonworks HDP via the Ambari service on a virtual machine running on my PC.

Here is what I tried.

1. Just connect:

%xmode Verbose
import pyarrow as pa

hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')

---

FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop' ([#1.txt])
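
As far as I can tell, pyarrow falls back to running "hadoop classpath --glob" when CLASSPATH is not set, so this error means the hadoop binary is not on the PATH of the machine running Python. A minimal workaround sketch, assuming the Hadoop client is actually installed on that machine (the HDP paths below are from my VM and are an assumption for the client PC):

import os
import pyarrow as pa

# pyarrow shells out to `hadoop classpath --glob` when CLASSPATH is unset,
# so HADOOP_HOME (or a hadoop binary on PATH) must point at a real Hadoop
# install. Assumed HDP paths from my VM:
os.environ["HADOOP_HOME"] = "/usr/hdp/3.1.4.0-315/hadoop"
os.environ["PATH"] = os.environ["HADOOP_HOME"] + "/bin:" + os.environ["PATH"]

hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')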

2. Trying to bypass the driver == 'libhdfs' check:

%xmode Verbose

import pyarrow as pa

hdfs = pa.HadoopFileSystem(host='hdp.test.com', port=8020, user='hdfs', driver=None)

---

OSError: Unable to load libjvm: /usr/java/latest//lib/server/libjvm.so: cannot open shared object file: No such file or directory ([#2.txt])
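
It looks like Arrow builds this path from JAVA_HOME (apparently falling back to /usr/java/latest when it is unset), and that directory does not exist here. A sketch of pointing JAVA_HOME at the JDK that find located, assuming the same OpenJDK layout on the machine running Python:

import os

# Arrow loads libjvm.so from under JAVA_HOME; /usr/java/latest does not
# exist on this box, so point JAVA_HOME at the real JDK first (path taken
# from the find output in the Environment section, an assumption for the
# client PC).
os.environ["JAVA_HOME"] = (
    "/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64")

import pyarrow as pa
hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')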

3. With libhdfs3 it works:

import hdfs3 

hdfs = hdfs3.HDFileSystem(host='hdp.test.com', port=8020, user='hdfs')

# list the remote folder
hdfs.ls('/data/', detail=False)

['/data/TimeSheet.2020-04-11', '/data/test', '/data/test.json']
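
For reference, the pyarrow call I would expect to mirror this listing once the JNI driver loads (same connection parameters as above):

import pyarrow as pa

# Expected pyarrow equivalent of the hdfs3 listing above.
hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')
print(hdfs.ls('/data/'))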
h2. Environment
h4. +Client PC:+

OS: Debian 10. Dev.: Anaconda3 (Python 3.7.6), JupyterLab 2, pyarrow 0.17.1 (from conda-forge)

h4. +Hadoop+ (on VM – Oracle VirtualBox):

OS: Oracle Linux 7.6.  Distr.: Hortonworks HDP 3.1.4

libhdfs.so:

[root@hdp /]# find / -name libhdfs.so
 /usr/lib/ams-hbase/lib/hadoop-native/libhdfs.so
 /usr/hdp/3.1.4.0-315/usr/lib/libhdfs.so

Java path:

[root@hdp /]# sudo alternatives --config java
-----------------------------------------------
*+ 1           java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)

libjvm:

[root@hdp /]# find / -name libjvm.*
 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
 /usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so

I tried many settings; the last attempt is below:

# /etc/profile
...
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export JRE_HOME=$JAVA_HOME/jre
export JAVA_CLASSPATH=$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
export ARROW_LIBHDFS_DIR=/usr/lib/ams-hbase/lib/hadoop-native

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$JAVA_CLASSPATH:$HADOOP_CLASSPATH

export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JRE_HOME/lib/amd64/server
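
A quick way to sanity-check these settings from the Python side before calling pyarrow (the libjvm/libhdfs sub-paths are assumptions based on the find output above):

import os

# Print what pyarrow will actually see, then check for the two shared
# libraries the JNI driver needs (sub-paths assumed from the find output).
for var in ("JAVA_HOME", "HADOOP_HOME", "ARROW_LIBHDFS_DIR", "CLASSPATH"):
    print(var, "=", os.environ.get(var, "<unset>")[:120])

libjvm = os.path.join(os.environ.get("JAVA_HOME", ""),
                      "jre", "lib", "amd64", "server", "libjvm.so")
libhdfs = os.path.join(os.environ.get("ARROW_LIBHDFS_DIR", ""), "libhdfs.so")
print("libjvm.so exists:", os.path.exists(libjvm))
print("libhdfs.so exists:", os.path.exists(libhdfs))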

--
This message was sent by Atlassian Jira
(v8.3.4#803005)