Posted to dev@arrow.apache.org by "Pavel Dourugyan (Jira)" <ji...@apache.org> on 2020/05/31 12:42:00 UTC
[jira] [Created] (ARROW-8988) Help! After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS via the libhdfs JNI no longer works
Pavel Dourugyan created ARROW-8988:
--------------------------------------
Summary: Help! After upgrading pyarrow from 0.15 to 0.17.1, connecting to HDFS via the libhdfs JNI no longer works
Key: ARROW-8988
URL: https://issues.apache.org/jira/browse/ARROW-8988
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.17.1
Reporter: Pavel Dourugyan
Attachments: 1.txt, 2.txt
h2. Problem
After upgrading pyarrow from 0.15 to 0.17, I ran into some trouble. I understand that libhdfs3 is no longer supported; however, in my case libhdfs does not work either. See below.
My experience with the Hadoop ecosystem is limited, so I may have made some mistakes. I installed Hortonworks HDP via the Ambari service on a virtual machine running on my PC.
Here is what I tried.
1. Just connect:
%xmode Verbose
import pyarrow as pa
hdfs = pa.hdfs.connect(host='hdp.test.com', port=8020, user='hdfs')
---
FileNotFoundError: [Errno 2] No such file or directory: 'hadoop': 'hadoop' ([#1.txt])
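This first error means pyarrow could not find the `hadoop` wrapper script: in 0.17, `pa.hdfs.connect` shells out to `hadoop classpath --glob` to build the Java CLASSPATH unless CLASSPATH is already set. A minimal sketch of a workaround, assuming the HDP 3.1.4 install path reported below (adjust for other installs):

```python
import os
import shutil

# pyarrow 0.17 shells out to the `hadoop` executable to compute the
# Java CLASSPATH, so `hadoop` must be resolvable on PATH (or CLASSPATH
# must be pre-set). This path is an assumption based on the HDP layout
# shown later in this report.
HADOOP_BIN = "/usr/hdp/3.1.4.0-315/hadoop/bin"

if shutil.which("hadoop") is None:
    # Make the wrapper script visible to the current Python process.
    os.environ["PATH"] = os.environ.get("PATH", "") + os.pathsep + HADOOP_BIN
```

Setting PATH inside the notebook process matters because Jupyter on the client PC does not inherit shell profile changes made on the Hadoop VM.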
2. Trying to bypass the `if driver == 'libhdfs'` check:
%xmode Verbose
import pyarrow as pa
hdfs = pa.HadoopFileSystem(host='hdp.test.com', port=8020, user='hdfs', driver=None)
---
OSError: Unable to load libjvm: /usr/java/latest//lib/server/libjvm.so: cannot open shared object file: No such file or directory ([#2.txt])
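This second error shows pyarrow deriving the libjvm path from a stale JAVA_HOME (`/usr/java/latest`). A small sketch for locating a JVM root that actually contains libjvm.so; the candidate roots are taken from the `find / -name libjvm.*` output later in this report and are specific to this VM:

```python
import glob
import os

def find_libjvm(java_home):
    """Return the first libjvm.so found under java_home, or None."""
    hits = glob.glob(os.path.join(java_home, "**", "libjvm.so"),
                     recursive=True)
    return hits[0] if hits else None

# Candidate JDK roots from the `find` output in this report.
for root in ["/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64",
             "/usr/jdk64/jdk1.8.0_112"]:
    lib = find_libjvm(root)
    if lib:
        # pyarrow builds the libjvm path from JAVA_HOME, so point it at
        # a root that really contains the shared library.
        os.environ["JAVA_HOME"] = root
        break
```

Note that for JDK 8 the library lives under `jre/lib/amd64/server`, while JDK 9+ uses `lib/server`; a search like this sidesteps the layout difference.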
3. With libhdfs3 it works:
import hdfs3
hdfs = hdfs3.HDFileSystem(host='hdp.test.com', port=8020, user='hdfs')
# list a remote folder
hdfs.ls('/data/', detail=False)
['/data/TimeSheet.2020-04-11', '/data/test', '/data/test.json']
h2. Environment.
h4. +Client PC:+
OS: Debian 10. Dev.: Anaconda3 (python 3.7.6), Jupyter Lab 2, pyarrow 0.17.1 (from conda-forge)
h4. +Hadoop+ (on a VM, Oracle VirtualBox):
OS: Oracle Linux 7.6. Distr.: Hortonworks HDP 3.1.4
libhdfs.so:
[root@hdp /]# find / -name libhdfs.so
/usr/lib/ams-hbase/lib/hadoop-native/libhdfs.so
/usr/hdp/3.1.4.0-315/usr/lib/libhdfs.so
Java path:
[root@hdp /]# sudo alternatives --config java
-----------------------------------------------
*+ 1 java-1.8.0-openjdk.x86_64 (/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/bin/java)
libjvm:
[root@hdp /]# find / -name libjvm.*
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.252.b09-2.el7_8.x86_64/jre/lib/amd64/server/libjvm.so
/usr/jdk64/jdk1.8.0_112/jre/lib/amd64/server/libjvm.so
I tried many settings; the last one is below:
# /etc/profile
...
export JAVA_HOME=$(dirname $(dirname $(readlink $(readlink $(which javac)))))
export JRE_HOME=$JAVA_HOME/jre
export JAVA_CLASSPATH=$JAVA_HOME/jre/lib:$JAVA_HOME/lib:$JAVA_HOME/lib/tools.jar
export HADOOP_HOME=/usr/hdp/3.1.4.0-315/hadoop
export HADOOP_CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | xargs echo | tr ' ' ':')
export ARROW_LIBHDFS_DIR=/usr/lib/ams-hbase/lib/hadoop-native
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export CLASSPATH=.:$CLASSPATH:$JAVA_CLASSPATH:$HADOOP_CLASSPATH
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JRE_HOME/lib/amd64/server
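After sourcing this profile, a quick sanity check (a sketch; the paths are the ones reported above and may differ on other installs) can confirm each piece the libhdfs JNI driver depends on:

```shell
# Verify the three things pyarrow's libhdfs driver needs to find.
ls "$JAVA_HOME/jre/lib/amd64/server/libjvm.so"   # the JVM to be loaded
ls "$ARROW_LIBHDFS_DIR/libhdfs.so"               # the native HDFS client
which hadoop && hadoop classpath --glob | cut -c1-120   # CLASSPATH source
```

If any of the three commands fails, pyarrow will fail in the corresponding way: a missing `hadoop` gives the FileNotFoundError from case 1, and a missing libjvm.so gives the OSError from case 2.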
--
This message was sent by Atlassian Jira
(v8.3.4#803005)