Posted to jira@arrow.apache.org by "Andy (Jira)" <ji...@apache.org> on 2020/06/14 18:29:00 UTC
[jira] [Commented] (ARROW-9019) [Python] hdfs fails to connect for HDFS 3.x cluster
[ https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17135258#comment-17135258 ]
Andy commented on ARROW-9019:
-----------------------------
I think it is related to the library version. I had a similar issue; it was resolved by running export LD_LIBRARY_PATH=/opt/anaconda/3.7.1/lib/:$LD_LIBRARY_PATH prior to executing the code (you'll need to use your own Python distribution's lib path).
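For reference, the workaround above boils down to putting the Python distribution's libraries on the loader path before starting Python, so libhdfs resolves a compatible protobuf. A minimal sketch follows; all paths are examples to be replaced with your own, and the JAVA_HOME/HADOOP_HOME/CLASSPATH exports are the environment variables pyarrow's libhdfs driver documents as prerequisites, not part of the original comment:

```shell
# All paths below are examples -- substitute your own Python distro and Hadoop install.

# Put the Python distribution's libraries first so libhdfs resolves a
# compatible protobuf instead of an older system copy:
export LD_LIBRARY_PATH=/opt/anaconda/3.7.1/lib:$LD_LIBRARY_PATH

# pyarrow's libhdfs driver also needs the JVM and the Hadoop jars visible:
export JAVA_HOME=${JAVA_HOME:-/usr/lib/jvm/java-8-openjdk-amd64}
export HADOOP_HOME=${HADOOP_HOME:-/opt/hadoop-3.1.3}
export CLASSPATH=$("$HADOOP_HOME"/bin/hadoop classpath --glob 2>/dev/null)

# Then run the reproduction script in the same shell, e.g.:
# python repro_hdfs.py
```

Note that LD_LIBRARY_PATH must be set before the Python process starts; setting it from inside a running interpreter does not affect libraries the dynamic linker has already resolved.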
> [Python] hdfs fails to connect for HDFS 3.x cluster
> ---------------------------------------------------
>
> Key: ARROW-9019
> URL: https://issues.apache.org/jira/browse/ARROW-9019
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Thomas Graves
> Priority: Major
> Labels: filesystem, hdfs
>
> I'm trying to use the pyarrow hdfs connector with Hadoop 3.1.3 and I get an error that looks like a protobuf or jar mismatch problem with Hadoop. The same code works on a Hadoop 2.9 cluster.
>
> I'm wondering if there is something special I need to do or if pyarrow doesn't support Hadoop 3.x yet?
> Note I tried with pyarrow 0.15.1, 0.16.0, and 0.17.1.
>
> import pyarrow as pa
>
> hdfs_kwargs = dict(host="namenodehost",
>                    port=9000,
>                    user="tgraves",
>                    driver="libhdfs",
>                    kerb_ticket=None,
>                    extra_conf=None)
> fs = pa.hdfs.connect(**hdfs_kwargs)
> res = fs.exists("/user/tgraves")
>
> Error that I get on Hadoop 3.x is:
>
> dfsExists: invokeMethod((Lorg/apache/hadoop/fs/Path;)Z) error:
> ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
> java.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
> at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
> at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
> at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
> at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
> at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
> at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
> at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
> at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)