You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-9019) [Python] hdfs fails to connect to for HDFS 3.x cluster

     [ https://issues.apache.org/jira/browse/ARROW-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-9019:
----------------------------------

    Assignee:     (was: Krisztian Szucs)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [Python] hdfs fails to connect to for HDFS 3.x cluster
> ------------------------------------------------------
>
>                 Key: ARROW-9019
>                 URL: https://issues.apache.org/jira/browse/ARROW-9019
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Thomas Graves
>            Priority: Major
>              Labels: filesystem, hdfs
>
> I'm trying to use the pyarrow hdfs connector with Hadoop 3.1.3 and I get an error that looks like a protobuf or jar mismatch problem with Hadoop. The same code works on a Hadoop 2.9 cluster.
>  
> I'm wondering if there is something special I need to do or if pyarrow doesn't support Hadoop 3.x yet?
> Note I tried with pyarrow 0.15.1, 0.16.0, and 0.17.1.
>  
>     import pyarrow as pa
>     hdfs_kwargs = dict(host="namenodehost",
>                       port=9000,
>                       user="tgraves",
>                       driver='libhdfs',
>                       kerb_ticket=None,
>                       extra_conf=None)
>     fs = pa.hdfs.connect(**hdfs_kwargs)
>     res = fs.exists("/user/tgraves")
>  
> Error that I get on Hadoop 3.x is:
>  
> dfsExists: invokeMethod((Lorg/apache/hadoop/fs/Path;)Z) error:
> ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Messagejava.lang.ClassCastException: org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$GetFileInfoRequestProto cannot be cast to org.apache.hadoop.shaded.com.google.protobuf.Message
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
>         at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
>         at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)