You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@arrow.apache.org by "Fred Tzeng (JIRA)" <ji...@apache.org> on 2019/07/26 06:51:00 UTC

[jira] [Created] (ARROW-6044) Pyarrow HDFS client gets hung after a while

Fred Tzeng created ARROW-6044:
---------------------------------

             Summary: Pyarrow HDFS client gets hung after a while
                 Key: ARROW-6044
                 URL: https://issues.apache.org/jira/browse/ARROW-6044
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.13.0
         Environment: hadoop-3.0.3
driver='libhdfs'
python 3.6
Centos7
            Reporter: Fred Tzeng


I'm using the pyarrow HDFS client in a long running (forever) app that makes connections to HDFS as external requests come in and destroys the connection as soon as the request is handled. This happens a large amount of times on separate threads and everything works great.

The problem is, after the app idles for a while (perhaps hours) and no HDFS connections are made during this time, when the next connection is attempted, the API hdfs.connect(...) just hangs. No exceptions are thrown.

Code snippet on what i'm doing to instantiate each connection:

...

hdfs = pyarrow.hdfs.connect(self.hdfs_authority, self.hdfs_port, user=self.hdfs_user)

try:

//Do something

finally:

hdfs.close

 

Any help on what might be causing these hangs is appreciated

 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)