Posted to issues@arrow.apache.org by "Saurabh Bajaj (JIRA)" <ji...@apache.org> on 2019/08/06 19:13:00 UTC

[jira] [Comment Edited] (ARROW-6150) [Python] Intermittent HDFS error

    [ https://issues.apache.org/jira/browse/ARROW-6150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16901397#comment-16901397 ] 

Saurabh Bajaj edited comment on ARROW-6150 at 8/6/19 7:12 PM:
--------------------------------------------------------------

I tried setting port=8020 in pa.hdfs.connect(), but I still get the same intermittent errors. 


was (Author: sbajaj):
I tried setting `port=8020` in `pa.hdfs.connect()`, but I still get the same intermittent errors. 

> [Python] Intermittent HDFS error
> --------------------------------
>
>                 Key: ARROW-6150
>                 URL: https://issues.apache.org/jira/browse/ARROW-6150
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.14.1
>            Reporter: Saurabh Bajaj
>            Priority: Minor
>
> I'm running a Dask-YARN job that uses PyArrow's HDFS IO library to dump a results dictionary into HDFS (the code is shown in the traceback below). However, the job intermittently fails with the error below: not on every run, only sometimes. I'm unable to determine the root cause of this issue.
>  
> {noformat}
>   File "/extractor.py", line 87, in __call__
>     json.dump(results_dict, fp=_UTF8Encoder(f), indent=4)
>   File "pyarrow/io.pxi", line 72, in pyarrow.lib.NativeFile.__exit__
>   File "pyarrow/io.pxi", line 130, in pyarrow.lib.NativeFile.close
>   File "pyarrow/error.pxi", line 87, in pyarrow.lib.check_status
> pyarrow.lib.ArrowIOError: HDFS CloseFile failed, errno: 255 (Unknown error 255) Please check that you are connecting to the correct HDFS RPC port
> {noformat}
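The traceback calls a helper `_UTF8Encoder` whose source is not shown in the report. The sketch below is a guess at what such a wrapper might look like, assuming its only job is to let `json.dump` (which writes `str` chunks) target a binary file handle such as a PyArrow `NativeFile`. The class name `UTF8Encoder` is hypothetical, and `io.BytesIO` stands in for the HDFS file so the example runs locally without a cluster.

```python
import io
import json


class UTF8Encoder:
    """Hypothetical stand-in for the reporter's `_UTF8Encoder` helper.

    json.dump() writes str chunks, but a binary sink (a file opened in
    "wb" mode, or a PyArrow NativeFile) expects bytes, so each chunk is
    encoded to UTF-8 before being forwarded to the wrapped file.
    """

    def __init__(self, binary_file):
        self._file = binary_file

    def write(self, text):
        # json.dump only ever calls .write(str) on its fp argument.
        return self._file.write(text.encode("utf-8"))


if __name__ == "__main__":
    # io.BytesIO plays the role of the HDFS file handle in this local demo.
    buf = io.BytesIO()
    json.dump({"status": "ok", "count": 3}, fp=UTF8Encoder(buf), indent=4)
    print(buf.getvalue().decode("utf-8"))
```

With a wrapper like this, the writes themselves only hand bytes to the underlying file; in the reported traceback the failure is raised afterwards, when `NativeFile.close()` runs from `__exit__`, so the traceback points at closing the HDFS file rather than at the JSON encoding.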



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)