
Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/17 05:10:00 UTC

[jira] [Commented] (ARROW-2002) use pyarrow download file will raise queue.Full exceptions sometimes

    [ https://issues.apache.org/jira/browse/ARROW-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16328270#comment-16328270 ] 

ASF GitHub Bot commented on ARROW-2002:
---------------------------------------

kmiku7 opened a new pull request #1485: ARROW-2002: check write_queue is not full and writer_thread is alive before enqueue new record when download file.
URL: https://github.com/apache/arrow/pull/1485
 
 
   Downloading a file with pyarrow sometimes raises a queue.Full exception.
   jira: https://issues.apache.org/jira/browse/ARROW-2002
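The check named in the PR title ("write_queue is not full and writer_thread is alive before enqueue") can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from PR #1485: the helper `copy_via_writer_thread` and its `chunks`, `write`, `maxsize` and `poll` parameters are hypothetical. Only the pattern itself (wait while the queue is full and the writer thread is alive, then enqueue) comes from the PR and issue text.

```python
import queue
import threading
import time


def copy_via_writer_thread(chunks, write, maxsize=16, poll=0.01):
    """Hypothetical helper (not pyarrow API): feed `chunks` to `write`
    through a bounded queue drained by a background writer thread.

    Before each enqueue, wait for free space, but only while the writer
    thread is alive, so a crashed writer cannot hang the reader.
    """
    write_queue = queue.Queue(maxsize)
    done = threading.Event()
    errors = []

    def writer():
        try:
            # Keep draining until the reader is done and the queue is empty.
            while not (done.is_set() and write_queue.empty()):
                try:
                    buf = write_queue.get(timeout=poll)
                except queue.Empty:
                    continue
                write(buf)
        except Exception as exc:
            # Remember the failure so the calling thread can re-raise it.
            errors.append(exc)

    writer_thread = threading.Thread(target=writer, daemon=True)
    writer_thread.start()
    try:
        for buf in chunks:
            # Wait for space instead of letting put_nowait() raise queue.Full.
            while writer_thread.is_alive() and write_queue.full():
                time.sleep(poll)
            if not writer_thread.is_alive():
                break  # the writer died; stop reading
            # Single producer: the queue can only shrink since the check,
            # so this non-blocking put cannot raise queue.Full.
            write_queue.put_nowait(buf)
    finally:
        done.set()
    writer_thread.join()
    if errors:
        raise errors[0]
```

Polling `full()` before `put_nowait()` is only race-free because there is a single producer; with several reader threads, a blocking `put(buf, timeout=poll)` inside the same liveness loop would be the safer variant.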
   



> use pyarrow download file will raise queue.Full exceptions sometimes
> --------------------------------------------------------------------
>
>                 Key: ARROW-2002
>                 URL: https://issues.apache.org/jira/browse/ARROW-2002
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>         Environment: operating system: all
> platform: all
>            Reporter: kmiku7
>            Priority: Major
>              Labels: pull-request-available
>
> When downloading a file from HDFS, if the writer thread writes data more slowly than the file is read, download() raises a queue.Full exception because write_queue is full.
> I think that, when downloading a file, we can wait until write_queue has space for a new item as long as writer_thread is still alive, as upload() does.
> {code}
> >>> import pyarrow as pa
> >>> cli = pa.hdfs.connect(user='USERNAME')
> >>> cli.download('/REMOTE/HDFS/PATH', '/LOCAL/FILE/PATH')
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/io-hdfs.pxi", line 428, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66399)
>   File "pyarrow/io-hdfs.pxi", line 429, in pyarrow.lib.HadoopFileSystem.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:66351)
>   File "pyarrow/io.pxi", line 315, in pyarrow.lib.NativeFile.download (/arrow/python/build/temp.linux-x86_64-3.4/lib.cxx:52249)
>   File "/usr/lib/python3.4/queue.py", line 187, in put_nowait
>     return self.put(item, block=False)
>   File "/usr/lib/python3.4/queue.py", line 133, in put
>     raise Full
> queue.Full
> {code}
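The failure in the traceback above can be reproduced without HDFS. The sketch below is a hypothetical stand-in, not pyarrow code: the function `produce_faster_than_consumer`, the 4-item queue and the 50 ms write delay are all assumptions. A reader enqueues with `put_nowait()` while a deliberately slow writer thread drains a bounded `queue.Queue`, so the non-blocking put raises `queue.Full` through the same `put(item, block=False)` path shown in the traceback.

```python
import queue
import threading
import time


def produce_faster_than_consumer(n_items=20, maxsize=4, write_delay=0.05):
    """Hypothetical reproduction: enqueue with put_nowait() while a slow
    writer thread drains a bounded queue.

    Returns the index of the item that overflowed the queue, or None.
    """
    write_queue = queue.Queue(maxsize)
    stop = threading.Event()

    def slow_writer():
        while not stop.is_set():
            try:
                write_queue.get(timeout=0.01)
            except queue.Empty:
                continue
            time.sleep(write_delay)  # simulate a slow local disk

    writer_thread = threading.Thread(target=slow_writer, daemon=True)
    writer_thread.start()
    try:
        for i in range(n_items):
            # Non-blocking put, as in the traceback: raises queue.Full as
            # soon as the writer falls maxsize items behind.
            write_queue.put_nowait(b"x" * 1024)
        return None
    except queue.Full:
        return i
    finally:
        stop.set()
        writer_thread.join()
```

Because at most 4 items fit in the queue, the first overflow cannot happen before index 4; the exact index depends on thread scheduling.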



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)