Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/19 11:54:44 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #10074: ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet

lidavidm commented on a change in pull request #10074:
URL: https://github.com/apache/arrow/pull/10074#discussion_r615781238



##########
File path: python/pyarrow/parquet.py
##########
@@ -1674,6 +1686,12 @@ def pieces(self):
     keys and only a hive-style directory structure is supported. When
     setting `use_legacy_dataset` to False, also within-file level filtering
     and different partitioning schemes are supported.
+pre_buffer : bool, default True
+    Coalesce and issue file reads in parallel to improve performance on

Review comment:
    There's no conflict: use_threads controls whether decoding work is done in parallel on the CPU thread pool, while pre_buffer controls whether I/O is done in parallel on the I/O thread pool. For someone who wants Arrow to be truly single-threaded this distinction may be confusing, though, so I'll see if I can reword the docstring.

    Also, #9620 has some refactoring to allow pre-buffering without requiring use of the I/O thread pool.
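    To make the distinction concrete, here is a minimal sketch of reading a file with parallel I/O but serial decoding, assuming the pre_buffer keyword from this PR is available on pyarrow.parquet.read_table (the file path is a placeholder):

        import pyarrow.parquet as pq

        # Coalesce and issue the file reads in parallel on the I/O thread
        # pool, while keeping column decoding serial on the CPU thread pool.
        # The two settings are independent of each other.
        table = pq.read_table(
            "example.parquet",         # placeholder path
            use_legacy_dataset=False,  # dataset-based reader, as in the quoted docstring
            pre_buffer=True,           # parallel, coalesced I/O
            use_threads=False,         # serial column decoding
        )

    If needed, the two pools can also be sized independently, for example with pyarrow.set_cpu_count and pyarrow.set_io_thread_count (where available in your pyarrow version).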



