Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/04 17:37:39 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10074: ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet

jorisvandenbossche commented on a change in pull request #10074:
URL: https://github.com/apache/arrow/pull/10074#discussion_r625977041



##########
File path: python/pyarrow/parquet.py
##########
@@ -1212,13 +1217,20 @@ class ParquetDataset:
     new Arrow Dataset API). Among other things, this allows to pass
     `filters` for all columns and not only the partition keys, enables
     different partitioning schemes, etc.
+pre_buffer : bool, default True
+    Coalesce and issue file reads in parallel to improve performance on
+    high-latency filesystems (e.g. S3). If True, Arrow will use a
+    background I/O thread pool. This option is only supported for
+    use_legacy_dataset=True. If using a filesystem layer that itself

Review comment:
   ```suggestion
   use_legacy_dataset=False. If using a filesystem layer that itself
   ```
   
   ?
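
   For context, a minimal sketch of how the `pre_buffer` option discussed here is used from `pyarrow.parquet.read_table` (assuming a pyarrow version where ARROW-12428 has landed; the exact performance effect depends on the filesystem, and `pre_buffer` only changes behavior on the new Dataset code path, which is the point of the suggestion above):

   ```python
   # Sketch: reading a Parquet file with read coalescing enabled via
   # pre_buffer. This mainly helps on high-latency filesystems such as
   # S3; on a local file it is simply a no-op performance-wise.
   import os
   import tempfile

   import pyarrow as pa
   import pyarrow.parquet as pq

   table = pa.table({"x": [1, 2, 3], "y": ["a", "b", "c"]})
   path = os.path.join(tempfile.mkdtemp(), "data.parquet")
   pq.write_table(table, path)

   # pre_buffer=True asks Arrow to coalesce and issue the column-chunk
   # reads in parallel on a background I/O thread pool.
   result = pq.read_table(path, pre_buffer=True)
   assert result.equals(table)
   ```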




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org