Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/19 10:11:44 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10074: ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet

jorisvandenbossche commented on a change in pull request #10074:
URL: https://github.com/apache/arrow/pull/10074#discussion_r615717077



##########
File path: python/pyarrow/parquet.py
##########
@@ -1674,6 +1686,12 @@ def pieces(self):
     keys and only a hive-style directory structure is supported. When
     setting `use_legacy_dataset` to False, also within-file level filtering
     and different partitioning schemes are supported.
+pre_buffer : bool, default True
+    Coalesce and issue file reads in parallel to improve performance on

Review comment:
       Is there any impact or conflict when `use_threads=False` is specified?

##########
File path: python/pyarrow/parquet.py
##########
@@ -1244,7 +1255,7 @@ def __init__(self, path_or_paths, filesystem=None, schema=None,
                  metadata=None, split_row_groups=False, validate_schema=True,
                  filters=None, metadata_nthreads=1, read_dictionary=None,
                  memory_map=False, buffer_size=0, partitioning="hive",
-                 use_legacy_dataset=True):
+                 use_legacy_dataset=True, pre_buffer=True):

Review comment:
       Should we maybe raise an error here if the user sets it to False (since we simply ignore it)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org