Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/21 12:03:54 UTC

[GitHub] [arrow] lidavidm commented on a change in pull request #10118: ARROW-12468: Expose ScannerBuilder::UseAsync to python & R

lidavidm commented on a change in pull request #10118:
URL: https://github.com/apache/arrow/pull/10118#discussion_r617470663



##########
File path: r/R/dataset-scan.R
##########
@@ -183,6 +191,10 @@ ScannerBuilder <- R6Class("ScannerBuilder", inherit = ArrowObject,
       dataset___ScannerBuilder__UseThreads(self, threads)
       self
     },
+    UseAsync = function(use_async = FALSE) {

Review comment:
       Did you mean to default to TRUE?

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2746,6 +2746,10 @@ cdef class Scanner(_Weakrefable):
     use_threads : bool, default True
         If enabled, then maximum parallelism will be used determined by
         the number of available CPU cores.
+    use_async : bool, default False
+        If enabled, an async scanner will be used that should offer
+        better performance with high-latency/highly-parallel filesystems
+        (e.g. S3)

Review comment:
       The option needs to be added to _populate_builder and Scanner.from_fragment/Scanner.from_dataset or else it won't actually take effect.
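
       A minimal sketch of the plumbing this comment asks for, using plain-Python stand-ins. FakeScannerBuilder, FakeScanner and populate_builder are hypothetical names for illustration only; they are not the real pyarrow classes or the Cython signatures in _dataset.pyx. The point is that the keyword has to be accepted by the factory and forwarded to the builder, otherwise documenting it has no effect.

```python
# Hypothetical stand-ins; not the real pyarrow classes or Cython signatures.
class FakeScannerBuilder:
    """Records the options a real ScannerBuilder would hand to C++."""

    def __init__(self):
        self.options = {"use_threads": True, "use_async": False}

    def use_threads(self, value):
        self.options["use_threads"] = value

    def use_async(self, value):
        self.options["use_async"] = value


def populate_builder(builder, use_threads=True, use_async=False):
    # Every documented keyword has to be applied here; one that appears
    # only in the docstring is silently ignored.
    builder.use_threads(use_threads)
    builder.use_async(use_async)


class FakeScanner:
    def __init__(self, options):
        self.options = options

    @staticmethod
    def from_dataset(dataset, use_threads=True, use_async=False):
        builder = FakeScannerBuilder()
        # Forward the keyword instead of dropping it at the factory.
        populate_builder(builder, use_threads=use_threads, use_async=use_async)
        return FakeScanner(dict(builder.options))


scanner = FakeScanner.from_dataset(dataset=None, use_async=True)
print(scanner.options["use_async"])
```

       The same forwarding would presumably be needed in from_fragment, since each factory builds its own scanner.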

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -2746,6 +2746,10 @@ cdef class Scanner(_Weakrefable):
     use_threads : bool, default True
         If enabled, then maximum parallelism will be used determined by
         the number of available CPU cores.
+    use_async : bool, default False
+        If enabled, an async scanner will be used that should offer
+        better performance with high-latency/highly-parallel filesystems
+        (e.g. S3)

Review comment:
       Should we also add the parameter to the tests? Maybe some refactoring is needed though to make it easier to share parameters like this.
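
       One possible shape for the refactoring mentioned here, as a sketch only: keep the scanner keywords in a single table and generate every combination from it, so a new option such as use_async is added in one place. SCANNER_OPTIONS and scanner_option_cases are hypothetical names, not part of the existing test suite.

```python
import itertools

# Hypothetical shared option grid; keys mirror the keywords in the diff.
SCANNER_OPTIONS = {
    "use_threads": [True, False],
    "use_async": [True, False],
}


def scanner_option_cases(options=SCANNER_OPTIONS):
    """Return every keyword combination as a list of dicts."""
    names = list(options)
    return [
        dict(zip(names, values))
        for values in itertools.product(*(options[name] for name in names))
    ]


cases = scanner_option_cases()
print(len(cases))
```

       With pytest, the resulting list could be passed to pytest.mark.parametrize so that each existing scanner test runs once per combination.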




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org