Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/03 12:30:54 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

jorisvandenbossche commented on a change in pull request #7631:
URL: https://github.com/apache/arrow/pull/7631#discussion_r449556844



##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -773,6 +789,14 @@ cdef class FileFragment(Fragment):
         Fragment.init(self, sp)
         self.file_fragment = <CFileFragment*> sp.get()
 
+    def __reduce__(self):
+        buffer = self.buffer
+        return self.format.make_fragment, (

Review comment:
       Does specifying a method on the format object here (`format.make_fragment`) also automatically pickle the `format` *instance*?
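
A minimal sketch of the behaviour being asked about, using hypothetical stand-in classes (`Format`, `Fragment`, and the `read_options` dict are illustrative, not the pyarrow API). In Python 3, pickle serializes a bound method as a `getattr` call on its instance, so returning `self.format.make_fragment` from `__reduce__` should carry the format instance and its state along:

```python
import pickle


class Format:
    """Hypothetical stand-in for a file format object carrying read options."""

    def __init__(self, read_options=None):
        self.read_options = read_options or {}

    def make_fragment(self, path):
        return Fragment(self, path)


class Fragment:
    """Hypothetical stand-in for a fragment that is rebuilt through its format."""

    def __init__(self, format, path):
        self.format = format
        self.path = path

    def __reduce__(self):
        # A bound method is pickled as (getattr, (instance, method_name)),
        # so the Format instance and its attributes are serialized with it.
        return self.format.make_fragment, (self.path,)


original = Format(read_options={"buffer_size": 4096}).make_fragment("data.parquet")
restored = pickle.loads(pickle.dumps(original))
```

After the roundtrip, `restored.format` is a new `Format` object whose `read_options` match the original. Whether the real format classes behave the same way depends on their own pickling support, which this sketch does not cover.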

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -887,6 +911,14 @@ cdef class ParquetFileFragment(FileFragment):
         FileFragment.init(self, sp)
         self.parquet_file_fragment = <CParquetFileFragment*> sp.get()
 
+    def __reduce__(self):
+        return self.format.make_fragment, (
+            self.path,

Review comment:
       I suppose you might need to do the same `self.path if buffer is None else buffer,` here as you did for `FileFragment`?
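
To illustrate the fallback suggested above, here is a small sketch with a hypothetical `SourceFragment` class and `make_source_fragment` factory (neither is part of pyarrow). It assumes a fragment is backed either by a path or by an in-memory buffer, and that `__reduce__` has to pass along whichever one is set:

```python
import pickle


def make_source_fragment(source):
    """Hypothetical module-level factory used to rebuild the fragment."""
    return SourceFragment(source)


class SourceFragment:
    """Hypothetical fragment backed by either a file path or a bytes buffer."""

    def __init__(self, source):
        if isinstance(source, (bytes, bytearray)):
            self.path, self.buffer = None, bytes(source)
        else:
            self.path, self.buffer = source, None

    def __reduce__(self):
        buffer = self.buffer
        # Pass the buffer when there is one, otherwise fall back to the path,
        # so an in-memory fragment is not rebuilt from a missing path.
        return make_source_fragment, (self.path if buffer is None else buffer,)


from_path = pickle.loads(pickle.dumps(SourceFragment("data.parquet")))
from_buffer = pickle.loads(pickle.dumps(SourceFragment(b"PAR1")))
```

Without the conditional, the buffer-backed case would be reconstructed with `None` as its source.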

##########
File path: python/pyarrow/_dataset.pyx
##########
@@ -773,6 +789,14 @@ cdef class FileFragment(Fragment):
         Fragment.init(self, sp)
         self.file_fragment = <CFileFragment*> sp.get()
 
+    def __reduce__(self):
+        buffer = self.buffer
+        return self.format.make_fragment, (

Review comment:
       We should maybe verify this by testing a pickling roundtrip for a case that specifies read params in the ParquetFileFormat object.



