Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/27 16:12:49 UTC

[GitHub] [arrow] pitrou commented on a change in pull request #12160: ARROW-13467: [C++] Support delta dictionaries in the IPC file format

pitrou commented on a change in pull request #12160:
URL: https://github.com/apache/arrow/pull/12160#discussion_r793768579



##########
File path: python/pyarrow/ipc.pxi
##########
@@ -124,6 +124,15 @@ cdef class IpcWriteOptions(_Weakrefable):
     emit_dictionary_deltas : bool
         Whether to emit dictionary deltas.  Default is false for maximum
         stream compatibility.
+    unify_dictionaries : bool
+        If true then calls to write_table will attempt to unify dictionaries
+        across all batches in the table.  This can help avoid the need for
+        replacement dictionaries (which the file format does not support)
+        but requires computing the unified dictionary and then remapping
+        the indices arrays.
+
+        This property is ignored when writing to the streaming format as
+        the streaming format can support replacement dictionaries.

Review comment:
       ```suggestion
           This parameter is ignored when writing to the IPC stream format as
           the IPC stream format can support replacement dictionaries.
       ```
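
For readers unfamiliar with the cost the docstring mentions ("computing the unified dictionary and then remapping the indices arrays"), the following is a minimal pure-Python sketch of that idea. It is not Arrow's implementation: the function name `unify_dictionaries` and the list-based representation of a dictionary-encoded chunk are assumptions made purely for illustration.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

# Hypothetical representation: a chunk is (dictionary values, indices into them).
Chunk = Tuple[Sequence[Hashable], Sequence[Optional[int]]]


def unify_dictionaries(chunks: Sequence[Chunk]):
    """Build one dictionary shared by all chunks and remap each chunk's indices.

    Illustrative helper only; not part of pyarrow or Arrow C++.
    """
    unified: List[Hashable] = []   # unified dictionary, in first-seen order
    position = {}                  # value -> index in the unified dictionary
    remapped: List[List[Optional[int]]] = []

    for dictionary, indices in chunks:
        # Transpose map: old index in this chunk -> index in the unified dictionary.
        transpose = []
        for value in dictionary:
            if value not in position:
                position[value] = len(unified)
                unified.append(value)
            transpose.append(position[value])
        # Nulls (None) stay null; every other index is rewritten.
        remapped.append([None if i is None else transpose[i] for i in indices])

    return unified, remapped


# Two chunks of one column whose dictionaries differ.
chunks = [
    (["a", "b"], [0, 1, 0]),
    (["b", "c"], [1, 0, None]),
]
unified, remapped = unify_dictionaries(chunks)
print(unified)   # ['a', 'b', 'c']
print(remapped)  # [[0, 1, 0], [2, 1, None]]
```

After unification every chunk refers to the same dictionary, so no replacement dictionary is needed when the chunks are written one record batch at a time.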

##########
File path: cpp/src/arrow/ipc/options.h
##########
@@ -87,9 +87,9 @@ struct ARROW_EXPORT IpcWriteOptions {
 
   /// \brief Whether to unify dictionaries for the IPC file format
   ///
-  /// The IPC file format doesn't support dictionary replacements or deltas.
+  /// The IPC file format doesn't support dictionary replacements.
   /// Therefore, chunks of a column with a dictionary type must have the same
-  /// dictionary in each record batch.
+  /// dictionary in each record batch (or an extended dictionary + delta).
   ///
   /// If this option is true, RecordBatchWriter::WriteTable will attempt
   /// to unify dictionaries across each table column.  If this option is

Review comment:
       Does the sentence below need to be updated? ("unequal dictionaries")
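
To make the distinction in the updated comment concrete (same dictionary, extended dictionary + delta, or replacement), here is a small pure-Python sketch. The helper names `classify_dictionary_update` and `delta_values` are hypothetical and do not correspond to any Arrow API; dictionaries are modeled as plain lists.

```python
from typing import Hashable, List, Sequence


def classify_dictionary_update(previous: Sequence[Hashable],
                               current: Sequence[Hashable]) -> str:
    """Classify how `current` relates to the dictionary used by the previous batch.

    Illustrative helper only; not part of pyarrow or Arrow C++.
    """
    previous, current = list(previous), list(current)
    if current == previous:
        return "same"
    # A delta only appends values: the old dictionary must be a strict prefix.
    if len(current) > len(previous) and current[:len(previous)] == previous:
        return "delta"
    # Anything else would require replacing the dictionary, which the
    # IPC file format does not support.
    return "replacement"


def delta_values(previous: Sequence[Hashable],
                 current: Sequence[Hashable]) -> List[Hashable]:
    """Return only the appended values, assuming the update is a delta."""
    return list(current)[len(previous):]


print(classify_dictionary_update(["a", "b"], ["a", "b"]))       # same
print(classify_dictionary_update(["a", "b"], ["a", "b", "c"]))  # delta
print(classify_dictionary_update(["a", "b"], ["b", "c"]))       # replacement
print(delta_values(["a", "b"], ["a", "b", "c"]))                # ['c']
```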



