You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/14 05:32:40 UTC

[GitHub] [arrow-rs] JasonLi-cn commented on a diff in pull request #2040: Truncate IPC record batch

JasonLi-cn commented on code in PR #2040:
URL: https://github.com/apache/arrow-rs/pull/2040#discussion_r920765737


##########
arrow/src/ipc/writer.rs:
##########
@@ -894,12 +1004,88 @@ fn write_array_data(
             Some(buffer) => buffer.clone(),
         };
 
-        offset = write_buffer(&null_buffer, buffers, arrow_data, offset);
+        let sliced_null_buffer = if write_options.enable_truncate_array {
+            null_buffer.bit_slice(array_data.offset(), array_data.len())
+        } else {
+            null_buffer
+        };
+        offset = write_buffer(sliced_null_buffer.as_slice(), buffers, arrow_data, offset);

Review Comment:
   I think there should be processed like this: 
   ```rust
       if has_validity_bitmap(array_data.data_type(), write_options) {
           // write null buffer if exists
           let null_buffer = match array_data.null_buffer() {
               None => {
                   // create a buffer and fill it with valid bits
                   let num_bytes = bit_util::ceil(num_rows, 8);
                   let buffer = MutableBuffer::new(num_bytes);
                   let buffer = buffer.with_bitset(num_bytes, true);
                   buffer.into()
               }
               Some(buffer) => {
                   if write_options.enable_truncate_array {
                       buffer.bit_slice(array_data.offset(), array_data.len())
                   } else {
                       buffer.clone()
                   }
               }
           };
   
           offset = write_buffer(null_buffer.as_slice(), buffers, arrow_data, offset);
       }
   ```
   Slice of RecordBatch which has no null_buffer may case Panic, for example: record_batch.num_rows() = 100,  let rb_slice = record_batch.slice(64, 36), process rb_slice will case Panic("the offset of the new Buffer cannot exceed the existing length") when call null_buffer.bit_slice(array_data.offset(), array_data.len()). 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org