You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/22 13:49:08 UTC

[GitHub] [arrow-rs] TimDiekmann opened a new issue #1352: Expose an `advance` method on builders

TimDiekmann opened a new issue #1352:
URL: https://github.com/apache/arrow-rs/issues/1352


   Given the scenario from the [Fixed size list layout in the Arrow Columnar Format](https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout), I have a list of IP addresses:
   
   For an array of length 4 with respective values:
   
   ```json
   [[192, 168, 0, 12], null, [192, 168, 0, 25], [192, 168, 0, 1]]
   ```
   
   which will have the following representation:
   ```
   * Length: 4, Null count: 1
   * Validity bitmap buffer:
   
     | Byte 0 (validity bitmap) | Bytes 1-63            |
     |--------------------------|-----------------------|
     | 00001101                 | 0 (padding)           |
   
   * Values array (byte array):
     * Length: 16,  Null count: 0
     * validity bitmap buffer: Not required
   
       | Bytes 0-3       | Bytes 4-7   | Bytes 8-15                      |
       |-----------------|-------------|---------------------------------|
       | 192, 168, 0, 12 | unspecified | 192, 168, 0, 25, 192, 168, 0, 1 |
   ```
   
   When using the builder-API I currently have this code:
   
   ```rust
   let values: [Option<[u8; 4]>; 4] = [
       Some([192, 168, 0, 12]),
       None,
       Some([192, 168, 0, 25]),
       Some([192, 168, 0, 1]),
   ];
   
   // Create the fixed-size list builder with the primitive builder inside
   let mut list_builder =
       FixedSizeListBuilder::new(PrimitiveBuilder::<UInt8Type>::new(4 * 4), 4);
   
   for value in values {
       let value_builder = list_builder.values();
       if let Some(value) = &value {
           // All fine, use the API as expected
           value_builder.append_slice(value)?;
           list_builder.append(true)?;
       } else {
           // How to advance by for without setting nulls?
           value_builder.append_nulls(4)?;
           list_builder.append(false)?;
       }
   }
   
   let array = list_builder.finish();
   
   // Check representation as noted in the specs
   assert_eq!(array.length(), 1);
   assert_eq!(array.null_count(), 1);
   
   // Check value array as noted in the specs
   assert_eq!(array.values().len(), 16);
   assert_eq!(array.values().null_count(), 0);
   ```
   
   As I wrote `value_builder.append_nulls(4)` the value array have the nulls inside. As in the specs, the values are unspecified and the null count of the value array **must be 0**. In order to achieve this, I could replace it with `value_builder.append_values(&[0, 0, 0, 0], &[false, false, false, false])` or do a loop, but a) this introduces unnecessary overhead because the values are not needed, b) is pretty boiler plate code, and c) isn't very easy if the type isn't known beforehand in more complex examples.
   
   
   ## Proposed solution
   
   If builders would have an `fn advance` method, then one could simply call `array.advance(4)`, which won't set the values to `null` at the internal bitmap.
   
   The `BufferBuilder` already have an `advance` method, this could probably be forwarded to the user.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #1352: Expose an `advance` method on builders

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1352:
URL: https://github.com/apache/arrow-rs/issues/1352#issuecomment-1054643652


   I wonder if we could just add `advance()` to the ListBuilder interface. 
   
   I am a little worried about adding a ` pub advance()` function to all the builders that simply leaves the values undefined as it would be fairly easy to misuse.
   
   Maybe we `ListBuilder::advance` could be `pub` and the `advance()` functions on the value builders could be `pub(crate)` 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org