Posted to github@arrow.apache.org by "emcake (via GitHub)" <gi...@apache.org> on 2023/06/13 15:14:39 UTC

[GitHub] [arrow-rs] emcake opened a new issue, #4408: Slicing a map array does not respect offset

emcake opened a new issue, #4408:
URL: https://github.com/apache/arrow-rs/issues/4408

   **Describe the bug**
   When using `.slice(...)` on a `RecordBatch` that contains a map column, the map does not respect the offset of the slice.
   
   **To Reproduce**
   The following test fails:
   
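   ```rust
       use std::sync::Arc;
   
       use arrow::array::{Array, MapBuilder, UInt32Array, UInt32Builder};
       use arrow::datatypes::{DataType, Field, Schema};
       use arrow::record_batch::RecordBatch;
   
       #[test]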
       fn encode_maps() {
           let key_inner = Field::new("keys", DataType::UInt32, false);
           let val_inner = Field::new("values", DataType::UInt32, true);
           let kv_inner = Field::new(
               "entries",
               arrow::datatypes::DataType::Struct(vec![key_inner.clone(), val_inner.clone()].into()),
               false,
           );
           let map_field = Field::new("map", DataType::Map(Arc::new(kv_inner), false), false);
   
           let schema = Arc::new(Schema::new(vec![map_field]));
   
           let values = {
               let k = UInt32Builder::new();
               let v = UInt32Builder::new();
               let mut map = MapBuilder::new(None, k, v);
   
               for i in 0..100000 {
                   {
                       let k: &mut UInt32Builder = map.keys();
                       k.append_value(i);
                   }
                   {
                       let v: &mut UInt32Builder = map.values();
                       v.append_value(i);
                   }
                   map.append(true).unwrap()
               }
   
               map.finish()
           };
   
           let batch = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::new(values)]).unwrap();
   
           const SLICE_OFFSET: usize = 999;
   
           let sliced = batch.slice(SLICE_OFFSET, 1);
   
           assert_eq!(sliced.num_rows(), 1);
   
           {
               let items = sliced
                   .column(0)
                   .as_any()
                   .downcast_ref::<arrow::array::MapArray>()
                   .unwrap();
   
               let keys = items.keys().as_any().downcast_ref::<UInt32Array>().unwrap();
               let values = items
                   .values()
                   .as_any()
                   .downcast_ref::<UInt32Array>()
                   .unwrap();
   
               assert_eq!(keys.value(0), values.value(0));
               assert_eq!(keys.value(0), SLICE_OFFSET as u32) // fails - 0 vs 999
           }
       }
   ```
   
   **Expected behavior**
   The test should pass.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4408: Slicing a map array does not respect offset

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4408:
URL: https://github.com/apache/arrow-rs/issues/4408#issuecomment-1589678323

   A `MapArray` is stored using the [variable sized list layout](https://arrow.apache.org/docs/format/Columnar.html#variable-size-list-layout). As such, it has an offsets buffer that determines which range of its child arrays belongs to each row.
   
   So in your example, to access the value you could either use `MapArray::value`:
   
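   ```rust
   // `map` is the sliced `MapArray` (`items` in the test above).
   // `as_primitive` comes from the `arrow::array::AsArray` trait;
   // `UInt32Type` lives in `arrow::datatypes`.
   let v = map.value(0);
   assert_eq!(v.len(), 1);
   
   let keys = v.column(0).as_primitive::<UInt32Type>();
   let values = v.column(1).as_primitive::<UInt32Type>();
   
   assert_eq!(keys.value(0), values.value(0));
   assert_eq!(keys.value(0), SLICE_OFFSET as u32);
   ```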
   
   Or you can access the underlying arrays directly: read the row's start and end from `value_offsets`, and use them to index into `MapArray::keys` and `MapArray::values`.
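
A minimal sketch of that second approach, written against plain slices so it does not depend on a particular arrow version. `entry_range` and `row_entries` are hypothetical helpers, not part of arrow-rs; the assumption is that the offsets come from `value_offsets` and that the key and value slices are the flat child arrays. Nulls in the values are ignored here for brevity.

```rust
use std::ops::Range;

/// Range of child indices owned by row `row`, given list-style offsets.
/// Row `i` owns the child entries `offsets[i]..offsets[i + 1]`.
fn entry_range(offsets: &[i32], row: usize) -> Range<usize> {
    offsets[row] as usize..offsets[row + 1] as usize
}

/// Collects the (key, value) pairs of one row from the flat child slices.
/// Assumes `keys` and `values` have the same length and contain no nulls.
fn row_entries(offsets: &[i32], keys: &[u32], values: &[u32], row: usize) -> Vec<(u32, u32)> {
    let range = entry_range(offsets, row);
    keys[range.clone()]
        .iter()
        .copied()
        .zip(values[range].iter().copied())
        .collect()
}
```

For the one-row slice in the test above, the offsets would presumably be `[999, 1000]`, so row 0 maps to index 999 of the child arrays rather than index 0.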
   
   
   
   




[GitHub] [arrow-rs] tustvold commented on issue #4408: Slicing a map array does not respect offset

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4408:
URL: https://github.com/apache/arrow-rs/issues/4408#issuecomment-1589595616

   I think this is correct: the slice is applied to the offsets, not to the underlying entries.
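
To illustrate the point, here is a toy model of the layout. It is not the arrow-rs implementation: `ToyListArray` and its methods are made up for this sketch, and the child buffer is cloned only to keep the code short.

```rust
/// Toy stand-in for a list/map array.
/// Row `i` owns `child[offsets[i]..offsets[i + 1]]`.
#[derive(Debug, Clone, PartialEq)]
struct ToyListArray {
    offsets: Vec<i32>,
    child: Vec<u32>,
}

impl ToyListArray {
    /// Builds `rows` rows with exactly one entry each; entry `i` holds `i`.
    fn one_entry_per_row(rows: u32) -> Self {
        ToyListArray {
            offsets: (0..=rows as i32).collect(),
            child: (0..rows).collect(),
        }
    }

    /// Number of rows, which is one less than the number of offsets.
    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Slicing narrows the window of offsets; the child buffer is left whole.
    fn slice(&self, offset: usize, length: usize) -> Self {
        ToyListArray {
            offsets: self.offsets[offset..=offset + length].to_vec(),
            child: self.child.clone(),
        }
    }
}
```

In this model, slicing 100000 single-entry rows at offset 999 leaves `child[0]` equal to 0, while `offsets[0]` becomes 999 and points at the entry the slice actually starts with.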




[GitHub] [arrow-rs] emcake commented on issue #4408: Slicing a map array does not respect offset

Posted by "emcake (via GitHub)" <gi...@apache.org>.
emcake commented on issue #4408:
URL: https://github.com/apache/arrow-rs/issues/4408#issuecomment-1589638721

   I'm constructing a map array here in which each row has a single entry whose key and value both match the row's index in the batch:
   
   ```
   { key : 0, value : 0 }
   { key : 1, value : 1 }
   { key : 2, value : 2 }
   ...
   ```
   
   If I take this map array and slice it at offset 999, I'd expect the first item in the sliced array to be:
   
   ```
   { key : 999, value : 999 }
   ```
   
   Am I missing something here?




[GitHub] [arrow-rs] emcake closed issue #4408: Slicing a map array does not respect offset

Posted by "emcake (via GitHub)" <gi...@apache.org>.
emcake closed issue #4408: Slicing a map array does not respect offset
URL: https://github.com/apache/arrow-rs/issues/4408




[GitHub] [arrow-rs] emcake commented on issue #4408: Slicing a map array does not respect offset

Posted by "emcake (via GitHub)" <gi...@apache.org>.
emcake commented on issue #4408:
URL: https://github.com/apache/arrow-rs/issues/4408#issuecomment-1590750067

   Okay, thanks - confirmed with this that the test I had passes.



