You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/10 20:06:13 UTC

[GitHub] [arrow-rs] alamb commented on a change in pull request #1296: wip. Implement an iterator for DictionaryArray

alamb commented on a change in pull request #1296:
URL: https://github.com/apache/arrow-rs/pull/1296#discussion_r804061136



##########
File path: arrow/src/array/iterator.rs
##########
@@ -403,6 +403,109 @@ impl<'a, S: OffsetSizeTrait> std::iter::ExactSizeIterator
 {
 }
 
+/// an iterator that returns Some(T) or None, that can be used on any DictionaryArray
+// Note: This implementation is based on std's [Vec]s' [IntoIter].
+#[derive(Debug)]
+pub struct DictionaryIter<'a, K: ArrowPrimitiveType, T: ArrowPrimitiveType> {
+    array: &'a DictionaryArray<K>,
+    values: &'a PrimitiveArray<T>,
+    current: usize,
+    current_end: usize,
+}

Review comment:
       🤔  as you note, the way this is implemented, it will only work for DictionaryArrays whose values are `PrimitiveArray<>` such as `Int64Array`
   
   If we take this approach, we'll probably end up with different iterator types for different value types like
   
   
   ```rust
   pub struct DictionaryIterPrimitive<'a, K: ArrowPrimitiveType, T: ArrowPrimitiveType> {
   ...
   }
   
   impl<'a, K: ArrowPrimitiveType, T: ArrowPrimitiveType> std::iter::Iterator
       for DictionaryIter<'a, K, T>
   {
       type Item = Option<T::Native>;
   ...
   }
   ```
   
   ```rust
   pub struct DictionaryIterBool<'a, K: ArrowPrimitiveType> {
   ...
   }
   
   impl <'a, K: ArrowPrimitiveType> std::iter::Iterator for DictionaryIterBool<'a, K>
   {
       type Item = Option<bool>;
   ...
   }
   ...
   ```
   
   And then functions like the following
   
   ```rust
   impl DictionaryArray {
     pub fn iter_primitive<T>::(&self) -> DictionaryIterPrimitive
     ...
     }
   
     pub fn iter_bool<T>::(&self) -> DictionaryIterBool
     ...
     }
   }
   ```
   
   Which would effectively require the user to know what type of values their dictionary held
   
   🤔 
   
   

##########
File path: arrow/src/array/iterator.rs
##########
@@ -403,6 +403,109 @@ impl<'a, S: OffsetSizeTrait> std::iter::ExactSizeIterator
 {
 }
 
+/// an iterator that returns Some(T) or None, that can be used on any DictionaryArray
+// Note: This implementation is based on std's [Vec]s' [IntoIter].
+#[derive(Debug)]
+pub struct DictionaryIter<'a, K: ArrowPrimitiveType, T: ArrowPrimitiveType> {
+    array: &'a DictionaryArray<K>,
+    values: &'a PrimitiveArray<T>,
+    current: usize,
+    current_end: usize,
+}

Review comment:
       Thinking some more we could potentially avoid parameterizing on the key type by always upcasting the indexes to `usize` (which has to happen anyways) before creating the iterator




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org