You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/06/15 17:33:57 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #4415: Add DictionaryArray::occupancy

alamb commented on code in PR #4415:
URL: https://github.com/apache/arrow-rs/pull/4415#discussion_r1231344888


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -549,6 +550,29 @@ impl<K: ArrowDictionaryKeyType> DictionaryArray<K> {
             .for_each(|v| *v = op(*v));
         Ok(builder.finish())
     }
+
+    /// Computes an occupancy mask for this dictionary's values
+    ///
+    /// For each value in [`Self::values`] the corresponding bit will be set in the
+    /// returned mask if it is referenced by a key in this [`DictionaryArray`]
+    pub fn occupancy(&self) -> BooleanBuffer {
+        let len = self.values.len();
+        let mut builder = BooleanBufferBuilder::new(len);
+        builder.resize(len);
+        let slice = builder.as_slice_mut();
+        match self.keys.nulls().filter(|n| n.null_count() > 0) {

Review Comment:
   This is a fancy way to check for nulls -- TIL `Option::filter` 👍 



##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -549,6 +550,29 @@ impl<K: ArrowDictionaryKeyType> DictionaryArray<K> {
             .for_each(|v| *v = op(*v));
         Ok(builder.finish())
     }
+
+    /// Computes an occupancy mask for this dictionary's values
+    ///
+    /// For each value in [`Self::values`] the corresponding bit will be set in the
+    /// returned mask if it is referenced by a key in this [`DictionaryArray`]
+    pub fn occupancy(&self) -> BooleanBuffer {
+        let len = self.values.len();
+        let mut builder = BooleanBufferBuilder::new(len);
+        builder.resize(len);
+        let slice = builder.as_slice_mut();
+        match self.keys.nulls().filter(|n| n.null_count() > 0) {
+            Some(n) => {
+                let v = self.keys.values();
+                n.valid_indices()
+                    .for_each(|idx| set_bit(slice, v[idx].as_usize()))

Review Comment:
   Would there be any value in calling `valid_slices` here instead? I assume it would be a tradeoff for sparse dictionaries and mostly used dictionaries. 



##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -549,6 +550,29 @@ impl<K: ArrowDictionaryKeyType> DictionaryArray<K> {
             .for_each(|v| *v = op(*v));
         Ok(builder.finish())
     }
+
+    /// Computes an occupancy mask for this dictionary's values
+    ///
+    /// For each value in [`Self::values`] the corresponding bit will be set in the
+    /// returned mask if it is referenced by a key in this [`DictionaryArray`]
+    pub fn occupancy(&self) -> BooleanBuffer {
+        let len = self.values.len();
+        let mut builder = BooleanBufferBuilder::new(len);
+        builder.resize(len);
+        let slice = builder.as_slice_mut();

Review Comment:
   nit: calling this `target` or `dest` or `dst` I think would be a more descriptive name for what it is rather than `slice`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org