Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 12:50:13 UTC

[GitHub] [arrow-rs] alamb commented on a diff in pull request #2957: Add BooleanArray::true_count and BooleanArray::false_count

alamb commented on code in PR #2957:
URL: https://github.com/apache/arrow-rs/pull/2957#discussion_r1008026886


##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -103,6 +103,53 @@ impl BooleanArray {
         &self.data.buffers()[0]
     }
 
+    /// Returns the number of true values within this buffer

Review Comment:
   ```suggestion
       /// Returns the number of non-null, true values within this array
   ```
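
A minimal sketch of what "non-null, true" means at the bit level, assuming the validity bitmap and the value bitmap are already available as aligned `u64` words. The function name and the slice-based signature are hypothetical and are not part of the arrow-rs API; the PR itself works on `bit_chunks` plus a remainder word.

```rust
/// Counts slots that are both valid (non-null) and true.
///
/// `validity` and `values` are hypothetical bit-packed words, one bit per
/// slot, with a set validity bit meaning "not null". Both slices are assumed
/// to cover the same slots in the same order.
fn true_count_masked(validity: &[u64], values: &[u64]) -> usize {
    validity
        .iter()
        .zip(values.iter())
        // A slot counts only if its validity bit AND its value bit are set.
        .map(|(v, b)| (v & b).count_ones() as usize)
        .sum()
}
```

With one word of validity `0b1011` and one word of values `0b0110`, only the second-lowest bit is set in both, so a value bit that sits under a null slot does not contribute to the count.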



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -103,6 +103,53 @@ impl BooleanArray {
         &self.data.buffers()[0]
     }
 
+    /// Returns the number of true values within this buffer
+    pub fn true_count(&self) -> usize {
+        match self.data.null_buffer() {
+            Some(nulls) => {
+                let null_chunks = nulls.bit_chunks(self.offset(), self.len());
+                let value_chunks = self.values().bit_chunks(self.offset(), self.len());
+                null_chunks
+                    .iter()
+                    .zip(value_chunks.iter())
+                    .chain(std::iter::once((
+                        null_chunks.remainder_bits(),
+                        value_chunks.remainder_bits(),
+                    )))
+                    .map(|(a, b)| (a & b).count_ones() as usize)
+                    .sum()
+            }
+            None => self
+                .values()
+                .count_set_bits_offset(self.offset(), self.len()),
+        }
+    }
+
+    /// Returns the number of false values within this buffer
+    pub fn false_count(&self) -> usize {
+        match self.data.null_buffer() {

Review Comment:
   Maybe this could be simplified to `self.len() - self.null_count() - self.true_count()`? I think that would be basically as fast.
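
A small sketch of the identity behind this suggestion, using a plain `&[Option<bool>]` slice as a stand-in for the array. All function names here are hypothetical and only illustrate the arithmetic: every slot is exactly one of null, true, or false, so the false count is what remains after subtracting the other two.

```rust
/// Number of slots holding `Some(true)`.
fn true_count(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| **v == Some(true)).count()
}

/// Number of null (`None`) slots.
fn null_count(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| v.is_none()).count()
}

/// False count computed directly, for comparison.
fn false_count_direct(values: &[Option<bool>]) -> usize {
    values.iter().filter(|v| **v == Some(false)).count()
}

/// False count derived by subtraction: total minus nulls minus trues.
fn false_count_by_subtraction(values: &[Option<bool>]) -> usize {
    values.len() - null_count(values) - true_count(values)
}
```

For any input the two false-count functions should agree, which is the property the randomized test in this PR checks against the real implementation.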



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -431,4 +479,29 @@ mod tests {
     fn test_from_array_data_validation() {
         let _ = BooleanArray::from(ArrayData::new_empty(&DataType::Int32));
     }
+
+    #[test]
+    fn test_true_false_count() {
+        let mut rng = thread_rng();
+
+        for _ in 0..10 {
+            let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();

Review Comment:
   ```suggestion
               // no nulls
               let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();
   ```



##########
arrow-array/src/array/boolean_array.rs:
##########
@@ -431,4 +479,29 @@ mod tests {
     fn test_from_array_data_validation() {
         let _ = BooleanArray::from(ArrayData::new_empty(&DataType::Int32));
     }
+
+    #[test]
+    fn test_true_false_count() {
+        let mut rng = thread_rng();
+
+        for _ in 0..10 {
+            let d: Vec<_> = (0..2000).map(|_| rng.gen_bool(0.5)).collect();
+            let b = BooleanArray::from(d.clone());
+
+            let expected_true = d.iter().filter(|x| **x).count();
+            assert_eq!(b.true_count(), expected_true);
+            assert_eq!(b.false_count(), d.len() - expected_true);
+
+            let d: Vec<_> = (0..2000)

Review Comment:
   ```suggestion
               // with nulls
               let d: Vec<_> = (0..2000)
   ```


