Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/28 21:34:37 UTC

[GitHub] [arrow-rs] alamb commented on a change in pull request #505: Correct array memory usage calculation for dictionary arrays

alamb commented on a change in pull request #505:
URL: https://github.com/apache/arrow-rs/pull/505#discussion_r660128501



##########
File path: arrow/src/array/array.rs
##########
@@ -198,10 +198,14 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
     }
 
     /// Returns the total number of bytes of memory occupied by the buffers owned by this array.

Review comment:
       ```suggestion
       /// Returns the total number of bytes of memory pointed to by this array.
       /// The buffers store bytes in the Arrow memory format, and include the data as well as the validity map.
   ```
   
   The distinction between `buffers` and `physically occupied` has always been somewhat confusing to me. Perhaps we can take this opportunity to clarify what they mean.

##########
File path: arrow/src/array/array.rs
##########
@@ -661,4 +666,63 @@ mod tests {
             null_array.data().buffers()[0].len()
         );
     }
+
+    #[test]
+    fn test_memory_size_primitive() {
+        let arr = PrimitiveArray::<Int64Type>::from_iter_values(0..128);
+        let empty =
+            PrimitiveArray::<Int64Type>::from(ArrayData::new_empty(arr.data_type()));
+
+        // subtract empty array to avoid magic numbers for the size of additional fields
+        assert_eq!(
+            arr.get_array_memory_size() - empty.get_array_memory_size(),

Review comment:
       this is a cool calculation 👍 
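
   To spell out why the subtraction works, here is a minimal standalone sketch that uses only the standard library. `ToyArray`, `memory_size`, and `payload_bytes` are hypothetical names invented for illustration, not arrow-rs APIs. The idea is that the fixed bookkeeping overhead appears in both the populated and the empty value, so it cancels out and only the payload bytes remain.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for an array: fixed-size bookkeeping plus one heap buffer.
/// This is NOT the arrow-rs `PrimitiveArray`; it only mimics the shape of the problem.
struct ToyArray {
    values: Box<[i64]>,
    null_count: usize,
}

impl ToyArray {
    fn new(values: Vec<i64>) -> Self {
        Self { values: values.into_boxed_slice(), null_count: 0 }
    }

    /// Heap bytes held by the buffer plus the size of the struct itself.
    fn memory_size(&self) -> usize {
        self.values.len() * size_of::<i64>() + size_of::<Self>()
    }
}

/// Payload bytes for `n` values, computed by subtracting an empty baseline
/// so that no hard-coded struct size appears in the result.
fn payload_bytes(n: i64) -> usize {
    let arr = ToyArray::new((0..n).collect());
    let empty = ToyArray::new(Vec::new());
    arr.memory_size() - empty.memory_size()
}

fn main() {
    let arr = ToyArray::new((0..128).collect());
    println!("null count: {}", arr.null_count);
    // The struct overhead cancels, leaving exactly 128 * 8 bytes.
    assert_eq!(payload_bytes(128), 128 * size_of::<i64>());
    println!("payload bytes: {}", payload_bytes(128));
}
```

   The assertion stays valid if fields are later added to the struct, which is the property the test in the PR relies on.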

##########
File path: arrow/src/array/array.rs
##########
@@ -198,10 +198,14 @@ pub trait Array: fmt::Debug + Send + Sync + JsonEqual {
     }
 
     /// Returns the total number of bytes of memory occupied by the buffers owned by this array.
-    fn get_buffer_memory_size(&self) -> usize;
+    fn get_buffer_memory_size(&self) -> usize {
+        self.data_ref().get_buffer_memory_size()
+    }
 
     /// Returns the total number of bytes of memory occupied physically by this array.

Review comment:
       ```suggestion
       /// Returns the total number of bytes of memory occupied physically by this array.
       /// This value will always be greater than the value returned by `get_buffer_memory_size()` and
       /// includes the overhead of the data structures that contain the pointers to the various buffers.
   ```
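
   As a rough illustration of the distinction the two doc comments are trying to draw, here is a standalone sketch with a toy type. `ToyData` and both of its methods are assumptions made up for this example and do not reproduce the arrow-rs implementation. The buffer size counts only the heap bytes behind the data and validity buffers, and the "physical" size adds the struct that holds the pointers to them.

```rust
use std::mem::size_of;

/// Hypothetical model of array data: a values buffer and an optional validity bitmap.
struct ToyData {
    values: Vec<u8>,
    validity: Option<Vec<u8>>,
}

impl ToyData {
    /// Bytes held in the heap buffers only (data plus validity map).
    fn buffer_memory_size(&self) -> usize {
        self.values.capacity() + self.validity.as_ref().map_or(0, |v| v.capacity())
    }

    /// Buffer bytes plus the overhead of the struct that points at the buffers.
    fn array_memory_size(&self) -> usize {
        self.buffer_memory_size() + size_of::<Self>()
    }
}

/// Builds a small example value: 1024 data bytes and a 128-byte validity bitmap.
fn sample() -> ToyData {
    ToyData { values: vec![0u8; 1024], validity: Some(vec![0xFF; 128]) }
}

fn main() {
    let d = sample();
    // The struct is never zero-sized, so the physical size is strictly larger.
    assert!(d.array_memory_size() > d.buffer_memory_size());
    println!(
        "buffers: {} bytes, total: {} bytes",
        d.buffer_memory_size(),
        d.array_memory_size()
    );
}
```

   In this model the difference between the two numbers is always `size_of::<ToyData>()`, which matches the "always greater" wording in the suggestion.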



