You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/05 01:28:17 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

westonpace commented on a change in pull request #11993:
URL: https://github.com/apache/arrow/pull/11993#discussion_r778487164



##########
File path: python/pyarrow/array.pxi
##########
@@ -988,12 +988,53 @@ cdef class Array(_PandasConvertible):
         """
         Total number of bytes consumed by the elements of the array.
         """
-        size = 0
-        for buf in self.buffers():
-            if buf is not None:
-                size += buf.size
+        return self.get_referenced_buffer_size()
+
+    def get_referenced_buffer_size(self):

Review comment:
       Hmm, from the discussion I thought we were just going to name this `get_buffer_size` since `get_referenced_buffer_size` was a  little confusing.
   
   Actually, since `nbytes` is an exact proxy/forward of `get_referenced_buffer_size` can we get rid of the method entirely and just have the property (moving the method's implementation into the property)

##########
File path: python/pyarrow/tests/test_array.py
##########
@@ -2460,18 +2460,24 @@ def test_buffers_nested():
     assert struct.unpack('4xh', values) == (43,)
 
 
-def test_nbytes_sizeof():
+def test_total_buffer_size():

Review comment:
       Can we add just one test where the array is sliced so that `nbytes` and `get_total_buffer_size` return different values?

##########
File path: python/pyarrow/array.pxi
##########
@@ -988,12 +988,53 @@ cdef class Array(_PandasConvertible):
         """
         Total number of bytes consumed by the elements of the array.
         """
-        size = 0
-        for buf in self.buffers():
-            if buf is not None:
-                size += buf.size
+        return self.get_referenced_buffer_size()
+
+    def get_referenced_buffer_size(self):
+        """
+        Returns the sum of bytes from all buffer ranges referenced
+
+        Unlike TotalBufferSize this method will account for array
+        offsets.
+
+        If buffers are shared between arrays then the shared
+        portion will only be counted multiple times.

Review comment:
       ```suggestion
           If buffers are shared between arrays then the shared
           portion will be counted multiple times.
   ```
   
   Looks like this typo is in the C++ description too if you want to fix it there.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org