Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/10 10:58:50 UTC

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #11993: ARROW-15153: [Python] Expose ReferencedBufferSize to python

jorisvandenbossche commented on a change in pull request #11993:
URL: https://github.com/apache/arrow/pull/11993#discussion_r781082338



##########
File path: python/pyarrow/array.pxi
##########
@@ -986,14 +986,49 @@ cdef class Array(_PandasConvertible):
     @property
     def nbytes(self):
         """
-        Total number of bytes consumed by the elements of the array.
+        Returns the sum of bytes from all buffer ranges referenced
+
+        Unlike TotalBufferSize this method will account for array
+        offsets.
+
+        If buffers are shared between arrays then the shared
+        portion will be counted multiple times.
+
+        Dictionary arrays will always be counted in their entirety

Review comment:
       ```suggestion
           The dictionary of a dictionary array will always be counted in its entirety
   ```

##########
File path: python/pyarrow/array.pxi
##########
@@ -986,14 +986,49 @@ cdef class Array(_PandasConvertible):
     @property
     def nbytes(self):
         """
-        Total number of bytes consumed by the elements of the array.
+        Returns the sum of bytes from all buffer ranges referenced

Review comment:
       I personally find the original text clearer. We could leave it as is and mention the "buffer ranges referenced" in the next paragraph instead?

##########
File path: python/pyarrow/array.pxi
##########
@@ -986,14 +986,49 @@ cdef class Array(_PandasConvertible):
     @property
     def nbytes(self):
         """
-        Total number of bytes consumed by the elements of the array.
+        Returns the sum of bytes from all buffer ranges referenced
+
+        Unlike TotalBufferSize this method will account for array

Review comment:
       ```suggestion
           Unlike `get_total_buffer_size` this method will account for array
   ```

##########
File path: python/pyarrow/array.pxi
##########
@@ -986,14 +986,49 @@ cdef class Array(_PandasConvertible):
     @property
     def nbytes(self):
         """
-        Total number of bytes consumed by the elements of the array.
+        Returns the sum of bytes from all buffer ranges referenced
+
+        Unlike TotalBufferSize this method will account for array
+        offsets.
+
+        If buffers are shared between arrays then the shared
+        portion will be counted multiple times.
+
+        Dictionary arrays will always be counted in their entirety
+        even if the array only references a portion of the dictionary.
         """
-        size = 0
-        for buf in self.buffers():
-            if buf is not None:
-                size += buf.size
+        cdef:
+            shared_ptr[CArray] shd_ptr_c_array
+            CArray *c_array
+            CResult[int64_t] c_res_buffer
+
+        shd_ptr_c_array = pyarrow_unwrap_array(self)
+        c_array = shd_ptr_c_array.get()
+        c_res_buffer = ReferencedBufferSize(deref(c_array))
+        size = GetResultValue(c_res_buffer)
         return size
 
+    def get_total_buffer_size(self):
+        """
+        The sum of bytes in each buffer referenced by the array

Review comment:
       ```suggestion
           The sum of bytes in each buffer referenced by the array.
   ```
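
For readers of this thread, the distinction the two docstrings draw can be sketched with a toy model. This is not pyarrow code and not how `ReferencedBufferSize` is implemented: `ToyInt32Array`, `ITEM_SIZE`, and both size methods are hypothetical names, and the model assumes a single fixed-width values buffer with no validity bitmap. It only illustrates that a total-size count ignores the array offset and length, while a referenced-size count does not, and that overlapping slices count their shared range more than once.

```python
from dataclasses import dataclass

ITEM_SIZE = 4  # bytes per int32 value (assumption of this toy model)


@dataclass(frozen=True)
class ToyInt32Array:
    """Hypothetical stand-in for a fixed-width array; not pyarrow's API."""
    values: bytes   # backing buffer, shared between slices
    offset: int     # index of the first logical element
    length: int     # number of logical elements

    def slice(self, offset: int, length: int) -> "ToyInt32Array":
        # Zero-copy slice: the same buffer object is reused.
        return ToyInt32Array(self.values, self.offset + offset, length)

    def total_buffer_size(self) -> int:
        # Counts the whole buffer, regardless of offset and length.
        return len(self.values)

    def referenced_buffer_size(self) -> int:
        # Counts only the byte range covered by the logical elements.
        return self.length * ITEM_SIZE


full = ToyInt32Array(bytes(100 * ITEM_SIZE), offset=0, length=100)
head = full.slice(0, 10)

# Two overlapping slices that share elements 40..59 of the same buffer.
left = full.slice(0, 60)
right = full.slice(40, 60)

total_of_head = head.total_buffer_size()
referenced_by_head = head.referenced_buffer_size()
referenced_sum = left.referenced_buffer_size() + right.referenced_buffer_size()
```

In this model `head` reports the full buffer as its total size but only ten elements' worth of bytes as referenced, and summing the referenced sizes of `left` and `right` exceeds the size of the one buffer they share, which is the "counted multiple times" caveat in the proposed `nbytes` docstring.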



