Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/08 00:23:23 UTC

[GitHub] [arrow] mr-smidge commented on a change in pull request #7671: ARROW-8344: [C#] Bug-fixes to binary array plus other improvements

mr-smidge commented on a change in pull request #7671:
URL: https://github.com/apache/arrow/pull/7671#discussion_r451214556



##########
File path: csharp/src/Apache.Arrow/Arrays/BinaryArray.cs
##########
@@ -66,87 +66,158 @@ protected BuilderBase(IArrowType dataType)
                 ValueOffsets = new ArrowBuffer.Builder<int>();
                 ValueBuffer = new ArrowBuffer.Builder<byte>();
                 ValidityBuffer = new ArrowBuffer.BitmapBuilder();
+
+                // From the docs:
+                //
+                // The offsets buffer contains length + 1 signed integers (either 32-bit or 64-bit, depending on the
+                // logical type), which encode the start position of each slot in the data buffer. The length of the
+                // value in each slot is computed using the difference between the offset at that slot’s index and the
+                // subsequent offset.
+                //
+                // In this builder, we choose to append the first offset (zero) upon construction, and each trailing
+                // offset is then added after each individual item has been appended.
+                ValueOffsets.Append(this.Offset);

Review comment:
       For an array with N items, N+1 offset values must be written.
   
   Previously, one offset was written by each `Append*()` call, and the extra one was written by `Build()`.  This PR reverses that: the extra offset (the initial zero) is written upon construction (or when calling `Clear()`), and each `Append*()` call then writes the offset that follows its item.  Because `Build()` no longer appends anything, it is now idempotent.
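
   To illustrate the scheme, here is a minimal sketch of the offset bookkeeping. `SketchBinaryBuilder` and its members are hypothetical names invented for this example, and it uses plain lists rather than `ArrowBuffer.Builder<T>`; it is not the code in this PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified stand-in for a binary array builder.
// It is NOT the Apache.Arrow API; it only mirrors the offset bookkeeping.
public sealed class SketchBinaryBuilder
{
    private readonly List<int> _offsets = new List<int>();
    private readonly List<byte> _values = new List<byte>();

    public SketchBinaryBuilder()
    {
        // The leading offset (zero) is written once, up front.
        _offsets.Add(0);
    }

    public SketchBinaryBuilder Append(byte[] value)
    {
        _values.AddRange(value);
        // The trailing offset for this item is the new end of the value data.
        _offsets.Add(_values.Count);
        return this;
    }

    public SketchBinaryBuilder Clear()
    {
        _values.Clear();
        _offsets.Clear();
        // Restore the invariant: a leading zero offset is always present.
        _offsets.Add(0);
        return this;
    }

    // Build() only copies the current state and appends nothing,
    // so repeated calls return equal results.
    public (int[] Offsets, byte[] Values) Build()
    {
        return (_offsets.ToArray(), _values.ToArray());
    }
}
```

   With this sketch, appending `{ 1, 2 }` and then `{ 3 }` leaves the offsets as 0, 2, 3: three offsets for two items, where the length of item `i` is `offsets[i + 1] - offsets[i]`. Calling `Build()` a second time returns the same offsets, because no extra offset is appended at build time.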



