You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "asmirnov82 (via GitHub)" <gi...@apache.org> on 2023/04/26 13:23:32 UTC

[GitHub] [arrow] asmirnov82 opened a new pull request, #35342: GH-32605: [C#] Extend validity buffer api

asmirnov82 opened a new pull request, #35342:
URL: https://github.com/apache/arrow/pull/35342

   Add a method to the ValidityBuffer that adds the same bool value length times without allocating an Enumerable.Repeat object
   
   ### Rationale for this change
   
   See more details in the code review comments in https://github.com/apache/arrow/pull/13810
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190184592


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,154 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));
+
+            // Use simpler method if there aren't many values
+            if (length < 20)
+            {
+                for (int i = index; i <= endBitIndex; i++)
+                {
+                    SetBit(data, i, value);
+                }
+                return;
+            }
+            
+            // Otherwise do the work to figure out how to copy whole bytes
+            int startByteIndex = index / 8;
+            int startBitOffset = index % 8;
+            int endByteIndex = endBitIndex / 8;
+            int endBitOffset = endBitIndex % 8;
+                        
+            // If the starting index and ending index are not byte-aligned,
+            // we'll need to set bits the slow way. If they are
+            // byte-aligned, and for all other bytes in the 'middle', we
+            // can use a faster byte-aligned set.
+            int fullByteStartIndex = startBitOffset == 0 ? startByteIndex : startByteIndex + 1;
+            int fullByteEndIndex = endBitOffset == 7 ? endByteIndex : endByteIndex - 1;
+
+            // Bits we will be using to finish up the first byte
+            if (startBitOffset != 0)
+            {
+                Span<byte> slice = data.Slice(startByteIndex, 1);
+                for (int i = startBitOffset; i <= 7; i++)
+                    SetBit(slice, i, value);
+            }
+
+            if (fullByteEndIndex >= fullByteStartIndex)
+            {
+                Span<byte> slice = data.Slice(fullByteStartIndex, fullByteEndIndex - fullByteStartIndex + 1);
+                byte fill = (byte)(value ? 0xFF : 0x00);
+
+                slice.Fill(fill);
+            }
+
+            if (endBitOffset != 7)
+            {
+                Span<byte> slice = data.Slice(endByteIndex, 1);
+                for (int i = 0; i <= endBitOffset; i++)
+                    SetBit(slice, i, value);
+            }
+        }
+
         public static void ToggleBit(Span<byte> data, int index)
         {
             data[index / 8] ^= BitMask[index % 8];
         }
 
         /// <summary>
         /// Counts the number of set bits in a span of bytes starting
-        /// at a specific bit offset.
+        /// at a specific bit index.
         /// </summary>
-        /// <param name="data">Span to count bits</param>
-        /// <param name="offset">Bit offset to start counting from</param>
-        /// <returns>Count of set (one) bits</returns>
-        public static int CountBits(ReadOnlySpan<byte> data, int offset) =>
-            CountBits(data, offset, data.Length * 8 - offset);
+        /// <param name="data">Span to count bits.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <returns>Count of set (one) bits.</returns>
+        public static int CountBits(ReadOnlySpan<byte> data, int index) =>

Review Comment:
   These changes are technically breaking changes as well. Let's not make them here. If we make the whole class internal, we can refactor this code in any way we want.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190264244


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,6 +62,64 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="offset">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)

Review Comment:
   Fixed. Link for new issue: https://github.com/apache/arrow/issues/35540



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190281130


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,154 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));
+
+            // Use simpler method if there aren't many values
+            if (length < 20)
+            {
+                for (int i = index; i <= endBitIndex; i++)
+                {
+                    SetBit(data, i, value);
+                }
+                return;
+            }
+            
+            // Otherwise do the work to figure out how to copy whole bytes
+            int startByteIndex = index / 8;
+            int startBitOffset = index % 8;
+            int endByteIndex = endBitIndex / 8;
+            int endBitOffset = endBitIndex % 8;
+                        
+            // If the starting index and ending index are not byte-aligned,
+            // we'll need to set bits the slow way. If they are
+            // byte-aligned, and for all other bytes in the 'middle', we
+            // can use a faster byte-aligned set.
+            int fullByteStartIndex = startBitOffset == 0 ? startByteIndex : startByteIndex + 1;
+            int fullByteEndIndex = endBitOffset == 7 ? endByteIndex : endByteIndex - 1;
+
+            // Bits we will be using to finish up the first byte
+            if (startBitOffset != 0)
+            {
+                Span<byte> slice = data.Slice(startByteIndex, 1);
+                for (int i = startBitOffset; i <= 7; i++)
+                    SetBit(slice, i, value);
+            }
+
+            if (fullByteEndIndex >= fullByteStartIndex)
+            {
+                Span<byte> slice = data.Slice(fullByteStartIndex, fullByteEndIndex - fullByteStartIndex + 1);
+                byte fill = (byte)(value ? 0xFF : 0x00);
+
+                slice.Fill(fill);
+            }
+
+            if (endBitOffset != 7)
+            {
+                Span<byte> slice = data.Slice(endByteIndex, 1);
+                for (int i = 0; i <= endBitOffset; i++)
+                    SetBit(slice, i, value);
+            }
+        }
+
         public static void ToggleBit(Span<byte> data, int index)
         {
             data[index / 8] ^= BitMask[index % 8];
         }
 
         /// <summary>
         /// Counts the number of set bits in a span of bytes starting
-        /// at a specific bit offset.
+        /// at a specific bit index.
         /// </summary>
-        /// <param name="data">Span to count bits</param>
-        /// <param name="offset">Bit offset to start counting from</param>
-        /// <returns>Count of set (one) bits</returns>
-        public static int CountBits(ReadOnlySpan<byte> data, int offset) =>
-            CountBits(data, offset, data.Length * 8 - offset);
+        /// <param name="data">Span to count bits.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <returns>Count of set (one) bits.</returns>
+        public static int CountBits(ReadOnlySpan<byte> data, int index) =>

Review Comment:
   > Other changes are related only to naming, so I left them.
   
   They are still a source breaking change though. Someone could call this method as:
   
   ```C#
   BitUtility.CountBits(data, offset: 1)
   ```
   
   And now their code won't compile.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190264244


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,6 +62,64 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="offset">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190285881


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,146 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        internal static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));

Review Comment:
   Do we still need this argument checking now that the method is `internal`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35342:
URL: https://github.com/apache/arrow/pull/35342#issuecomment-1523417906

   * Closes: #32605


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1187540072


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,6 +62,64 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="offset">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)

Review Comment:
   Since this is a public API, can we do some argument checking? For example, `index + length < data.Length` and both `index` and `length` are non-negative.



##########
csharp/src/Apache.Arrow/ArrowBuffer.BitmapBuilder.cs:
##########
@@ -138,6 +138,24 @@ public BitmapBuilder AppendRange(IEnumerable<bool> values)
                 return this;
             }
 
+            /// <summary>
+            /// Append multiple bits.
+            /// </summary>
+            /// <param name="value">Value of bits to append.</param>
+            /// <param name="length">Number of times the value should be added.</param>
+            /// <returns>Returns the builder (for fluent-style composition).</returns>
+            public BitmapBuilder AppendRange(bool value, int length)

Review Comment:
   Should check that `length` is non-negative.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190183024


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,6 +62,64 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="offset">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)

Review Comment:
   I think a decent compromise for this PR is to not add a new public API to BitUtility. Instead, make the new method `internal`. Then you don't need to add argument checking to this method.
   
   Then log an issue proposing making the whole `BitUtility` class `internal`. We should make that decision separate of this PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190283246


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,154 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));
+
+            // Use simpler method if there aren't many values
+            if (length < 20)
+            {
+                for (int i = index; i <= endBitIndex; i++)
+                {
+                    SetBit(data, i, value);
+                }
+                return;
+            }
+            
+            // Otherwise do the work to figure out how to copy whole bytes
+            int startByteIndex = index / 8;
+            int startBitOffset = index % 8;
+            int endByteIndex = endBitIndex / 8;
+            int endBitOffset = endBitIndex % 8;
+                        
+            // If the starting index and ending index are not byte-aligned,
+            // we'll need to set bits the slow way. If they are
+            // byte-aligned, and for all other bytes in the 'middle', we
+            // can use a faster byte-aligned set.
+            int fullByteStartIndex = startBitOffset == 0 ? startByteIndex : startByteIndex + 1;
+            int fullByteEndIndex = endBitOffset == 7 ? endByteIndex : endByteIndex - 1;
+
+            // Bits we will be using to finish up the first byte
+            if (startBitOffset != 0)
+            {
+                Span<byte> slice = data.Slice(startByteIndex, 1);
+                for (int i = startBitOffset; i <= 7; i++)
+                    SetBit(slice, i, value);
+            }
+
+            if (fullByteEndIndex >= fullByteStartIndex)
+            {
+                Span<byte> slice = data.Slice(fullByteStartIndex, fullByteEndIndex - fullByteStartIndex + 1);
+                byte fill = (byte)(value ? 0xFF : 0x00);
+
+                slice.Fill(fill);
+            }
+
+            if (endBitOffset != 7)
+            {
+                Span<byte> slice = data.Slice(endByteIndex, 1);
+                for (int i = 0; i <= endBitOffset; i++)
+                    SetBit(slice, i, value);
+            }
+        }
+
         public static void ToggleBit(Span<byte> data, int index)
         {
             data[index / 8] ^= BitMask[index % 8];
         }
 
         /// <summary>
         /// Counts the number of set bits in a span of bytes starting
-        /// at a specific bit offset.
+        /// at a specific bit index.
         /// </summary>
-        /// <param name="data">Span to count bits</param>
-        /// <param name="offset">Bit offset to start counting from</param>
-        /// <returns>Count of set (one) bits</returns>
-        public static int CountBits(ReadOnlySpan<byte> data, int offset) =>
-            CountBits(data, offset, data.Length * 8 - offset);
+        /// <param name="data">Span to count bits.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <returns>Count of set (one) bits.</returns>
+        public static int CountBits(ReadOnlySpan<byte> data, int index) =>

Review Comment:
   This probably isn't a major issue, especially if we are considering making the class internal. Just more of an "FYI, this is a breaking change."



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] eerhardt merged pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "eerhardt (via GitHub)" <gi...@apache.org>.
eerhardt merged PR #35342:
URL: https://github.com/apache/arrow/pull/35342


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190263510


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,154 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));
+
+            // Use simpler method if there aren't many values
+            if (length < 20)
+            {
+                for (int i = index; i <= endBitIndex; i++)
+                {
+                    SetBit(data, i, value);
+                }
+                return;
+            }
+            
+            // Otherwise do the work to figure out how to copy whole bytes
+            int startByteIndex = index / 8;
+            int startBitOffset = index % 8;
+            int endByteIndex = endBitIndex / 8;
+            int endBitOffset = endBitIndex % 8;
+                        
+            // If the starting index and ending index are not byte-aligned,
+            // we'll need to set bits the slow way. If they are
+            // byte-aligned, and for all other bytes in the 'middle', we
+            // can use a faster byte-aligned set.
+            int fullByteStartIndex = startBitOffset == 0 ? startByteIndex : startByteIndex + 1;
+            int fullByteEndIndex = endBitOffset == 7 ? endByteIndex : endByteIndex - 1;
+
+            // Bits we will be using to finish up the first byte
+            if (startBitOffset != 0)
+            {
+                Span<byte> slice = data.Slice(startByteIndex, 1);
+                for (int i = startBitOffset; i <= 7; i++)
+                    SetBit(slice, i, value);
+            }
+
+            if (fullByteEndIndex >= fullByteStartIndex)
+            {
+                Span<byte> slice = data.Slice(fullByteStartIndex, fullByteEndIndex - fullByteStartIndex + 1);
+                byte fill = (byte)(value ? 0xFF : 0x00);
+
+                slice.Fill(fill);
+            }
+
+            if (endBitOffset != 7)
+            {
+                Span<byte> slice = data.Slice(endByteIndex, 1);
+                for (int i = 0; i <= endBitOffset; i++)
+                    SetBit(slice, i, value);
+            }
+        }
+
         public static void ToggleBit(Span<byte> data, int index)
         {
             data[index / 8] ^= BitMask[index % 8];
         }
 
         /// <summary>
         /// Counts the number of set bits in a span of bytes starting
-        /// at a specific bit offset.
+        /// at a specific bit index.
         /// </summary>
-        /// <param name="data">Span to count bits</param>
-        /// <param name="offset">Bit offset to start counting from</param>
-        /// <returns>Count of set (one) bits</returns>
-        public static int CountBits(ReadOnlySpan<byte> data, int offset) =>
-            CountBits(data, offset, data.Length * 8 - offset);
+        /// <param name="data">Span to count bits.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <returns>Count of set (one) bits.</returns>
+        public static int CountBits(ReadOnlySpan<byte> data, int index) =>

Review Comment:
   Fixed. I reverted the change related to checking attributes and throwing exception. Other changes are related only to naming, so I left them. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190313531


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,146 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        internal static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));

Review Comment:
   Removed now



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1186683076


##########
csharp/src/Apache.Arrow/ArrowBuffer.BitmapBuilder.cs:
##########
@@ -138,6 +138,28 @@ public BitmapBuilder AppendRange(IEnumerable<bool> values)
                 return this;
             }
 
+
+            /// <summary>
+            /// Append multiple bits.
+            /// </summary>
+            /// <param name="value">Value of bits to append.</param>
+            /// <param name="length">Number of times the value should be added.</param>
+            /// <returns>Returns the builder (for fluent-style composition).</returns>
+            public BitmapBuilder AppendRange(bool value, int length)
+            {
+                EnsureAdditionalCapacity(length);
+
+                for (int i = 0; i < length; i++)
+                {
+                    BitUtility.SetBit(Span, Length, value);
+                    Length++;
+                }
+                                
+                SetBitCount += value ? length : 0;
+
+                return this;
+            }

Review Comment:
   Thanks for you suggestion. I like the idea! Similar approach is used in BitUtility.CountBits method, so I added a complementary implementation of SetBits method to BitUtility and used it in BitmapBuilder. I also changed CountBits to follow project official coding style.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1187973894


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,6 +62,64 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="offset">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)

Review Comment:
   You are right. However, after taking a closer look at BitUtility class I have some concern that BitUtility should be a public API. And here are reasons:
   1) There aren't any argument checking in all other methods (however, there are debug.asserts - I think it's done due to perfomance reason)
   2) This class is implemented with an assumption of being used with Bitmap only, when we know that the number of meaningful bits should be less than Max.Int (because each bit corresponds to the validity of value in Value.Span, wich length is limit by Int). In general case the amount of bit inside a Span<byte> can be Max,Int * 8 and correct methods for working with bits inside a span should have long value for bit index and for length.
   
   I added required checks into SetBits and CountBits methods (as this two are very similar) and didn't change other methods. You may see the resulting code in PR, but actually it looks intermediate compromise between two solutions:
    
   change all BitUtility methods to use longs for bit indices (with possible performance drawdown)
   
   or
   
   make BitUtility internal (breaking change, in case somebody uses BitUtility in their code)
   
   Would you please make some inputs on that?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1187974237


##########
csharp/src/Apache.Arrow/ArrowBuffer.BitmapBuilder.cs:
##########
@@ -138,6 +138,24 @@ public BitmapBuilder AppendRange(IEnumerable<bool> values)
                 return this;
             }
 
+            /// <summary>
+            /// Append multiple bits.
+            /// </summary>
+            /// <param name="value">Value of bits to append.</param>
+            /// <param name="length">Number of times the value should be added.</param>
+            /// <returns>Returns the builder (for fluent-style composition).</returns>
+            public BitmapBuilder AppendRange(bool value, int length)

Review Comment:
   Fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] asmirnov82 commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "asmirnov82 (via GitHub)" <gi...@apache.org>.
asmirnov82 commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1190313750


##########
csharp/src/Apache.Arrow/BitUtility.cs:
##########
@@ -62,73 +62,154 @@ public static void SetBit(Span<byte> data, int index, bool value)
                 : (byte)(data[idx] & ~BitMask[mod]);
         }
 
+        /// <summary>
+        /// Set the number of bits in a span of bytes starting
+        /// at a specific index, and limiting to length.
+        /// </summary>
+        /// <param name="data">Span to set bits value.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <param name="length">Maximum of bits in the span to consider.</param>
+        public static void SetBits(Span<byte> data, int index, int length, bool value)
+        {
+            if (length == 0)
+                return;
+
+            long spanLengthInBits = (long)data.Length * 8;
+
+            if (index < 0 || index >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(index));
+
+            int endBitIndex = checked(index + length - 1);
+
+            if (length < 0 || endBitIndex >= spanLengthInBits)
+                throw new ArgumentOutOfRangeException(nameof(length));
+
+            // Use simpler method if there aren't many values
+            if (length < 20)
+            {
+                for (int i = index; i <= endBitIndex; i++)
+                {
+                    SetBit(data, i, value);
+                }
+                return;
+            }
+            
+            // Otherwise do the work to figure out how to copy whole bytes
+            int startByteIndex = index / 8;
+            int startBitOffset = index % 8;
+            int endByteIndex = endBitIndex / 8;
+            int endBitOffset = endBitIndex % 8;
+                        
+            // If the starting index and ending index are not byte-aligned,
+            // we'll need to set bits the slow way. If they are
+            // byte-aligned, and for all other bytes in the 'middle', we
+            // can use a faster byte-aligned set.
+            int fullByteStartIndex = startBitOffset == 0 ? startByteIndex : startByteIndex + 1;
+            int fullByteEndIndex = endBitOffset == 7 ? endByteIndex : endByteIndex - 1;
+
+            // Bits we will be using to finish up the first byte
+            if (startBitOffset != 0)
+            {
+                Span<byte> slice = data.Slice(startByteIndex, 1);
+                for (int i = startBitOffset; i <= 7; i++)
+                    SetBit(slice, i, value);
+            }
+
+            if (fullByteEndIndex >= fullByteStartIndex)
+            {
+                Span<byte> slice = data.Slice(fullByteStartIndex, fullByteEndIndex - fullByteStartIndex + 1);
+                byte fill = (byte)(value ? 0xFF : 0x00);
+
+                slice.Fill(fill);
+            }
+
+            if (endBitOffset != 7)
+            {
+                Span<byte> slice = data.Slice(endByteIndex, 1);
+                for (int i = 0; i <= endBitOffset; i++)
+                    SetBit(slice, i, value);
+            }
+        }
+
         public static void ToggleBit(Span<byte> data, int index)
         {
             data[index / 8] ^= BitMask[index % 8];
         }
 
         /// <summary>
         /// Counts the number of set bits in a span of bytes starting
-        /// at a specific bit offset.
+        /// at a specific bit index.
         /// </summary>
-        /// <param name="data">Span to count bits</param>
-        /// <param name="offset">Bit offset to start counting from</param>
-        /// <returns>Count of set (one) bits</returns>
-        public static int CountBits(ReadOnlySpan<byte> data, int offset) =>
-            CountBits(data, offset, data.Length * 8 - offset);
+        /// <param name="data">Span to count bits.</param>
+        /// <param name="index">Bit index to start counting from.</param>
+        /// <returns>Count of set (one) bits.</returns>
+        public static int CountBits(ReadOnlySpan<byte> data, int index) =>

Review Comment:
   Got it. Thanks



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #35342:
URL: https://github.com/apache/arrow/pull/35342#issuecomment-1523417988

   :warning: GitHub issue #32605 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] westonpace commented on a diff in pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #35342:
URL: https://github.com/apache/arrow/pull/35342#discussion_r1186523477


##########
csharp/src/Apache.Arrow/ArrowBuffer.BitmapBuilder.cs:
##########
@@ -138,6 +138,28 @@ public BitmapBuilder AppendRange(IEnumerable<bool> values)
                 return this;
             }
 
+
+            /// <summary>
+            /// Append multiple bits.
+            /// </summary>
+            /// <param name="value">Value of bits to append.</param>
+            /// <param name="length">Number of times the value should be added.</param>
+            /// <returns>Returns the builder (for fluent-style composition).</returns>
+            public BitmapBuilder AppendRange(bool value, int length)
+            {
+                EnsureAdditionalCapacity(length);
+
+                for (int i = 0; i < length; i++)
+                {
+                    BitUtility.SetBit(Span, Length, value);
+                    Length++;
+                }
+                                
+                SetBitCount += value ? length : 0;
+
+                return this;
+            }

Review Comment:
   If you really want to eke out performance you could use [Span.Fill](https://learn.microsoft.com/en-us/dotnet/api/system.span-1.fill?view=net-7.0) for the whole bytes in the middle (assuming `length > 8`).  E.g. something like this:
   
   ```
       public BitmapBuilder AppendRange2(bool value, int length)
       {
           // Use simpler method if there aren't many values
           if (length < 20)
           {
               EnsureAdditionalCapacity(length);
   
               for (int i = 0; i < length; i++)
               {
                   BitUtility.SetBit(Span, Length, value);
                   Length++;
               }
   
               SetBitCount += value ? length : 0;
   
               return this;
           }
   
           // Otherwise do the work to figure out how to copy whole bytes
   
           int spaceLeftInCurrentByte = (8 - (Length % 8)) % 8;
           // Bits we will be using to finish up the current byte
           int remainderBits = Math.Min(length, spaceLeftInCurrentByte);
           // Whole bytes that will be set
           int wholeBytes = (length - remainderBits) / 8;
           // Bits to set after the last whole byte
           int trailingBits = length - remainderBits - (wholeBytes * 8);
           int newBytes = (trailingBits > 0) ? wholeBytes + 1 : wholeBytes;
   
           EnsureAdditionalCapacity(length);
   
           var span = Memory.Span;
           for (int i = 0; i < remainderBits; i++)
           {
               BitUtility.SetBit(span, Length, value);
               Length++;
           }
   
           if (wholeBytes > 0)
           {
               var fill = (byte)(value ? 0xFF : 0x00);
               // Length is guaranteed to be a multiple of 8 here
               span.Slice(Length / 8, wholeBytes).Fill(fill);
           }
           Length += wholeBytes * 8;
   
           for (int i = 0; i < trailingBits; i++)
           {
               BitUtility.SetBit(span, Length, value);
               Length++;
           }
   
           SetBitCount += value ? length : 0;
           return this;
       }
   ```
   
   The results are pretty significant once `length` gets large enough:
   
   ```
   |------------------ |-------- |--------------:|-----------:|-----------:|
   |  AppendRangeFalse |       1 |      5.546 ns |  0.0364 ns |  0.0323 ns |
   |   AppendRangeTrue |       1 |      5.613 ns |  0.0484 ns |  0.0429 ns |
   | AppendRange2False |       1 |      5.835 ns |  0.0751 ns |  0.0666 ns |
   |  AppendRange2True |       1 |      5.782 ns |  0.0280 ns |  0.0248 ns |
   |  AppendRangeFalse |     100 |    281.145 ns |  0.6840 ns |  0.6063 ns |
   |   AppendRangeTrue |     100 |    278.547 ns |  0.9460 ns |  0.8849 ns |
   | AppendRange2False |     100 |     17.507 ns |  0.0414 ns |  0.0346 ns |
   |  AppendRange2True |     100 |     17.108 ns |  0.0698 ns |  0.0653 ns |
   |  AppendRangeFalse |   10000 | 27,397.477 ns | 51.9248 ns | 46.0300 ns |
   |   AppendRangeTrue |   10000 | 26,872.102 ns | 59.5216 ns | 49.7032 ns |
   | AppendRange2False |   10000 |     14.870 ns |  0.1078 ns |  0.1009 ns |
   |  AppendRange2True |   10000 |     19.864 ns |  0.0747 ns |  0.0698 ns |
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #35342: GH-32605: [C#] Extend validity buffer api

Posted by "ursabot (via GitHub)" <gi...@apache.org>.
ursabot commented on PR #35342:
URL: https://github.com/apache/arrow/pull/35342#issuecomment-1569209481

   Benchmark runs are scheduled for baseline = fbe5f641d327ee81db00ce5f056940a69f4d8603 and contender = f38943af09c51d2e50c94f03de14f88cbaa07a3f. f38943af09c51d2e50c94f03de14f88cbaa07a3f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/816ce2c4eb114aaaa8dfc3bf3348b8e6...8de8baff9cd94a4fb9b479b4c878e2ea/)
   [Finished :arrow_down:0.24% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/49bc8b0015b74e969fcf298f0cd286b5...a2981b8edd7541b69dbc9f3e72ae86e5/)
   [Finished :arrow_down:0.33% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/9299454cd7e24adf950e5a74cc770864...f02af30bfb8b432d90a0bbf154f9356e/)
   [Finished :arrow_down:0.51% :arrow_up:0.06%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/959d858b2dff4c6e93a91db90a7f93d8...3dd6e9bbaec441abad8da8d146dfd5c5/)
   Buildkite builds:
   [Finished] [`f38943af` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2920)
   [Finished] [`f38943af` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2956)
   [Finished] [`f38943af` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2921)
   [Finished] [`f38943af` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2946)
   [Finished] [`fbe5f641` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2919)
   [Finished] [`fbe5f641` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2955)
   [Finished] [`fbe5f641` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2920)
   [Finished] [`fbe5f641` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2945)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org