You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "CurtHagenlocher (via GitHub)" <gi...@apache.org> on 2023/06/16 12:49:00 UTC

[GitHub] [arrow] CurtHagenlocher opened a new pull request, #36122: GH-36120: [C#] Support schema metadata through the C API

CurtHagenlocher opened a new pull request, #36122:
URL: https://github.com/apache/arrow/pull/36122

   ### What changes are included in this PR?
   
   Import and export of field- and schema-level metadata via the C API.
   
   ### Are these changes tested?
   
   Yes
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36122:
URL: https://github.com/apache/arrow/pull/36122#issuecomment-1594626738

   :warning: GitHub issue #36120 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 merged pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 merged PR #36122:
URL: https://github.com/apache/arrow/pull/36122


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36122:
URL: https://github.com/apache/arrow/pull/36122#issuecomment-1594625412

   :warning: GitHub issue #36120 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #36122:
URL: https://github.com/apache/arrow/pull/36122#issuecomment-1594625856

   :warning: GitHub issue #36120 **has been automatically assigned in GitHub** to PR creator.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] CurtHagenlocher commented on pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "CurtHagenlocher (via GitHub)" <gi...@apache.org>.
CurtHagenlocher commented on PR #36122:
URL: https://github.com/apache/arrow/pull/36122#issuecomment-1594931059

   @wjones127 has offered to review ;)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] conbench-apache-arrow[bot] commented on pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "conbench-apache-arrow[bot] (via GitHub)" <gi...@apache.org>.
conbench-apache-arrow[bot] commented on PR #36122:
URL: https://github.com/apache/arrow/pull/36122#issuecomment-1596270279

   Conbench analyzed the 6 benchmark runs on commit `14f2e4e3`.
   
   There were 2 benchmark results indicating a performance regression:
   
   - Commit Run on `ursa-thinkcentre-m75q` at [2023-06-17 22:04:51Z](http://conbench.ursa.dev/compare/runs/a43900293a124814bece09733a2a562a...adc39d9864d4422696752e4d81afbdd3/)
     - [params=32768/1, source=cpp-micro, suite=arrow-bit-util-benchmark](http://conbench.ursa.dev/compare/benchmarks/0648e06ea6c67f1c80007714ab737255...0648e2e21fd8778a8000dacf1f1c93f8)
     - [params=<SMALL_VECTOR(int)>, source=cpp-micro, suite=arrow-small-vector-benchmark](http://conbench.ursa.dev/compare/benchmarks/0648e070a5337f708000e7ed522ce0af...0648e2e4173377f980001d93a6cfdc19)
   
   The [full Conbench report](https://github.com/apache/arrow/runs/14357486378) has more details.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] wjones127 commented on a diff in pull request #36122: GH-36120: [C#] Support schema metadata through the C API

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #36122:
URL: https://github.com/apache/arrow/pull/36122#discussion_r1232610389


##########
csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs:
##########
@@ -281,21 +283,53 @@ public Field GetAsField()
 
                 bool nullable = _cSchema->GetFlag(CArrowSchema.ArrowFlagNullable);
 
-                return new Field(fieldName, GetAsType(), nullable);
+                return new Field(fieldName, GetAsType(), nullable, GetMetadata(_cSchema->metadata));
             }
 
             public Schema GetAsSchema()
             {
                 ArrowType fullType = GetAsType();
                 if (fullType is StructType structType)
                 {
-                    return new Schema(structType.Fields, default);
+                    return new Schema(structType.Fields, GetMetadata(_cSchema->metadata));
                 }
                 else
                 {
                     throw new ArgumentException("Imported type is not a struct type, so it cannot be converted to a schema.");
                 }
             }
+
+            private unsafe static IReadOnlyDictionary<string, string> GetMetadata(byte* metadata)
+            {
+                if (metadata == null)
+                {
+                    return null;
+                }
+
+                IntPtr ptr = (IntPtr)metadata;
+                int count = Marshal.ReadInt32(ptr);
+                if (count == 0)
+                {
+                    return null;
+                }
+                ptr += 4;
+
+                Dictionary<string, string> result = new Dictionary<string, string>(count);
+                for (int i = 0; i < count; i++)
+                {
+                    result[ReadString(ref ptr)] = ReadString(ref ptr);
+                }
+                return result;
+            }
+
+            private unsafe static string ReadString(ref IntPtr ptr)

Review Comment:
   Might want to name more specifically here:
   
   ```suggestion
               private unsafe static string ReadMetadataString(ref IntPtr ptr)
   ```



##########
csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs:
##########
@@ -281,21 +283,53 @@ public Field GetAsField()
 
                 bool nullable = _cSchema->GetFlag(CArrowSchema.ArrowFlagNullable);
 
-                return new Field(fieldName, GetAsType(), nullable);
+                return new Field(fieldName, GetAsType(), nullable, GetMetadata(_cSchema->metadata));
             }
 
             public Schema GetAsSchema()
             {
                 ArrowType fullType = GetAsType();
                 if (fullType is StructType structType)
                 {
-                    return new Schema(structType.Fields, default);
+                    return new Schema(structType.Fields, GetMetadata(_cSchema->metadata));
                 }
                 else
                 {
                     throw new ArgumentException("Imported type is not a struct type, so it cannot be converted to a schema.");
                 }
             }
+
+            private unsafe static IReadOnlyDictionary<string, string> GetMetadata(byte* metadata)
+            {
+                if (metadata == null)
+                {
+                    return null;
+                }
+
+                IntPtr ptr = (IntPtr)metadata;
+                int count = Marshal.ReadInt32(ptr);
+                if (count == 0)

Review Comment:
   Perhaps, to be somewhat defensive:
   
   ```suggestion
                   if (count <= 0)
   ```



##########
csharp/src/Apache.Arrow/C/CArrowSchemaImporter.cs:
##########
@@ -281,21 +283,53 @@ public Field GetAsField()
 
                 bool nullable = _cSchema->GetFlag(CArrowSchema.ArrowFlagNullable);
 
-                return new Field(fieldName, GetAsType(), nullable);
+                return new Field(fieldName, GetAsType(), nullable, GetMetadata(_cSchema->metadata));
             }
 
             public Schema GetAsSchema()
             {
                 ArrowType fullType = GetAsType();
                 if (fullType is StructType structType)
                 {
-                    return new Schema(structType.Fields, default);
+                    return new Schema(structType.Fields, GetMetadata(_cSchema->metadata));
                 }
                 else
                 {
                     throw new ArgumentException("Imported type is not a struct type, so it cannot be converted to a schema.");
                 }
             }
+
+            private unsafe static IReadOnlyDictionary<string, string> GetMetadata(byte* metadata)
+            {
+                if (metadata == null)
+                {
+                    return null;
+                }
+
+                IntPtr ptr = (IntPtr)metadata;
+                int count = Marshal.ReadInt32(ptr);
+                if (count == 0)
+                {
+                    return null;
+                }
+                ptr += 4;
+
+                Dictionary<string, string> result = new Dictionary<string, string>(count);
+                for (int i = 0; i < count; i++)
+                {
+                    result[ReadString(ref ptr)] = ReadString(ref ptr);
+                }
+                return result;
+            }
+
+            private unsafe static string ReadString(ref IntPtr ptr)
+            {
+                int length = Marshal.ReadInt32(ptr);

Review Comment:
   Perhaps we should also throw an exception if length is negative? Or will `GetString` already do that?



##########
csharp/test/Apache.Arrow.Tests/CDataInterfacePythonTests.cs:
##########


Review Comment:
   Could you add a test that involves some multi-byte unicode characters?



##########
csharp/src/Apache.Arrow/C/CArrowSchemaExporter.cs:
##########
@@ -239,6 +240,51 @@ private static long GetFlags(IArrowType datatype, bool nullable = true)
             }
         }
 
+        private unsafe static byte* ConstructMetadata(IReadOnlyDictionary<string, string> metadata)
+        {
+            if (metadata == null || metadata.Count == 0)
+            {
+                return null;
+            }
+
+            int size = 4;
+            int[] lengths = new int[metadata.Count * 2];
+            int i = 0;
+            foreach (KeyValuePair<string, string> pair in metadata)
+            {
+                size += 8;
+                lengths[i] = Encoding.UTF8.GetByteCount(pair.Key);
+                size += lengths[i++];
+                lengths[i] = Encoding.UTF8.GetByteCount(pair.Value);
+                size += lengths[i++];
+            }
+
+            IntPtr result = Marshal.AllocHGlobal(size);
+            Marshal.WriteInt32(result, metadata.Count);
+            byte* ptr = (byte*)result + 4;
+            i = 0;
+            foreach (KeyValuePair<string, string> pair in metadata)
+            {
+                WriteString(ref ptr, lengths[i++], pair.Key);
+                WriteString(ref ptr, lengths[i++], pair.Value);
+            }
+
+            Debug.Assert((long)(IntPtr)ptr - (long)result == size);
+
+            return (byte*)result;
+        }
+
+        private unsafe static void WriteString(ref byte* ptr, int length, string str)

Review Comment:
   ```suggestion
           private unsafe static void WriteMetadataString(ref byte* ptr, int length, string str)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org