Posted to dev@avro.apache.org by "tradercentric (via GitHub)" <gi...@apache.org> on 2023/09/25 05:06:56 UTC

[GitHub] [avro] tradercentric opened a new pull request, #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to support up to 64 level depth in Avro schema

tradercentric opened a new pull request, #2519:
URL: https://github.com/apache/avro/pull/2519

   <!--
   
   *Thank you very much for contributing to Apache Avro - we are happy that you want to help us improve Avro. To help the community review your contribution in the best possible way, please go through the checklist below, which will get the contribution into a shape in which it can be best reviewed.*
   
   *Please understand that we do not do this to make contributions to Avro a hassle. In order to uphold a high standard of quality for code contributions, while at the same time managing a large number of contributions, we need contributors to prepare the contributions well, and give reviewers enough contextual information for the review. Please also understand that contributions that do not follow this guide will take longer to review and thus typically be picked up with lower priority by the community.*
   
   ## Contribution Checklist
   
     - Make sure that the pull request corresponds to a [JIRA issue](https://issues.apache.org/jira/projects/AVRO/issues). Exceptions are made for typos in JavaDoc or documentation files, which need no JIRA issue.
     
     - Name the pull request in the form "AVRO-XXXX: [component] Title of the pull request", where *AVRO-XXXX* should be replaced by the actual issue number. 
       The *component* is optional, but can help identify the correct reviewers faster: either the language ("java", "python") or subsystem such as "build" or "doc" are good candidates.  
   
     - Fill out the template below to describe the changes contributed by the pull request. That will give reviewers the context they need to do the review.
     
     - Make sure that the change passes the automated tests. You can [build the entire project](https://github.com/apache/avro/blob/master/BUILD.md) or just the [language-specific SDK](https://avro.apache.org/project/how-to-contribute/#unit-tests).
   
     - Each pull request should address only one issue, not mix up code from multiple issues.
     
     - Each commit in the pull request has a meaningful commit message (including the JIRA id)
   
     - Every commit message references Jira issues in their subject lines. In addition, commits follow the guidelines from [How to write a good git commit message](https://chris.beams.io/posts/git-commit/)
       1. Subject is separated from body by a blank line
       1. Subject is limited to 50 characters (not including Jira issue reference)
       1. Subject does not end with a period
       1. Subject uses the imperative mood ("add", not "adding")
       1. Body wraps at 72 characters
       1. Body explains "what" and "why", not "how"
   
   -->
   
   ## What is the purpose of the change
   
    I am an end user/developer using Confluent Kafka to consume Avro messages whose schema appears to be nested more than 32 levels deep.  The producer/vendor cannot reduce the nesting depth of the Avro schema.  Confluent support assisted me in opening an issue with the Apache Avro project; it is logged as https://issues.apache.org/jira/browse/AVRO-3856.  I need a code fix so I can continue.
   
    The current code in Schema.cs parses the Avro schema with JObject.Parse(json) and JArray.Parse(json), which offer no way to override the maximum depth.  In https://github.com/JamesNK/Newtonsoft.Json/pull/2904, the Newtonsoft author advised using JObject.Load(JsonReader), so that MaxDepth can be set on the JsonReader once it is instantiated.  In doing so, I also discovered that the JsonReader counts more depth levels than the Avro schema has: a 32-level-deep Avro schema is counted as 92 depth levels by JsonReader (in its Push method).
   
    Here are the depth counts I observed in JsonReader:

    | Avro Schema Depth | JsonReader Depth Count |
    | ----------------- | ---------------------- |
    | 4                 | 11                     |
    | 16                | 44                     |
    | 32                | 92                     |
    | 64                | 188                    |
   
    To compensate, I hardcoded JsonReader.MaxDepth to 192 so that Avro schemas up to 64 levels deep can be parsed.
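    The core of the change in Schema.Parse, simplified from the Schema.cs diff under review, looks like this:

    ```csharp
    // Build a reader over the schema JSON so MaxDepth can be raised before loading.
    JsonReader reader = new JsonTextReader(new StringReader(json));
    // Compensate for the multiple JSON containers each nested Avro schema level needs,
    // so schemas up to 64 levels deep can be parsed.
    reader.MaxDepth = 192;

    bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
        && json.EndsWith("]", StringComparison.Ordinal);
    JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);
    ```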
   
   ## Verifying this change
   
   This change added tests and can be verified as follows:
   
    I added 3 test cases in SchemaTests.cs to verify the fix (a simplified sketch of one follows):

    - Parse16DepthLevelSchemaTest
    - Parse32DepthLevelSchemaTest
    - Parse64DepthLevelSchemaTest
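    Roughly, each test parses a deeply nested record schema and asserts that parsing succeeds.  The nested-schema builder below is purely illustrative and not the actual test code:

    ```csharp
    [Test]
    public void Parse32DepthLevelSchemaTest()
    {
        // Hypothetical helper (defined below) that nests record schemas 32 levels deep.
        string json = BuildNestedRecordSchema(32);
        Schema schema = Schema.Parse(json);
        Assert.IsNotNull(schema);
    }

    // Builds a record schema whose single field nests another record schema, 'levels' deep.
    private static string BuildNestedRecordSchema(int levels)
    {
        string inner = "\"string\"";
        for (int i = levels; i >= 1; i--)
        {
            inner = "{ \"type\": \"record\", \"name\": \"Level" + i + "\", \"fields\": [" +
                    " { \"name\": \"f" + i + "\", \"type\": " + inner + " } ] }";
        }
        return inner;
    }
    ```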
   
   ## Documentation
   
   - Does this pull request introduce a new feature? no
   - If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336121139


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Corrected.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336411962


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   I do not think I am qualified to make the changes that would let application developers customize the limit; I would like that enhancement to come from more experienced developers.  I did more analysis on the real-world Avro schema and found it is just 1 level above the default max depth of 64 in JsonReader.  I added test cases using the real-world Avro schema I have, since the information in it is not sensitive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336411962


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   I do not think I am qualified to make the changes that would let application developers customize the limit; I would like that as a follow-up enhancement.  I did more analysis on the real-world Avro schema and found it is just 1 level above the default max depth of 64 in JsonReader.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336121139


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Corrected in my fork.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on pull request #2519: AVRO-3856: [C#] Avro schema nested structures limit is around 20 which is not enough in some cases

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on PR #2519:
URL: https://github.com/apache/avro/pull/2519#issuecomment-1739483075

   I noticed that the Java Avro library shows no deserialization error with the same schema.
   May I ask which Confluent developers qualified to improve the Apache.Avro C# library could lead the effort to allow customization of the max depth, please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on pull request #2519: AVRO-3856: [C#] Avro schema nested structures limit is around 20 which is not enough in some cases

Posted by "KalleOlaviNiemitalo (via GitHub)" <gi...@apache.org>.
KalleOlaviNiemitalo commented on PR #2519:
URL: https://github.com/apache/avro/pull/2519#issuecomment-1737825194

   In my opinion, the code changes here are now OK to merge, but AVRO-3856 should either
   
   - be left open until applications can actually customize the max depth, or
   - be reworded to request only a larger constant max depth rather than the ability to customize. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336411962


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   I do not think I am qualified to make the changes that would let application developers customize the limit; I would like that enhancement to come from more experienced developers.  I did more analysis on the real-world Avro schema and found it is just 1 level above the default max depth of 64 in JsonReader.  I added test cases using the real-world Avro schema I have, since the information is not sensitive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336423537


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   Please help to review.



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   Please help to review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on pull request #2519: AVRO-3856: [C#] Hardcode max depth to 192 when using Newtonsoft Jsonreader to parse Avro schema up to 64 depth level

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on PR #2519:
URL: https://github.com/apache/avro/pull/2519#issuecomment-1739555495

   Hi @KalleOlaviNiemitalo, I have reworded the title to reflect the intent: hardcode the value 192 to support parsing Avro schemas up to 64 levels deep.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336423551


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   Please help to review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336423537


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   Please help to review.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336121904


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.

Review Comment:
   I reworded the comment first and committed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "KalleOlaviNiemitalo (via GitHub)" <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1335403993


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.

Review Comment:
   Please reword this not to give the impression that the Newtonsoft.Json library has a bug that makes it count the depth incorrectly.  The difference in depth counts is rather caused by how the Avro schemas are represented in JSON; each nested Avro schema requires multiple nested JSON containers.  The Newtonsoft.Json library is not specific to Avro and is not designed to count Avro schemas, so the behaviour seems correct to me.
   
   The Avro schema in <https://github.com/JamesNK/Newtonsoft.Json/pull/2904#issuecomment-1732764055> has 4 levels of record schemas, but its JSON representation has 12 levels of nested containers: 
   
   ```JSON
   { /* depth 1: object */
     "type": "record",
     "name": "Level1",
     "fields": [ /* depth 2: array */
       {
         "name": "field1",
         "type": "string"
       },
       {
         "name": "field2",
         "type": "int"
       },
       { /* depth 3: object */
         "name": "level2",
         "type": { /* depth 4: object */
           "type": "record",
           "name": "Level2",
           "fields": [ /* depth 5: array */
             {
               "name": "field3",
               "type": "boolean"
             },
             {
               "name": "field4",
               "type": "double"
             },
             { /* depth 6: object */
               "name": "level3",
               "type": { /* depth 7: object */
                 "type": "record",
                 "name": "Level3",
                 "fields": [ /* depth 8: array */
                   {
                     "name": "field5",
                     "type": "string"
                   },
                   {
                     "name": "field6",
                     "type": "int"
                   },
                   { /* depth 9: object */
                     "name": "level4",
                     "type": { /* depth 10: object */
                       "type": "record",
                       "name": "Level4",
                       "fields": [ /* depth 11: array */
                         { /* depth 12: object */
                           "name": "field7",
                           "type": "boolean"
                         },
                         {
                           "name": "field8",
                           "type": "double"
                         }
                       ]
                     }
                   }
                 ]
               }
             }
           ]
         }
       }
     ]
   }
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336121049


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   Ack.  Looking into it.



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Ack.  Looking into it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on pull request #2519: AVRO-3856: [C#] Avro schema nested structures limit is around 20 which is not enough in some cases

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on PR #2519:
URL: https://github.com/apache/avro/pull/2519#issuecomment-1736716956

   Added a comment describing a workaround on the end-user/developer side in https://issues.apache.org/jira/browse/AVRO-3856.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336406709


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   I did not add the tests because I follow exactly what JObject.Parse does after it calls JObject.Load:
   
   try
   {
       bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
           && json.EndsWith("]", StringComparison.Ordinal);
       JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);
   
       // When replacing JArray.Parse and JObject.Parse with JArray.Load and JObject.Load,
       // we will need this check following what Newtonsoft.Json does.
       while (reader.Read())
       {
           // Any content encountered here other than a comment will throw in the reader.
       }
   :
   :
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] tradercentric commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "tradercentric (via GitHub)" <gi...@apache.org>.
tradercentric commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1336411962


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   I do not think I am qualified to make the changes that would let application developers customize the limit; I would like that enhancement to come from other, more experienced developers.  I did more analysis on the real-world Avro schema and found it is just 1 level above the default max depth of 64 in JsonReader.  I added test cases using the real-world Avro schema I have, since the information is not sensitive.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] KalleOlaviNiemitalo commented on a diff in pull request #2519: AVRO-3856: [C#] Fixing Newtonsoft usage in Schema.cs to parse up to 64 level depth in Avro schema

Posted by "KalleOlaviNiemitalo (via GitHub)" <gi...@apache.org>.
KalleOlaviNiemitalo commented on code in PR #2519:
URL: https://github.com/apache/avro/pull/2519#discussion_r1335422416


##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;

Review Comment:
   Not sure that this actually fixes [AVRO-3856](https://issues.apache.org/jira/browse/AVRO-3856) as stated.  Although this change allows an Avro schema with 64 nested record schemas, application developers still cannot customize the limit.



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.

Review Comment:
   Please reword this not to give the impression that the Newtonsoft.Json library has a bug that makes it count the depth incorrectly.  The difference in depth counts is rather caused by how the Avro schemas are represented in JSON; each nested Avro schema requires multiple nested JSON containers.  The Newtonsoft.Json library is not specific to Avro and is not designed to count Avro schemas, so the behaviour seems correct to me.
   
   The Avro schema in <https://github.com/JamesNK/Newtonsoft.Json/pull/2904#issuecomment-1732764055> has 4 levels of record schemas, but its JSON representation has 11 levels of nested containers: 
   
   ```JSON
   { /* depth 1: object */
     "type": "record",
     "name": "Level1",
     "fields": [ /* depth 2: array */
       {
         "name": "field1",
         "type": "string"
       },
       {
         "name": "field2",
         "type": "int"
       },
       { /* depth 3: object */
         "name": "level2",
         "type": { /* depth 4: object */
           "type": "record",
           "name": "Level2",
           "fields": [ /* depth 5: array */
             {
               "name": "field3",
               "type": "boolean"
             },
             {
               "name": "field4",
               "type": "double"
             },
             { /* depth 6: object */
               "name": "level3",
               "type": { /* depth 7: object */
                 "type": "record",
                 "name": "Level3",
                 "fields": [ /* depth 8: array */
                   {
                     "name": "field5",
                     "type": "string"
                   },
                   {
                     "name": "field6",
                     "type": "int"
                   },
                   { /* depth 9: object */
                     "name": "level4",
                     "type": {
                       "type": "record",
                       "name": "Level4",
                       "fields": [ /* depth 10: array */
                         { /* depth 11: object */
                           "name": "field7",
                           "type": "boolean"
                         },
                         {
                           "name": "field8",
                           "type": "double"
                         }
                       ]
                     }
                   }
                 ]
               }
             }
           ]
         }
       }
     ]
   }
   ```
   



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));
+            // Another issue discovered is JsonReader.Push(JsonContainerType value) method overcounting the depth
+            // level of Avro schema.  Here are the observation of over-counting depth level in Newtonsoft's JsonReader:
+            // Avro Schema Depth	JsonReader Depth Level Count
+            // 4	                11
+            // 16                   44
+            // 32	                92
+            // 64	                188
+            // So, roughly speaking, the depth level count is about 2.75 times of Avro schema depth.
+            // Below is the hard-coded value to compensate over-counting of depth level in Newtonsoft
+            // to support Avro schema depth level to 64 slightly beyond.
+            reader.MaxDepth = 192;
+
             try
             {
                 bool IsArray = json.StartsWith("[", StringComparison.Ordinal)
                     && json.EndsWith("]", StringComparison.Ordinal);
-                JContainer j = IsArray ? (JContainer)JArray.Parse(json) : (JContainer)JObject.Parse(json);
+                JContainer j = IsArray ? (JContainer)JArray.Load(reader) : (JContainer)JObject.Load(reader);

Review Comment:
   After JObject.Parse has called JObject.Load, it checks that the object in the JSON input is not followed by anything else.  Now when Avro calls JObject.Load directly, that check no longer happens.  Please add a test that verifies Schema.Parse will throw an exception if given invalid JSON that contains more than one JSON object, something like this:
   
   ```JSON
   {
       "type": "int"
   }
   {
       "type": "string",
       "doc": "This is invalid because the schema must not be followed by other JSON objects."
   }
   ```
   
   And likewise for invalid JSON that contains more than one JSON array.
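   A minimal sketch of such a test (the name, the embedded JSON, and the "any exception" expectation are only illustrative):

    ```csharp
    [Test]
    public void ParseRejectsTrailingJsonContent()
    {
        const string invalidJson =
            @"{ ""type"": ""int"" }
              { ""type"": ""string"", ""doc"": ""Invalid: a schema must not be followed by more JSON."" }";

        // The exact exception type depends on how Schema.Parse surfaces reader errors,
        // so this only asserts that parsing fails.
        Assert.Catch(() => Schema.Parse(invalidJson));
    }
    ```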



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Please add `using (…) { … }` to close the reader.  Although it does not have much effect now, it will become more important if an IArrayPool<char> is added later.
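   For example (a sketch only; the loading code from the diff above goes inside the block):

    ```csharp
    using (JsonReader reader = new JsonTextReader(new StringReader(json)))
    {
        reader.MaxDepth = 192;
        // ... JArray.Load(reader) / JObject.Load(reader) and the rest of Parse ...
    }
    ```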



##########
lang/csharp/src/apache/main/Schema/Schema.cs:
##########
@@ -243,11 +244,28 @@ internal static Schema Parse(string json, SchemaNames names, string encspace)
             Schema sc = PrimitiveSchema.NewInstance(json);
             if (null != sc) return sc;
 
+            // Refer to https://issues.apache.org/jira/browse/AVRO-3856
+            // Refer to https://github.com/JamesNK/Newtonsoft.Json/pull/2904
+            // Newtonsoft author advised to use JObject.Load/JArray.Load instead of JObject.Parse()/JArray.Parse()
+            // The reason is we can set the MaxDepth property on the JsonReader.
+            JsonReader reader = new JsonTextReader(new StringReader(json));

Review Comment:
   Please add `using (…) { … }` to close the reader.  Although it does not have much effect now, it will become more important if an IArrayPool<char> is added later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [avro] emasab commented on pull request #2519: AVRO-3856: [C#] Avro schema nested structures limit is around 20 which is not enough in some cases

Posted by "emasab (via GitHub)" <gi...@apache.org>.
emasab commented on PR #2519:
URL: https://github.com/apache/avro/pull/2519#issuecomment-1738673060

   Thanks @tradercentric @KalleOlaviNiemitalo, I've updated the issue so that it corresponds to the fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@avro.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org