Posted to reviews@spark.apache.org by "rangadi (via GitHub)" <gi...@apache.org> on 2023/02/14 08:41:28 UTC

[GitHub] [spark] rangadi commented on a diff in pull request #40011: [SPARK-42406][PROTOBUF] Fix recursive depth setting for Protobuf functions

rangadi commented on code in PR #40011:
URL: https://github.com/apache/spark/pull/40011#discussion_r1105466945


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -385,34 +387,12 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
     }
   }
 
-  test("Handle recursive fields in Protobuf schema, A->B->A") {
-    val schemaA = ProtobufUtils.buildDescriptor(testFileDesc, "recursiveA")

Review Comment:
   This extra code to create actual records is not required, since we are testing for an AnalysisException; the test does not need real data.
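   For illustration, a pared-down test along these lines would be enough (a sketch, not the PR's actual test; it assumes the suite's usual imports such as testImplicits._ for toDF/$ and org.apache.spark.sql.protobuf.functions, with testFileDesc and the message name "recursiveA" taken from the diff above):

       test("recursive schema A->B->A fails analysis without real data") {
         // One row of empty bytes is enough: the recursion check fires during
         // analysis, so the payload is never decoded.
         val df = Seq(Array[Byte]()).toDF("binary")
         val e = intercept[AnalysisException] {
           df.select(
             functions.from_protobuf($"binary", "recursiveA", testFileDesc).as("a"))
             .schema // accessing the schema forces analysis
         }
         assert(e.getMessage.contains("recursive"))
       }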



##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##########
@@ -118,17 +118,17 @@ object SchemaConverters {
         // discarded.
         // SQL Schema for the protobuf message `message Person { string name = 1; Person bff = 2}`
         // will vary based on the value of "recursive.fields.max.depth".
-        // 0: struct<name: string, bff: null>
-        // 1: struct<name string, bff: <name: string, bff: null>>
-        // 2: struct<name string, bff: <name: string, bff: struct<name: string, bff: null>>> ...
+        // 1: struct<name: string, bff: null>
+        // 2: struct<name string, bff: <name: string, bff: null>>
+        // 3: struct<name string, bff: <name: string, bff: struct<name: string, bff: null>>> ...
         val recordName = fd.getMessageType.getFullName
         val recursiveDepth = existingRecordNames.getOrElse(recordName, 0)
         val recursiveFieldMaxDepth = protobufOptions.recursiveFieldMaxDepth
-        if (existingRecordNames.contains(recordName) && (recursiveFieldMaxDepth < 0 ||
+        if (existingRecordNames.contains(recordName) && (recursiveFieldMaxDepth <= 0 ||
           recursiveFieldMaxDepth > 10)) {
           throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
         } else if (existingRecordNames.contains(recordName) &&
-          recursiveDepth > recursiveFieldMaxDepth) {
+          recursiveDepth >= recursiveFieldMaxDepth) {

Review Comment:
   The main bug fix:
   We allowed the recursive depth to go over by one. As a result, the setting '0' did not work as documented (the earlier documentation stated that even the first occurrence would be removed if it was set to 0).
   Removing the first occurrence does not make sense, since we don't know that a field is recursive until its second occurrence.
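   In other words, the depth map counts how many times a record has already been seen on the current path, and the fixed check drops the field once that count reaches the configured maximum. A standalone sketch of the corrected logic (simplified; not the actual SchemaConverters code):

       // Returns None when the field should be dropped (rendered as null),
       // or the updated name -> depth map to keep descending with.
       def checkRecursion(
           recordName: String,
           seen: Map[String, Int],
           maxDepth: Int): Option[Map[String, Int]] = {
         val depth = seen.getOrElse(recordName, 0)
         if (seen.contains(recordName) && (maxDepth <= 0 || maxDepth > 10)) {
           // Recursion found while the option is unset or out of range: fail.
           throw new IllegalStateException(s"Found recursion for $recordName")
         } else if (seen.contains(recordName) && depth >= maxDepth) {
           None // depth limit reached: drop this occurrence
         } else {
           Some(seen + (recordName -> (depth + 1)))
         }
       }

   With this check, a setting of 1 keeps only the first occurrence of the record (bff becomes null), matching the updated comment above; the old '>' comparison kept one extra level for the same setting.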



##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -853,7 +824,7 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
 
     val df = Seq(employee.toByteArray).toDF("protoEvent")
     val options = new java.util.HashMap[String, String]()
-    options.put("recursive.fields.max.depth", "1")
+    options.put("recursive.fields.max.depth", "2")

Review Comment:
   We need to increase the setting because this PR fixes an off-by-one in the enforcement of the depth limit.
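   With the fix, a setting of 2 is what yields one level of recursion in the result. A usage sketch (the message name "Employee" and descFilePath are placeholders assumed from the test context):

       val options = new java.util.HashMap[String, String]()
       // 2 => the recursive field is kept once, then truncated to null
       options.put("recursive.fields.max.depth", "2")
       val parsed = df.select(
         functions.from_protobuf($"protoEvent", "Employee", descFilePath, options)
           .as("employee"))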



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

