You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/05 00:12:12 UTC

[GitHub] [spark] SandishKumarHN opened a new pull request, #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

SandishKumarHN opened a new pull request, #38515:
URL: https://github.com/apache/spark/pull/38515

   
   ### What changes were proposed in this pull request?
   null check for data generator after type conversion
   
   NA
   ### Why are the changes needed?
   NA
   
   
   ### Does this PR introduce _any_ user-facing change?
   NA
   
   ### How was this patch tested?
   I have tested all the random manually
   current unit tests
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1306058916

   @MaxGekk, please merge once this this looks good to you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014774789


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##########
@@ -3344,7 +3344,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
   def failedParsingDescriptorError(descFilePath: String, cause: Throwable): Throwable = {
     new AnalysisException(
       errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR",

Review Comment:
   Could you add a test for this error class.



##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala:
##########
@@ -177,6 +177,19 @@ class ProtobufSerdeSuite extends SharedSparkSession {
     withFieldMatchType(Deserializer.create(CATALYST_STRUCT, protoNestedFile, _))
   }
 
+  test("raise cannot parse protobuf descriptor error") {
+    // passing serde_suite.proto instead serde_suite.desc
+    val testFileDesc = testFile("serde_suite.proto").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot")
+    }
+
+    checkError(
+      exception = e,
+      errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",

Review Comment:
   @SandishKumarHN Thank you for the added test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014688948


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##########
@@ -3330,7 +3330,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
   def descrioptorParseError(descFilePath: String, cause: Throwable): Throwable = {
     new AnalysisException(
       errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",
-      messageParameters = Map.empty("descFilePath" -> descFilePath),
+      messageParameters = Map("descFilePath" -> descFilePath),

Review Comment:
   @MaxGekk fixing this one https://github.com/apache/spark/pull/38344#discussion_r1014644624 here  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator 
URL: https://github.com/apache/spark/pull/38515


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014660407


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -123,16 +123,21 @@ class ProtobufCatalystDataConversionSuite
     StringType -> ("StringMsg", ""))
 
   testingTypes.foreach { dt =>
-    val seed = 1 + scala.util.Random.nextInt((1024 - 1) + 1)
+    val seed = scala.util.Random.nextInt(RandomDataGenerator.MAX_STR_LEN)
     test(s"single $dt with seed $seed") {
 
       val (messageName, defaultValue) = catalystTypesToProtoMessages(dt.fields(0).dataType)
 
       val rand = new scala.util.Random(seed)
       val generator = RandomDataGenerator.forType(dt, rand = rand).get
       var data = generator()
-      while (data.asInstanceOf[Row].get(0) == defaultValue) // Do not use default values, since
-        data = generator()                                  // from_protobuf() returns null in v3.
+      // Do not use default values, since from_protobuf() returns null in v3.
+      while (
+        data != null &&
+        (data.asInstanceOf[Row].get(0) == defaultValue ||
+          (data.asInstanceOf[Row].get(0).isInstanceOf[Array[Byte]] &&
+          data.asInstanceOf[Row].get(0).asInstanceOf[Array[Byte]].isEmpty)))

Review Comment:
   Isn't `ByteString.empty().toByteArray` at line 122 already empty value?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1306148220

   @SandishKumarHN @rangadi I will review and merge it (if everything is ok, and test passed) tomorrow.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1304351416

   @rangadi Because some random numbers do not convert to catalyst type, a null check for the data generator is required.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014668826


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -123,16 +123,21 @@ class ProtobufCatalystDataConversionSuite
     StringType -> ("StringMsg", ""))
 
   testingTypes.foreach { dt =>
-    val seed = 1 + scala.util.Random.nextInt((1024 - 1) + 1)
+    val seed = scala.util.Random.nextInt(RandomDataGenerator.MAX_STR_LEN)
     test(s"single $dt with seed $seed") {
 
       val (messageName, defaultValue) = catalystTypesToProtoMessages(dt.fields(0).dataType)
 
       val rand = new scala.util.Random(seed)
       val generator = RandomDataGenerator.forType(dt, rand = rand).get
       var data = generator()
-      while (data.asInstanceOf[Row].get(0) == defaultValue) // Do not use default values, since
-        data = generator()                                  // from_protobuf() returns null in v3.
+      // Do not use default values, since from_protobuf() returns null in v3.
+      while (
+        data != null &&

Review Comment:
   This should just be `data == null || data.asInstanceOf[Row].get(0) == defaultValue`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014891059


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/errors/QueryCompilationErrors.scala:
##########
@@ -3344,7 +3344,7 @@ private[sql] object QueryCompilationErrors extends QueryErrorsBase {
   def failedParsingDescriptorError(descFilePath: String, cause: Throwable): Throwable = {
     new AnalysisException(
       errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR",

Review Comment:
   @MaxGekk added 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015788643


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -677,4 +677,34 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Seri
           === inputDf.select("durationMsg.duration").take(1).toSeq(0).get(0))
     }
   }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val basicMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "BasicMessage")
+
+    val basicMessage = DynamicMessage

Review Comment:
   correct, fixed. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015782259


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##########
@@ -677,4 +677,34 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Seri
           === inputDf.select("durationMsg.duration").take(1).toSeq(0).get(0))
     }
   }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val basicMessageDesc = ProtobufUtils.buildDescriptor(testFileDesc, "BasicMessage")
+
+    val basicMessage = DynamicMessage

Review Comment:
   Btw, you don't need the message. Could use empty byte array while initializing `df`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1304540917

   Can one of the admins verify this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1307510678

   @MaxGekk can also close the [JIRA](https://issues.apache.org/jira/browse/SPARK-41015)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on PR #38515:
URL: https://github.com/apache/spark/pull/38515#issuecomment-1306969204

   +1, LGTM. Merging to master.
   Thank you, @SandishKumarHN and @rangadi for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015745863


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala:
##########
@@ -177,6 +177,31 @@ class ProtobufSerdeSuite extends SharedSparkSession {
     withFieldMatchType(Deserializer.create(CATALYST_STRUCT, protoNestedFile, _))
   }
 
+  test("raise cannot parse protobuf descriptor error") {
+    // passing serde_suite.proto instead serde_suite.desc
+    val testFileDesc = testFile("serde_suite.proto").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot")
+    }
+
+    checkError(
+      exception = e,
+      errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",
+      parameters = Map("descFilePath" -> testFileDesc))
+  }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val testFileDesc = testFile("basicmessage_noimports.desc").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.parseFileDescriptorSet(testFileDesc)

Review Comment:
   @rangadi  Yes, but the query would catch this error and throw a different error right? 



##########
connector/protobuf/src/test/resources/protobuf/basicmessage_noimports.desc:
##########
@@ -0,0 +1,18 @@
+

Review Comment:
   @rangadi yes, it is used in the unit test above, how else can we raise the CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR exception? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015742508


##########
connector/protobuf/src/test/resources/protobuf/basicmessage_noimports.desc:
##########
@@ -0,0 +1,18 @@
+

Review Comment:
   Where is this used?
   Did you mean to another test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014676031


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -123,16 +123,21 @@ class ProtobufCatalystDataConversionSuite
     StringType -> ("StringMsg", ""))
 
   testingTypes.foreach { dt =>
-    val seed = 1 + scala.util.Random.nextInt((1024 - 1) + 1)
+    val seed = scala.util.Random.nextInt(RandomDataGenerator.MAX_STR_LEN)
     test(s"single $dt with seed $seed") {
 
       val (messageName, defaultValue) = catalystTypesToProtoMessages(dt.fields(0).dataType)
 
       val rand = new scala.util.Random(seed)
       val generator = RandomDataGenerator.forType(dt, rand = rand).get
       var data = generator()
-      while (data.asInstanceOf[Row].get(0) == defaultValue) // Do not use default values, since
-        data = generator()                                  // from_protobuf() returns null in v3.
+      // Do not use default values, since from_protobuf() returns null in v3.
+      while (
+        data != null &&

Review Comment:
   I meant, we don't need the array check.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1014688453


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -123,16 +123,21 @@ class ProtobufCatalystDataConversionSuite
     StringType -> ("StringMsg", ""))
 
   testingTypes.foreach { dt =>
-    val seed = 1 + scala.util.Random.nextInt((1024 - 1) + 1)
+    val seed = scala.util.Random.nextInt(RandomDataGenerator.MAX_STR_LEN)
     test(s"single $dt with seed $seed") {
 
       val (messageName, defaultValue) = catalystTypesToProtoMessages(dt.fields(0).dataType)
 
       val rand = new scala.util.Random(seed)
       val generator = RandomDataGenerator.forType(dt, rand = rand).get
       var data = generator()
-      while (data.asInstanceOf[Row].get(0) == defaultValue) // Do not use default values, since
-        data = generator()                                  // from_protobuf() returns null in v3.
+      // Do not use default values, since from_protobuf() returns null in v3.
+      while (
+        data != null &&

Review Comment:
   @rangadi 
   1. `data.asInstanceOf[Row].get(0) == ByteString.empty().toByteArray`
   2. `data.asInstanceOf[Row].get(0) == Array.emptyByteArray`
   3. `data.asInstanceOf[Row].get(0) ==  ByteString.EMPTY.toByteArray`
   4. `data.asInstanceOf[Row].get(0) ==   "".getBytes`
   5. `data.asInstanceOf[Row].get(0).isInstanceOf[Array[Byte]] && data.asInstanceOf[Row].get(0).asInstanceOf[Array[Byte]].isEmpty`
   
   Except for (5), none of them worked. I'm printing under the conditions listed above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015740385


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala:
##########
@@ -177,6 +177,31 @@ class ProtobufSerdeSuite extends SharedSparkSession {
     withFieldMatchType(Deserializer.create(CATALYST_STRUCT, protoNestedFile, _))
   }
 
+  test("raise cannot parse protobuf descriptor error") {
+    // passing serde_suite.proto instead serde_suite.desc
+    val testFileDesc = testFile("serde_suite.proto").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot")
+    }
+
+    checkError(
+      exception = e,
+      errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",
+      parameters = Map("descFilePath" -> testFileDesc))
+  }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val testFileDesc = testFile("basicmessage_noimports.desc").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.parseFileDescriptorSet(testFileDesc)

Review Comment:
   Thanks for adding the test. 
   Do we need to invoke this? Any query would raise this, right?



##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -123,16 +123,21 @@ class ProtobufCatalystDataConversionSuite
     StringType -> ("StringMsg", ""))
 
   testingTypes.foreach { dt =>
-    val seed = 1 + scala.util.Random.nextInt((1024 - 1) + 1)
+    val seed = scala.util.Random.nextInt(RandomDataGenerator.MAX_STR_LEN)
     test(s"single $dt with seed $seed") {
 
       val (messageName, defaultValue) = catalystTypesToProtoMessages(dt.fields(0).dataType)
 
       val rand = new scala.util.Random(seed)
       val generator = RandomDataGenerator.forType(dt, rand = rand).get
       var data = generator()
-      while (data.asInstanceOf[Row].get(0) == defaultValue) // Do not use default values, since
-        data = generator()                                  // from_protobuf() returns null in v3.
+      // Do not use default values, since from_protobuf() returns null in v3.
+      while (
+        data != null &&

Review Comment:
   I see. Equals does not check the content on array.
   Optional: 
   We could recduce `data.asInstanceOf` calls with `val  data = generator().asInstanceOf[Row]`.
   Also could replace 
   > `data.asInstanceOf[Row].get(0).isInstanceOf[Array[Byte]]`
   with
   > `dt ==  BinaryType` 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] rangadi commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
rangadi commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015760643


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala:
##########
@@ -177,6 +177,31 @@ class ProtobufSerdeSuite extends SharedSparkSession {
     withFieldMatchType(Deserializer.create(CATALYST_STRUCT, protoNestedFile, _))
   }
 
+  test("raise cannot parse protobuf descriptor error") {
+    // passing serde_suite.proto instead serde_suite.desc
+    val testFileDesc = testFile("serde_suite.proto").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot")
+    }
+
+    checkError(
+      exception = e,
+      errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",
+      parameters = Map("descFilePath" -> testFileDesc))
+  }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val testFileDesc = testFile("basicmessage_noimports.desc").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.parseFileDescriptorSet(testFileDesc)

Review Comment:
   Users don't call this. Analysis exception thrown to the user if they try to use this.



##########
connector/protobuf/src/test/resources/protobuf/basicmessage_noimports.desc:
##########
@@ -0,0 +1,18 @@
+

Review Comment:
   Right. I didn't see the second test carefully. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38515: [SPARK-41015][SQL][PROTOBUF] UnitTest null check for data generator

Posted by GitBox <gi...@apache.org>.
SandishKumarHN commented on code in PR #38515:
URL: https://github.com/apache/spark/pull/38515#discussion_r1015781498


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufSerdeSuite.scala:
##########
@@ -177,6 +177,31 @@ class ProtobufSerdeSuite extends SharedSparkSession {
     withFieldMatchType(Deserializer.create(CATALYST_STRUCT, protoNestedFile, _))
   }
 
+  test("raise cannot parse protobuf descriptor error") {
+    // passing serde_suite.proto instead serde_suite.desc
+    val testFileDesc = testFile("serde_suite.proto").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.buildDescriptor(testFileDesc, "FieldMissingInSQLRoot")
+    }
+
+    checkError(
+      exception = e,
+      errorClass = "CANNOT_PARSE_PROTOBUF_DESCRIPTOR",
+      parameters = Map("descFilePath" -> testFileDesc))
+  }
+
+  test("raise cannot construct protobuf descriptor error") {
+    val testFileDesc = testFile("basicmessage_noimports.desc").replace("file:/", "/")
+    val e = intercept[AnalysisException] {
+      ProtobufUtils.parseFileDescriptorSet(testFileDesc)

Review Comment:
   @rangadi Check it out once again I've made a few minor changes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org