You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "panbingkun (via GitHub)" <gi...@apache.org> on 2023/10/24 02:17:54 UTC

[PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

panbingkun opened a new pull request, #43493:
URL: https://github.com/apache/spark/pull/43493

   ### What changes were proposed in this pull request?
   The pr aims to fix flaky ProtobufCatalystDataConversionSuite.
   
   As can be seen from the GA test below
   https://github.com/panbingkun/spark/actions/runs/6612141762/job/17982780917
   <img width="988" alt="image" src="https://github.com/apache/spark/assets/15246973/fd1931ba-d858-4c59-9d5c-3a1353e3e005">
   
   When `data.get(0)` is null, `data.get(0).asInstanceOf[Array[Byte]].isEmpty` will be thrown `java.lang.NullPointerException`.
   
   <img width="491" alt="image" src="https://github.com/apache/spark/assets/15246973/4cf83e36-1c4a-4b65-b119-1152b026ed29">
   
   ### Why are the changes needed?
   Fix flaky ProtobufCatalystDataConversionSuite.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   ### How was this patch tested?
   Pass GA.
   Manually test:
   ```
   ./build/sbt "protobuf/testOnly org.apache.spark.sql.protobuf.ProtobufCatalystDataConversionSuite"
   ```
   <img width="763" alt="image" src="https://github.com/apache/spark/assets/15246973/95e76b2e-5b09-4268-81bf-df20dd262b66">
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778217161

   @HeartSaVioR yes, backporting will be good. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1777987275

   Please run the test suite with all combination for available seeds and available testing types and make sure all of them are passing. Let's fix all the broken-but-not-caught cases.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778240732

   Thanks @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1370711411


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   We should allow null value for the field. I.e. change this to:
      ```scala
        (data.get(0) == null || data.get(0).asInstanceOf[Array[Byte]].isEmpty)
     ```
   
   I will test on my side to ensure this test stays stable with range of `seed` values.
   cc: @HeartSaVioR 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778340888

   > Would you mind filing PRs for backporting this to 3.5 and 3.4? Separate PRs would be preferred. Please also include the change in this PR #43424 - only need to incorporate the change in connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
   > 
   > Thanks in advance!
   
   Let me do it right away.
   Thank for all reviewing and help testing @rangadi @HeartSaVioR @MaxGekk @HyukjinKwon @LuciferYang 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1369783526


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   If this is expected, is this the only point of risk? In other words, is this the only place where `data.get(0)` needs to be null-checked?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "MaxGekk (via GitHub)" <gi...@apache.org>.
MaxGekk commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1777014390

   @SandishKumarHN Could you take a look at the PR, please. Seems similar changes as in https://github.com/apache/spark/pull/38515 . Also cc @rangadi 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778201842

   @rangadi 
   I just indicated that the test suite with seed existed from 3.4. I guess you've also fixed some of the test issue in other PR. Should we port back that PR as well?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778228481

   Merged to master


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778245180

   This has to go to 3.5/3.4 as well. 
   
   @panbingkun Would you mind file a PR for backport to 3.5 and 3.4? Separate PRs would be preferred. 
   Please also include the change in this PR https://github.com/apache/spark/pull/43424 - only need to incorporate the change in connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
   
   Thanks in advance!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1370711411


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   We should allow null value for the field. I.e. change this to:
      ```scala
        data.get(0) == null || data.get(0).asInstanceOf[Array[Byte]].isEmpty
     ```
   
   I will test on my side to ensure this test stays stable with range of `seed` values.
   cc: @HeartSaVioR 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778403933

   > This has to go to 3.5/3.4 as well.
   > 
   > @panbingkun Would you mind filing PRs for backporting this to 3.5 and 3.4? Separate PRs would be preferred. Please also include the change in this PR #43424 - only need to incorporate the change in connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala
   > 
   > Thanks in advance!
   
   Done.
   Branch-3.4: https://github.com/apache/spark/pull/43520
   Branch-3.5: https://github.com/apache/spark/pull/43521


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1778188720

   @HeartSaVioR I have run this fix with see values from 0 to 10000. All were successful. Could you merge this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "panbingkun (via GitHub)" <gi...@apache.org>.
panbingkun commented on PR #43493:
URL: https://github.com/apache/spark/pull/43493#issuecomment-1776393827

   cc @LuciferYang @HyukjinKwon 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1369776213


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   `data.get(0)` is null seems due to the following code producing an empty `Array[Byte]`, which means that `rand.nextInt(MAX_STR_LEN)` returned 0. I'm not sure if this is our original intention.
   
   https://github.com/apache/spark/blob/38222c4e8c581502e65a4224c3c4201136f7e390/sql/catalyst/src/test/scala/org/apache/spark/sql/RandomDataGenerator.scala#L187-L191



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "LuciferYang (via GitHub)" <gi...@apache.org>.
LuciferYang commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1370119358


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   Please disregard the comment above, it is incorrect.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43493: [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite
URL: https://github.com/apache/spark/pull/43493


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45640][SQL][TESTS] Fix flaky ProtobufCatalystDataConversionSuite [spark]

Posted by "rangadi (via GitHub)" <gi...@apache.org>.
rangadi commented on code in PR #43493:
URL: https://github.com/apache/spark/pull/43493#discussion_r1370911418


##########
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufCatalystDataConversionSuite.scala:
##########
@@ -138,6 +138,7 @@ class ProtobufCatalystDataConversionSuite
         data != null &&
         (data.get(0) == defaultValue ||
           (dt.fields(0).dataType == BinaryType &&
+            data.get(0) != null &&

Review Comment:
   Ignore this suggestion. `data.get(0) != null` is correct.  It allows testing null. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org