Posted to reviews@spark.apache.org by "juliuszsompolski (via GitHub)" <gi...@apache.org> on 2023/10/04 17:31:27 UTC

[PR] [SPARK-45416][CONNECT] Sanity check consistency of Arrow results [spark]

juliuszsompolski opened a new pull request, #43219:
URL: https://github.com/apache/spark/pull/43219

   ### What changes were proposed in this pull request?
   
   Sanity check that Arrow Batches in the results from Spark Connect contain the declared number of rows, and that they come from consecutive and contiguous offsets in query results.
   
   Getting this mixed up could lead to silently wrong results.
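
The check described above can be illustrated with a minimal sketch. This is not the code from the PR: `ArrowBatchInfo`, its fields, and `check_batches` are hypothetical names chosen for illustration, and the sketch assumes each batch carries a declared row count and a starting offset within the full query result.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ArrowBatchInfo:
    """Hypothetical stand-in for the metadata of one Arrow batch in a response."""
    start_offset: int   # offset of the batch's first row within the full result
    declared_rows: int  # row count declared alongside the batch
    actual_rows: int    # row count found after decoding the Arrow data


def check_batches(batches: Iterable[ArrowBatchInfo]) -> Iterator[ArrowBatchInfo]:
    """Yield batches unchanged; raise if row counts or offsets are inconsistent."""
    expected_offset = 0
    for batch in batches:
        # The decoded batch must contain exactly the declared number of rows.
        if batch.actual_rows != batch.declared_rows:
            raise ValueError(
                f"Expected {batch.declared_rows} rows in batch, "
                f"got {batch.actual_rows}."
            )
        # Each batch must start where the previous one ended: no gaps,
        # overlaps, or reordering.
        if batch.start_offset != expected_offset:
            raise ValueError(
                f"Expected batch to start at offset {expected_offset}, "
                f"got {batch.start_offset}."
            )
        expected_offset += batch.actual_rows
        yield batch
```

Wrapping the result stream this way means that a dropped, duplicated, or reordered batch raises an error at the point of consumption instead of producing a result that merely looks plausible.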
   
   ### Why are the changes needed?
   
   Protection against hard-to-debug issues.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   Exercised by every query in existing coverage.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45416][CONNECT] Sanity check consistency of Arrow results [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #43219:
URL: https://github.com/apache/spark/pull/43219#issuecomment-1755860525

   The failure below is a flake (the test-report artifact upload timed out); the rest passes.
   ```
   An error has been caught http-client index 0, retrying the upload
   Error: connect ETIMEDOUT 20.22.166.15:443
       at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1278:16) {
     errno: -110,
     code: 'ETIMEDOUT',
     syscall: 'connect',
     address: '20.22.166.15',
     port: 443
   }
   Retry limit has been reached for chunk at offset 0 to https://pipelinesghubeus21.actions.githubusercontent.com/oUsNNlae19Nq6pyu4JWOTqJgDgJA5pRHaFUNjNDxh9WNoyRymc/_apis/resources/Containers/40147494?itemPath=test-results-pyspark-pandas-connect-part0--8-hadoop3-hive2.3%2Ftarget%2Ftest-reports%2FTEST-pyspark.pandas.tests.connect.test_parity_series_conversion.SeriesConversionParityTests-20231010145612.xml
   Warning: Aborting upload for /__w/apache-spark/apache-spark/target/test-reports/TEST-pyspark.pandas.tests.connect.test_parity_series_conversion.SeriesConversionParityTests-20231010145612.xml due to failure
   Error: aborting artifact upload
   ```
   




Re: [PR] [SPARK-45416][CONNECT] Sanity check consistency of Arrow results [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon commented on PR #43219:
URL: https://github.com/apache/spark/pull/43219#issuecomment-1756702827

   Merged to master.




Re: [PR] [SPARK-45416][CONNECT] Sanity check consistency of Arrow results [spark]

Posted by "HyukjinKwon (via GitHub)" <gi...@apache.org>.
HyukjinKwon closed pull request #43219: [SPARK-45416][CONNECT] Sanity check consistency of Arrow results
URL: https://github.com/apache/spark/pull/43219




Re: [PR] [SPARK-45416][CONNECT] Sanity check consistency of Arrow results [spark]

Posted by "juliuszsompolski (via GitHub)" <gi...@apache.org>.
juliuszsompolski commented on PR #43219:
URL: https://github.com/apache/spark/pull/43219#issuecomment-1747355658

   @hvanhovell @grundprinzip @HyukjinKwon 

