
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/03/28 11:02:51 UTC

[GitHub] [spark] TonyDoen opened a new pull request #35990: [SPARK-38639] Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly

TonyDoen opened a new pull request #35990:
URL: https://github.com/apache/spark/pull/35990


   
   ### What changes were proposed in this pull request?
   This PR adds a new config, "spark.sql.hive.ignoreCorruptRecord", so that users can successfully query tables that contain dirty data (for example, mixed schemas in one table).
   
   
   ### Why are the changes needed?
   The existing flags "spark.sql.files.ignoreCorruptFiles" and "spark.sql.files.ignoreMissingFiles" quietly ignore attempted reads from files that are corrupted or missing, but a query on sequence files can still fail.
   
   Being able to ignore corrupt records is useful when users want to query dirty data (mixed schemas in one table) successfully.
   
   We would like to add a "spark.sql.hive.ignoreCorruptRecord" config to fill this gap.
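
As a rough illustration of the behavior described above (not the code in this PR), the Python sketch below skips records that fail to deserialize when a flag is enabled, and fails the whole read otherwise. The names `read_records`, `parse_row`, and `ignore_corrupt_record` are hypothetical and chosen only for this example; they are not Spark APIs, and the two-field row layout is an assumption.

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def read_records(
    raw_records: Iterable[bytes],
    deserialize: Callable[[bytes], T],
    ignore_corrupt_record: bool = False,
) -> Iterator[T]:
    """Yield deserialized records, optionally skipping the ones that fail.

    Hypothetical helper: it only mimics the semantics described for the
    proposed flag and is unrelated to Spark's actual reader code.
    """
    for raw in raw_records:
        try:
            yield deserialize(raw)
        except ValueError:
            if not ignore_corrupt_record:
                # Flag disabled: one bad record fails the whole read.
                raise
            # Flag enabled: drop the bad record and keep reading.


def parse_row(raw: bytes) -> Tuple[int, str]:
    """Parse an assumed "id,name" row; raise ValueError on any mismatch."""
    fields = raw.decode("utf-8").split(",")
    if len(fields) != 2:
        raise ValueError("expected 2 fields, got %d" % len(fields))
    return int(fields[0]), fields[1]


if __name__ == "__main__":
    rows = [b"1,alice", b"broken", b"2,bob"]
    # With the flag enabled, only the rows that match the schema are kept.
    print(list(read_records(rows, parse_row, ignore_corrupt_record=True)))
```

With the flag left at its default of `False`, the same input raises `ValueError` at the second row, which mirrors a query failing on a table with mixed schemas.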
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes, this PR adds a new config: "spark.sql.hive.ignoreCorruptRecord"
   
   
   ### How was this patch tested?
   Tested manually in a local environment and with the existing unit tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins commented on pull request #35990: [SPARK-38639][SQL] Support ignoreCorruptRecord flag to ensure querying broken sequence file table smoothly

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #35990:
URL: https://github.com/apache/spark/pull/35990#issuecomment-1081977910


   Can one of the admins verify this patch?

