
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/10/17 03:50:06 UTC

[GitHub] [spark] sadikovi commented on pull request #38277: [SPARK-40815][SQL] Disable "spark.hadoopRDD.ignoreEmptySplits" in order to fix the correctness issue when using Hive SymlinkTextInputFormat

sadikovi commented on PR #38277:
URL: https://github.com/apache/spark/pull/38277#issuecomment-1280241811

   @dongjoon-hyun Would you be able to review this PR?
   
   I have read the comments on the original PR, and it seems the strategy was simply to document the behaviour change. I would like to point out that `SymlinkTextInputFormat` is one case where this change is not safe, because it silently causes incorrect results instead of throwing an error.
   
   I also considered an alternative fix: substituting `SymlinkTextInputFormat` with a shim input format in `HiveTableScanExec` that correctly sets those fields. This would be transparent to users, who could still specify the original input format.
   
   If implementing the shim would be preferable to disabling the flag, let me know.
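
To make the failure mode concrete, here is a rough, simplified model of it. This is not Spark or Hive code: `Split`, `symlink_splits`, `shim_splits`, and `drop_empty` are hypothetical names, and the sketch assumes that the splits produced for symlinked files report a length of 0 even though the target files contain data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Split:
    """Hypothetical stand-in for an input split (not a Hadoop class)."""
    path: str
    length: int  # number of bytes the split reports


def symlink_splits(targets: List[Split]) -> List[Split]:
    # Assumption: the symlink-based format reports length 0 for every split,
    # regardless of how much data the target file actually holds.
    return [Split(path=t.path, length=0) for t in targets]


def shim_splits(targets: List[Split]) -> List[Split]:
    # A shim format would instead carry over the real length of each target.
    return [Split(path=t.path, length=t.length) for t in targets]


def drop_empty(splits: List[Split]) -> List[Split]:
    # Models the effect of ignoring empty splits: keep only length > 0.
    return [s for s in splits if s.length > 0]


targets = [Split("part-0", 128), Split("part-1", 256)]

# Every split looks empty, so all of them are dropped and no rows are read.
lost = drop_empty(symlink_splits(targets))

# With real lengths in place, both splits survive the filter.
kept = drop_empty(shim_splits(targets))
```

Under these assumptions, `lost` ends up empty without any error being raised, which is the silent incorrect result described above, while `kept` retains both splits.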
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: reviews-help@spark.apache.org