Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/11/01 12:40:52 UTC

[GitHub] [spark] gaoyajun02 commented on pull request #38333: [SPARK-40872] Fallback to original shuffle block when a push-merged shuffle chunk is zero-size

gaoyajun02 commented on PR #38333:
URL: https://github.com/apache/spark/pull/38333#issuecomment-1298452168

   We have now located the cause of the zero-size chunk problem on the shuffle service node. The system's `dmesg -T` output contains the following messages:
   ```
   [Tue Nov  1 19:40:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 25755946 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   [Tue Nov  1 20:01:04 2022] EXT4-fs (sde1): Delayed block allocation failed for inode 23266116 at logical offset 0 with max blocks 15 with error 117
   ```
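   Although the root cause is not in the software layer (error 117 is `EUCLEAN` on most Linux architectures, which ext4 uses to report filesystem corruption), and very few nodes are bad enough to lose data, I think it makes sense to support fallback here.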
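
The fallback idea can be sketched roughly as follows. This is only an illustration in Python, not the code in this PR: `MergedChunk`, `read_chunk_or_fallback`, `fetch_original`, and the block ids are hypothetical names made up for the example, and the real shuffle reader works with different types.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MergedChunk:
    """Hypothetical stand-in for one push-merged shuffle chunk."""
    chunk_id: str
    data: bytes
    original_block_ids: List[str]  # blocks that were merged into this chunk


def read_chunk_or_fallback(
    chunk: MergedChunk,
    fetch_original: Callable[[str], bytes],
) -> List[bytes]:
    """Return the chunk's data, or the original blocks if the chunk is empty."""
    if len(chunk.data) > 0:
        return [chunk.data]
    # Zero-size chunk: treat it as lost and re-fetch the unmerged blocks.
    return [fetch_original(block_id) for block_id in chunk.original_block_ids]


# Usage with made-up block ids: the empty chunk falls back to its originals.
originals = {"block-a": b"abc", "block-b": b"de"}
empty_chunk = MergedChunk("chunk-0", b"", list(originals))
print(read_chunk_or_fallback(empty_chunk, originals.__getitem__))
```

The point of the sketch is that a zero-size chunk is handled the same way as a corrupt one: the reader discards it and goes back to the original shuffle blocks instead of failing.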
   
   cc  @otterc


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

