Posted to issues@spark.apache.org by "duanmeng (Jira)" <ji...@apache.org> on 2020/09/29 09:16:00 UTC

[jira] [Created] (SPARK-33022) partition length is wrong after merge partition segments in BypassMergeSortShuffleWriter

duanmeng created SPARK-33022:
--------------------------------

             Summary: partition length is wrong after merge partition segments in BypassMergeSortShuffleWriter
                 Key: SPARK-33022
                 URL: https://issues.apache.org/jira/browse/SPARK-33022
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 2.4.6
            Reporter: duanmeng


A data file might be empty even after DiskBlockObjectWriter commits it in BypassMergeSortShuffleWriter. writePartitionedFile then returns wrong partition lengths, which causes data loss. The root cause is related to the disk or kernel, but we can guard against it in Spark without any performance loss: compare partitionWriterSegments[i].length with length[i] after Utils.copyStream.
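
The proposed comparison could look roughly like the sketch below. It is standalone Java, not Spark's code: the class SegmentLengthCheck and the method copyAndVerify are hypothetical names, the local copyStream is only a stand-in for Utils.copyStream, and throwing an IOException on a mismatch is an assumption about how the failure might be surfaced.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only; not part of Spark.
final class SegmentLengthCheck {

    // Stand-in for Utils.copyStream: copies all bytes and returns how many were copied.
    static long copyStream(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    // Appends one partition segment file to the merged output and checks that the
    // number of bytes actually copied matches the length the writer reported at commit.
    static long copyAndVerify(Path segmentFile, long expectedLength, OutputStream out)
            throws IOException {
        long copied;
        try (InputStream in = Files.newInputStream(segmentFile)) {
            copied = copyStream(in, out);
        }
        if (copied != expectedLength) {
            // Assumption: fail fast rather than record a wrong partition length.
            throw new IOException("Segment " + segmentFile.getFileName()
                + " reported length " + expectedLength + " but only " + copied
                + " bytes were copied");
        }
        return copied;
    }
}
```

With a check like this, the failing case in the logs below (reported length 1462, copied length 0) would raise an error instead of silently producing an empty partition. The only extra work is one long comparison per partition.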

I added some logging and caught the failure.

The log when this issue happened:
{code:java}
20/09/28 00:42:44 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_38244ef5-8e97-4428-97b8-feffc16fc9f7, offset=0, length=1462)
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: File length: 0
20/09/28 00:42:46 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 0
{code}
 

The corresponding log when this issue didn't happen:
{code:java}
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: partitionWriterSegments[0]: (name=temp_shuffle_f6937469-39fd-4576-b40e-69f4276cc8e4, offset=0, length=1462)
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: File length: 1462
20/09/28 10:11:45 INFO sort.BypassMergeSortShuffleWriter: Copied stream length: 1462
{code}
 

 


