Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/12/04 03:37:00 UTC

[jira] [Commented] (SPARK-26261) Spark does not check completeness of temporary files

    [ https://issues.apache.org/jira/browse/SPARK-26261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16708153#comment-16708153 ] 

Hyukjin Kwon commented on SPARK-26261:
--------------------------------------

Mind if I ask what initial test you ran?

> Spark does not check completeness of temporary files
> -----------------------------------------------------
>
>                 Key: SPARK-26261
>                 URL: https://issues.apache.org/jira/browse/SPARK-26261
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.3.2
>            Reporter: Jialin Liu
>            Priority: Minor
>
> Spark does not check temporary files' completeness. When persisting to disk is enabled on some RDDs, a number of temporary files are created in the blockmgr folder. The block manager is able to detect missing blocks, but it is not able to detect file content being modified during execution.
> Our initial test shows that if we truncate a block file before it is used by executors, the program finishes without detecting any error, but the result content is totally wrong.
> We believe there should be a checksum for every RDD file block, and these files should be verified against their checksums when read.
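The safeguard proposed above can be sketched in a few lines. This is a hedged illustration, not Spark code: the helpers `write_block` and `read_block` are hypothetical names, and real Spark block files live under the BlockManager's blockmgr-* directories. The sketch records a CRC32 when a block file is written and verifies it before the content is read back, so truncation raises an error instead of silently producing wrong results.

```python
# Hypothetical sketch of checksum-protected block files (not Spark internals).
import os
import tempfile
import zlib


def write_block(path: str, data: bytes) -> int:
    """Write a block file and return its CRC32 checksum."""
    with open(path, "wb") as f:
        f.write(data)
    return zlib.crc32(data)


def read_block(path: str, expected_crc: int) -> bytes:
    """Read a block file back, failing loudly if the bytes were altered."""
    with open(path, "rb") as f:
        data = f.read()
    if zlib.crc32(data) != expected_crc:
        raise IOError(f"corrupt block file: {path}")
    return data


if __name__ == "__main__":
    block = os.path.join(tempfile.mkdtemp(), "rdd_0_0")
    crc = write_block(block, b"serialized partition bytes")

    # Reproduce the reported scenario: truncate the file before it is read.
    with open(block, "r+b") as f:
        f.truncate(5)

    try:
        read_block(block, crc)
    except IOError as e:
        print("detected:", e)
```

With the verification step, the truncated file is rejected at read time; without it (as reported), the shortened content would be deserialized and the job would finish with wrong results.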



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org