Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2018/12/21 11:29:00 UTC

[jira] [Created] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

Tathagata Das created SPARK-26425:
-------------------------------------

             Summary: Add more constraint checks in file streaming source to avoid checkpoint corruption
                 Key: SPARK-26425
                 URL: https://issues.apache.org/jira/browse/SPARK-26425
             Project: Spark
          Issue Type: Improvement
          Components: Structured Streaming
    Affects Versions: 2.4.0
            Reporter: Tathagata Das
            Assignee: Tathagata Das


Two issues were observed in production:
- HDFSMetadataLog.getLatest() tries to read older versions when it cannot read the latest listed version file. It is unclear why this was done, but it should not be. If the latest listed file is not readable, something is horribly wrong, and we should fail rather than report an older version, because that can completely corrupt the checkpoint directory.
- FileStreamSource should check whether adding a new batch to the FileStreamSourceLog succeeded (similar to how StreamExecution performs this check for the OffsetSeqLog).
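The two proposed checks can be sketched in Scala as follows. This is a minimal, hypothetical model, not Spark's actual HDFSMetadataLog or FileStreamSource code: `SketchMetadataLog`, `BatchCommitter`, `commitBatch` and `corrupt` are illustrative names, the "files" live in an in-memory sorted map, and a value of `None` stands in for a metadata file that is listed but cannot be read. It also assumes an add-if-absent contract in which `add` returns `false` when the batch already exists.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a checkpoint metadata log (NOT Spark's API).
// Each batch id maps to the content of its "file"; None models a file that
// shows up in the directory listing but cannot be read or parsed.
final class SketchMetadataLog {
  private val files = mutable.TreeMap.empty[Long, Option[String]]

  // Assumed add-if-absent contract: returns false if the batch already
  // exists, so the caller can tell that its write did not take effect.
  def add(batchId: Long, content: String): Boolean =
    if (files.contains(batchId)) false
    else { files.update(batchId, Some(content)); true }

  // Test hook (hypothetical): simulate a listed-but-unreadable file.
  def corrupt(batchId: Long): Unit = files.update(batchId, None)

  // Strict variant proposed in the issue: inspect only the latest listed
  // batch, and fail instead of falling back to an older one if unreadable.
  def getLatest(): Option[(Long, String)] =
    files.lastOption.map { case (id, maybeContent) =>
      maybeContent match {
        case Some(content) => (id, content)
        case None =>
          throw new IllegalStateException(
            s"Latest metadata file for batch $id is unreadable; " +
              "refusing to fall back to an older batch")
      }
    }
}

object BatchCommitter {
  // Caller-side check (hypothetical): fail fast when the log refuses the
  // new batch instead of assuming the write succeeded.
  def commitBatch(log: SketchMetadataLog, batchId: Long, content: String): Unit =
    if (!log.add(batchId, content))
      throw new IllegalStateException(
        s"Concurrent update to the log: batch $batchId already exists")
}
```

In this sketch the checks complement each other: a strict `getLatest()` avoids starting from a stale batch id, and checking the result of `add` surfaces the case where a batch with that id was already written instead of silently continuing.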



