Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2018/12/21 11:29:00 UTC
[jira] [Created] (SPARK-26425) Add more constraint checks in file
streaming source to avoid checkpoint corruption
Tathagata Das created SPARK-26425:
-------------------------------------
Summary: Add more constraint checks in file streaming source to avoid checkpoint corruption
Key: SPARK-26425
URL: https://issues.apache.org/jira/browse/SPARK-26425
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.4.0
Reporter: Tathagata Das
Assignee: Tathagata Das
Two issues observed in production.
- HDFSMetadataLog.getLatest() falls back to reading older versions when it cannot read the latest listed version file. It is not clear why this was done, but it should not happen. If the latest listed file is not readable, then something is seriously wrong, and we should fail rather than report an older version, since that can completely corrupt the checkpoint directory.
- FileStreamSource should check whether adding a new batch to the FileStreamSourceLog succeeded (similar to how StreamExecution checks the result for the OffsetSeqLog).
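
To make the two proposed checks concrete, here is a minimal, self-contained Python sketch. It is a toy model, not Spark code: ToyMetadataLog, add_batch_or_fail, the one-file-per-batch layout, and the rule that an empty file counts as unreadable are all hypothetical, chosen only for illustration. It assumes that add() reports a pre-existing batch by returning False.

```python
import os
import tempfile


class ToyMetadataLog:
    """Hypothetical stand-in for a batch-id-keyed metadata log (not Spark's API)."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, batch_id):
        return os.path.join(self.path, str(batch_id))

    def add(self, batch_id, metadata):
        """Write metadata for batch_id; return False if that batch already exists."""
        try:
            # Mode "x" refuses to overwrite, so an existing batch is never replaced.
            with open(self._file(batch_id), "x", encoding="utf-8") as f:
                f.write(metadata)
            return True
        except FileExistsError:
            return False

    def get(self, batch_id):
        """Return the stored metadata, or None if the file is missing or empty."""
        try:
            with open(self._file(batch_id), encoding="utf-8") as f:
                content = f.read()
        except FileNotFoundError:
            return None
        # Assumption for this sketch: an empty file is treated as unreadable.
        return content or None

    def get_latest(self):
        """Return (batch_id, metadata) for the highest listed batch, or None if empty.

        Check 1: if the latest listed batch cannot be read, fail instead of
        silently falling back to an older batch.
        """
        batch_ids = sorted(
            (int(name) for name in os.listdir(self.path) if name.isdigit()),
            reverse=True,
        )
        if not batch_ids:
            return None
        latest = batch_ids[0]
        metadata = self.get(latest)
        if metadata is None:
            raise RuntimeError(
                f"Batch {latest} is listed in {self.path} but cannot be read"
            )
        return latest, metadata


def add_batch_or_fail(log, batch_id, metadata):
    """Check 2: treat an unsuccessful add() as an error instead of ignoring it."""
    if not log.add(batch_id, metadata):
        raise RuntimeError(
            f"Could not add batch {batch_id}: it already exists in the log"
        )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = ToyMetadataLog(tmp)
        add_batch_or_fail(log, 0, "files-for-batch-0")
        add_batch_or_fail(log, 1, "files-for-batch-1")
        print(log.get_latest())
```

In this sketch both failure modes surface as exceptions at the point where they occur, so a caller cannot continue from a stale batch or silently lose a write.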
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)