Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/09/17 00:03:00 UTC

[jira] [Assigned] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

     [ https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-26425:
------------------------------------

    Assignee: Jungtaek Lim  (was: Tathagata Das)

> Add more constraint checks in file streaming source to avoid checkpoint corruption
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-26425
>                 URL: https://issues.apache.org/jira/browse/SPARK-26425
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Tathagata Das
>            Assignee: Jungtaek Lim
>            Priority: Major
>             Fix For: 3.1.0
>
>
> Two issues were observed in production:
> - HDFSMetadataLog.getLatest() falls back to reading older versions when it cannot read the latest listed version file. It is not clear why this fallback was added, but it should be removed. If the latest listed file is not readable, something is badly wrong, and we should fail rather than report an older version, because reporting an older version can completely corrupt the checkpoint directory.
> - FileStreamSource should check whether adding a new batch to the FileStreamSourceLog succeeded (similar to how StreamExecution checks the result for the OffsetSeqLog).
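
The two checks described above can be illustrated with a small, self-contained Python sketch. This is not Spark's code: MetadataLog, commit_batch, and the rule that an empty file counts as "unreadable" are assumptions made up for this illustration. It only shows the intended behavior, namely that reading the latest batch fails fast instead of falling back to an older one, and that the caller checks whether the write to the log succeeded.

```python
import os
import tempfile


class MetadataLog:
    """Toy batch-id -> file log. A hypothetical stand-in, not HDFSMetadataLog."""

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def add(self, batch_id, data):
        """Write the batch only if it does not exist yet; return True on success."""
        target = os.path.join(self.path, str(batch_id))
        try:
            # Mode "x" raises FileExistsError if another writer got there first.
            with open(target, "x") as f:
                f.write(data)
            return True
        except FileExistsError:
            return False

    def get_latest(self):
        """Return (batch_id, data) for the highest listed batch, or None if empty.

        There is deliberately no fallback to older batches: if the latest
        listed file cannot be read, this raises instead of skipping it.
        """
        batch_ids = sorted(int(n) for n in os.listdir(self.path) if n.isdigit())
        if not batch_ids:
            return None
        latest = batch_ids[-1]
        with open(os.path.join(self.path, str(latest))) as f:
            data = f.read()
        # Assumption for this sketch: an empty file models an unreadable one.
        if not data:
            raise RuntimeError(f"batch {latest} is listed but not readable")
        return latest, data


def commit_batch(log, batch_id, data):
    """Caller-side check: stop when the write to the log did not succeed."""
    if not log.add(batch_id, data):
        raise RuntimeError(f"batch {batch_id} was already written to the log")


if __name__ == "__main__":
    log = MetadataLog(tempfile.mkdtemp())
    commit_batch(log, 0, "files-for-batch-0")
    print(log.get_latest())
```

Raising in both places is a design choice that matches the description: a silent fallback or an unchecked write lets the query continue from a state that no longer matches the checkpoint directory.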


