Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/06/09 23:28:00 UTC

[jira] [Updated] (SPARK-26425) Add more constraint checks in file streaming source to avoid checkpoint corruption

     [ https://issues.apache.org/jira/browse/SPARK-26425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-26425:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Add more constraint checks in file streaming source to avoid checkpoint corruption
> ----------------------------------------------------------------------------------
>
>                 Key: SPARK-26425
>                 URL: https://issues.apache.org/jira/browse/SPARK-26425
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>
> Two issues were observed in production:
> - HDFSMetadataLog.getLatest() falls back to reading older versions when it cannot read the latest listed version file. The reason for this fallback is unclear, and it should be removed. If the latest listed file is not readable, then something is seriously wrong, and we should fail rather than report an older version, as that can completely corrupt the checkpoint directory.
> - FileStreamSource should check whether adding a new batch to the FileStreamSourceLog succeeded (similar to how StreamExecution checks for the OffsetSeqLog).
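
The two checks described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not Spark's implementation: ToyMetadataLog, commit_batch, and the use of None to stand for an unreadable file are all hypothetical names and conventions invented for this example.

```python
class ToyMetadataLog:
    """Hypothetical in-memory stand-in for a metadata log (not Spark's API)."""

    def __init__(self):
        # batch_id -> stored metadata; None models a listed but unreadable file.
        self._files = {}

    def add(self, batch_id, metadata):
        """Store metadata for batch_id; return False if the batch already exists."""
        if batch_id in self._files:
            return False
        self._files[batch_id] = metadata
        return True

    def _read(self, batch_id):
        raw = self._files[batch_id]
        if raw is None:
            raise IOError(f"cannot read metadata for batch {batch_id}")
        return raw

    def get_latest(self):
        """Return (batch_id, metadata) for the highest listed batch, or None if empty.

        If the latest listed batch cannot be read, the error propagates.
        There is deliberately no fallback to an older batch.
        """
        if not self._files:
            return None
        latest = max(self._files)
        return latest, self._read(latest)


def commit_batch(log, batch_id, metadata):
    """Fail loudly when the log rejects the batch instead of ignoring the result."""
    if not log.add(batch_id, metadata):
        raise RuntimeError(
            f"Concurrent update to the log: batch {batch_id} already exists"
        )
```

In this model, a second writer that tries to commit an existing batch id gets an error rather than silently continuing, and a reader that finds the newest entry unreadable fails instead of resuming from stale state.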


