Posted to issues@spark.apache.org by "Shixiong Zhu (Jira)" <ji...@apache.org> on 2019/11/18 23:28:00 UTC
[jira] [Updated] (SPARK-29953) File stream source cleanup options may break a file sink output
[ https://issues.apache.org/jira/browse/SPARK-29953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shixiong Zhu updated SPARK-29953:
---------------------------------
Description:
SPARK-20568 added options to the file streaming source to clean up processed files. However, applying these options to a directory that was written by a file streaming sink makes the directory unqueryable: the cleanup deletes files from the directory, but the file sink logs still track them.
I think we should block these options when the input path is a file streaming sink output (that is, it contains a "_spark_metadata" folder).
was:
SPARK-20568 added options to file streaming source to clean up processed files. However, when applying these options to a directory that was written by a file streaming sink, it will make the directory not queryable any more.
I think we should block the options if the input source is a file streaming sink path (has "_spark_metadata" folder).
> File stream source cleanup options may break a file sink output
> ---------------------------------------------------------------
>
> Key: SPARK-29953
> URL: https://issues.apache.org/jira/browse/SPARK-29953
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Shixiong Zhu
> Priority: Major
>
> SPARK-20568 added options to the file streaming source to clean up processed files. However, applying these options to a directory that was written by a file streaming sink makes the directory unqueryable: the cleanup deletes files from the directory, but the file sink logs still track them.
> I think we should block these options when the input path is a file streaming sink output (that is, it contains a "_spark_metadata" folder).
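
To make the proposed guard concrete, here is a minimal sketch in Python of what "block the options if the input is a file sink path" might look like. It is not Spark's implementation. It assumes the cleanup option is named `cleanSource` with values `archive`, `delete`, and `off` (the issue itself only says "cleanup options"), and it checks a local directory with `pathlib` rather than going through a Hadoop-compatible file system. The helper names `is_file_sink_output` and `check_cleanup_allowed` are hypothetical.

```python
from pathlib import Path

# Metadata directory written by a file streaming sink (named in the issue).
SINK_METADATA_DIR = "_spark_metadata"

# Assumed option name and values; the issue does not spell them out.
CLEANUP_OPTION = "cleanSource"
DESTRUCTIVE_VALUES = {"archive", "delete"}


def is_file_sink_output(path):
    """Return True if `path` contains a `_spark_metadata` directory."""
    return (Path(path) / SINK_METADATA_DIR).is_dir()


def check_cleanup_allowed(path, options):
    """Raise ValueError if a cleanup option targets a file sink output.

    `options` is a plain dict of source options (hypothetical shape).
    A missing option is treated as "off", meaning no cleanup.
    """
    value = str(options.get(CLEANUP_OPTION, "off")).lower()
    if value in DESTRUCTIVE_VALUES and is_file_sink_output(path):
        raise ValueError(
            f"{CLEANUP_OPTION}={value} is not allowed on {path}: the directory "
            f"contains {SINK_METADATA_DIR}, so removing files would leave the "
            "file sink log pointing at files that no longer exist"
        )
```

Calling `check_cleanup_allowed(input_path, source_options)` before starting the query would reject the combination up front, instead of letting the source remove files that the sink log still lists.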