Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2020/12/01 04:12:00 UTC

[jira] [Assigned] (SPARK-30900) FileStreamSource: Avoid reading compact metadata log twice if the query stops from compact batch and restarts

     [ https://issues.apache.org/jira/browse/SPARK-30900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-30900:
------------------------------------

    Assignee: Jungtaek Lim

> FileStreamSource: Avoid reading compact metadata log twice if the query stops from compact batch and restarts
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-30900
>                 URL: https://issues.apache.org/jira/browse/SPARK-30900
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.1.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Minor
>
> When a query restarts, it may start from a compaction batch that already has a source metadata file to read. This happens, for example, when the previous run succeeded in reading from its inputs but did not finalize the batch for some reason.
> In this case FileStreamSource reads the compact metadata file twice: once to retrieve all files and build the seen-file map, and once to retrieve the entries in the batch. If the query has processed a huge number of inputs so far, the compact metadata file becomes considerably large, so reading it a second time adds unnecessary latency to processing the startup batch.
> This issue tracks the effort to address this case.
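
The double read described above can be illustrated with a small, language-agnostic sketch. This is not Spark's code: `ToyCompactLog`, `FileEntry`, `restart_naive`, and `restart_single_read` are hypothetical names invented for illustration, and the "log" is just an in-memory list with a read counter. It only shows why reusing one read of the compact file for both the seen-file map and the batch entries gives the same result with half the reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One source file recorded in the metadata log (hypothetical shape)."""
    path: str
    timestamp: int
    batch_id: int


class ToyCompactLog:
    """Hypothetical stand-in for a compacting metadata log; not Spark's API."""

    def __init__(self, compact_file):
        self._compact_file = list(compact_file)
        self.reads = 0  # how many times the (potentially huge) file was read

    def read_compact_file(self):
        self.reads += 1
        return list(self._compact_file)


def restart_naive(log, batch_id):
    # First read: every entry, to build the seen-file map.
    seen = {e.path: e.timestamp for e in log.read_compact_file()}
    # Second read of the same file: only the entries of the restarted batch.
    batch = [e for e in log.read_compact_file() if e.batch_id == batch_id]
    return seen, batch


def restart_single_read(log, batch_id):
    # Read once and derive both results from the same in-memory entries.
    entries = log.read_compact_file()
    seen = {e.path: e.timestamp for e in entries}
    batch = [e for e in entries if e.batch_id == batch_id]
    return seen, batch


if __name__ == "__main__":
    entries = [FileEntry("a", 1, 0), FileEntry("b", 2, 1), FileEntry("c", 3, 2)]
    naive_log, single_log = ToyCompactLog(entries), ToyCompactLog(entries)
    assert restart_naive(naive_log, 2) == restart_single_read(single_log, 2)
    print(naive_log.reads, single_log.reads)  # prints "2 1"
```

How the actual change in FileStreamSource avoids the second read is not stated in this message; the sketch only assumes that the entries from the first read can be kept and reused.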