Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2022/12/05 07:35:00 UTC

[jira] [Assigned] (SPARK-41387) Add assertion on end offset range for Kafka data source with Trigger.AvailableNow

     [ https://issues.apache.org/jira/browse/SPARK-41387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-41387:
------------------------------------

    Assignee: Apache Spark

> Add assertion on end offset range for Kafka data source with Trigger.AvailableNow
> ---------------------------------------------------------------------------------
>
>                 Key: SPARK-41387
>                 URL: https://issues.apache.org/jira/browse/SPARK-41387
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.4.0
>            Reporter: Jungtaek Lim
>            Assignee: Apache Spark
>            Priority: Minor
>
> Although Trigger.AvailableNow provides many benefits, we have found one caveat: it is very sensitive to the offset range.
> Trigger.AvailableNow stops the query when the start offset and the end offset are the same, i.e. when the data source produces no more data. Given the semantics of Trigger.AvailableNow, a data source implementation is expected to retrieve the final offset at the start of the query, then gradually advance the offset range until it eventually reaches that final offset.
> Any bug that breaks this contract makes the query run indefinitely, so every data source implementation that supports Trigger.AvailableNow is encouraged to include an assertion that catches such a case up front.
> Among the built-in data sources, the Kafka data source is the only one that supports Trigger.AvailableNow without any assertion on the offset range. We'd like to add such an assertion to the Kafka data source for Trigger.AvailableNow.
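The termination rule and the proposed assertion can be illustrated with a small simulation. This is a minimal sketch, written in Python rather than the Scala of Spark's actual Kafka source, and it models only the offset bookkeeping: every name below (`assert_end_offset_in_range`, `run_available_now`, `well_behaved_latest`, `buggy_latest`) is hypothetical and is not Spark's API or the change made for this issue. It assumes offsets are tracked per (topic, partition) and that the final offsets are captured once at query start.

```python
from typing import Callable, Dict, Tuple

# Hypothetical model: a topic-partition is a (topic, partition) pair mapped to
# the offset of the next record to read.
TopicPartition = Tuple[str, int]
Offsets = Dict[TopicPartition, int]


def assert_end_offset_in_range(start: Offsets, end: Offsets, final: Offsets) -> None:
    """Hypothetical check: each partition's end offset must lie within
    [start offset, final offset captured at query start]."""
    for tp, end_off in end.items():
        if tp not in start or tp not in final:
            raise AssertionError(f"unexpected partition {tp}")
        lo, hi = start[tp], final[tp]
        if not lo <= end_off <= hi:
            raise AssertionError(f"end offset {end_off} for {tp} is outside [{lo}, {hi}]")


def run_available_now(final: Offsets, limit: int,
                      latest: Callable[[Offsets, Offsets, int], Offsets]) -> int:
    """Simulate the driver loop: plan batches until start == end, then stop.
    Returns the number of non-empty batches executed."""
    start = {tp: 0 for tp in final}
    batches = 0
    while True:
        end = latest(start, final, limit)
        # Without this check, a buggy source could keep the loop running forever.
        assert_end_offset_in_range(start, end, final)
        if end == start:  # no data left in range: the query terminates
            return batches
        batches += 1
        start = end


def well_behaved_latest(start: Offsets, final: Offsets, limit: int) -> Offsets:
    # Advance each partition by at most `limit` records, never past the final offset.
    return {tp: min(start[tp] + limit, final[tp]) for tp in start}


def buggy_latest(start: Offsets, final: Offsets, limit: int) -> Offsets:
    # Bug: ignores the final offset, so start == end is never reached.
    return {tp: start[tp] + limit for tp in start}
```

With `final = {("events", 0): 5, ("events", 1): 2}` and `limit = 2`, the well-behaved source finishes after three non-empty batches, while the buggy source fails the assertion on its second planned batch instead of looping forever. Bounding every end offset by the final offset captured at start is what guarantees termination: offsets only move forward and cannot pass a fixed bound, so start and end must eventually coincide.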



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
