You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Jiří Syrový <sy...@gmail.com> on 2018/05/03 17:43:37 UTC

[Structured streaming, V2] commit on ContinuousReader

Version: 2.3, DataSourceV2, ContinuousReader

Hi,

We're creating a new data source to fetch data from streaming source that
requires commiting received data and we would like to commit data once in a
while after it has been retrieved and correctly processed and then fetch
more.

One option could be to rely on spark committing already read data
using *commit(end:
Offset)* that is present in *ContinuousReader (v2.reader.streaming)*, but
it seems that this method is never called.

The question is if this method *commit(end: Offset) is ever* used and when?
I went through part of Spark code base, but haven't really found any place
where it could be called.

Thanks,
Jiri

Re: [Structured streaming, V2] commit on ContinuousReader

Posted by Joseph Torres <jo...@databricks.com>.
In the master branch, we currently call this method in
ContinuousExecution.commit().

Note that the ContinuousReader API is experimental and undergoing active
design work. We will definitely include some kind of functionality to
back-commit data once it's been processed, but the handle we eventually
stabilize won't necessarily be `*commit(end: Offset)`.*

On Thu, May 3, 2018 at 10:43 AM, Jiří Syrový <sy...@gmail.com> wrote:

> Version: 2.3, DataSourceV2, ContinuousReader
>
> Hi,
>
> We're creating a new data source to fetch data from streaming source that
> requires commiting received data and we would like to commit data once in a
> while after it has been retrieved and correctly processed and then fetch
> more.
>
> One option could be to rely on spark committing already read data using *commit(end:
> Offset)* that is present in *ContinuousReader (v2.reader.streaming)*, but
> it seems that this method is never called.
>
> The question is if this method *commit(end: Offset) is ever* used and
> when? I went through part of Spark code base, but haven't really found any
> place where it could be called.
>
> Thanks,
> Jiri
>
>