Posted to user@spark.apache.org by Something Something <ma...@gmail.com> on 2020/06/26 21:12:04 UTC

Spark Structured Streaming: “earliest” as “startingOffsets” is not working

My Spark Structured Streaming job works fine when I set "startingOffsets"
to "latest". When I simply change it to "earliest" and specify a new
checkpoint directory, the job stops working as expected: the states don't
time out after 10 minutes.

While debugging, I noticed that my 'state' logic is indeed being executed,
but the states just don't time out as they do when I use "latest". Any
reason why?

Is this a known issue?

*Note*: I've tried this under Spark 2.3 & 2.4

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

Posted by Srinivas V <sr...@gmail.com>.
Cool. Are you not using a watermark?
Also, is it possible to start reading offsets from a specific date and time?

Regards
Srini

On Sat, Jun 27, 2020 at 6:12 AM Eric Beabes <ma...@gmail.com>
wrote:

> My apologies...  After I set the 'maxOffsetsPerTrigger' to a value such as
> '200000' it started working. Hopefully this will help someone. Thanks.

Re: Spark Structured Streaming: “earliest” as “startingOffsets” is not working

Posted by Eric Beabes <ma...@gmail.com>.
My apologies... After I set 'maxOffsetsPerTrigger' to a value such as
'200000', the job started working. Hopefully this will help someone. Thanks.
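
For anyone landing here later, a rough sketch of what that change looks like
on the Kafka source. The thread does not say why it helps; one plausible
reading is that with "earliest" and no cap, the first micro-batch tries to
read the whole backlog, so there are no further triggers in which timeouts
could fire. The helper names, broker address, and topic below are made up
for illustration; only the option keys are real Kafka source options.

```python
# Minimal sketch: Kafka source options for replaying a topic from "earliest"
# while capping how many offsets each micro-batch may read.
# Function names, broker address and topic are hypothetical placeholders.

def kafka_source_options(bootstrap_servers, topic,
                         starting_offsets="earliest",
                         max_offsets_per_trigger=200000):
    """Return Kafka source options as a dict of strings."""
    if max_offsets_per_trigger <= 0:
        raise ValueError("max_offsets_per_trigger must be positive")
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # Only honoured when the query starts with a fresh checkpoint
        # directory; a resumed query continues from its checkpoint.
        "startingOffsets": starting_offsets,
        # Rate limit on the number of offsets processed per trigger.
        "maxOffsetsPerTrigger": str(max_offsets_per_trigger),
    }


def read_kafka_stream(spark, options):
    """Build the streaming DataFrame; `spark` is an existing SparkSession."""
    return spark.readStream.format("kafka").options(**options).load()


# Usage (assumes a running SparkSession named `spark` with the Kafka
# connector on the classpath):
#   opts = kafka_source_options("localhost:9092", "events")
#   df = read_kafka_stream(spark, opts)
```

The value 200000 is just the one mentioned above; a suitable cap depends on
the topic volume and on how quickly each batch needs to finish.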
