Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2020/12/30 07:30:20 UTC

Flink SQL, temporal joins and backfilling data

Hi!

I have a Flink SQL job that does a few temporal joins and has been running
for over a month on regular data with no issues.  It ran well.

I'm trying to re-run the Flink SQL job on the same data set, but it's
failing to checkpoint and is very slow to make progress.  I've already
modified some of the checkpoint settings.

What else do I have to modify?

My data size is really small, so I'm guessing the job is still keeping
state for data outside the temporal join time windows.  Do I have to set
the Idle State Retention Time so that it forgets older data?

- Dan
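For readers unfamiliar with the setting Dan mentions: idle state retention is, roughly, a time-to-live on per-key state, where state that has not been accessed for the configured (processing-time) interval is dropped. The Python sketch below is a deliberately simplified model of that idea under that assumption, not Flink's implementation; the class and method names are hypothetical. As Timo notes later in this thread, as far as he knows this mechanism is not what cleans up temporal-join state.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IdleRetentionState:
    """Hypothetical toy model of per-key state with an idle-state TTL.

    Not Flink's implementation: a key is dropped once it has not been
    written for more than retention_ms of processing time.
    """
    retention_ms: int
    values: Dict[str, object] = field(default_factory=dict)
    last_access: Dict[str, int] = field(default_factory=dict)

    def update(self, key: str, value: object, now_ms: int) -> None:
        # Each write refreshes the key's last-access time.
        self.values[key] = value
        self.last_access[key] = now_ms

    def expire(self, now_ms: int) -> None:
        # Drop every key that has been idle longer than the retention time.
        stale = [k for k, t in self.last_access.items()
                 if now_ms - t > self.retention_ms]
        for k in stale:
            del self.values[k]
            del self.last_access[k]


state = IdleRetentionState(retention_ms=60_000)
state.update("user-1", "a", now_ms=0)
state.update("user-2", "b", now_ms=50_000)
state.expire(now_ms=90_000)
print(sorted(state.values))  # ['user-2']
```

Because the clock in this model is processing time, replaying old events quickly during a backfill does not by itself make their state "old"; only wall-clock idleness triggers expiry here.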

Re: Flink SQL, temporal joins and backfilling data

Posted by Dan Hill <qu...@gmail.com>.
Hi Timo.  Sorry for the delay.  I'll reply to this thread the next time I
hit this.  I haven't restarted my job in 12 days; I'll check the
watermarks the next time I restart.
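One way to do that check, sketched in Python under stated assumptions: Flink's Web UI shows the current input watermark per subtask, and the REST API exposes the same values at /jobs/:jobid/vertices/:vertexid/watermarks. The helpers below assume that response is a JSON list of {"id": "<subtask>.currentInputWatermark", "value": "<ms>"} entries (verify this against the REST API docs for your Flink version) and flag subtasks that have no watermark yet (Java's Long.MIN_VALUE) or that lag the most advanced subtask by more than a chosen threshold. The function names, the lag heuristic, and the sample payload are hypothetical.

```python
import json
from typing import Dict, List

LONG_MIN = -(2 ** 63)  # Java Long.MIN_VALUE: the subtask has no watermark yet


def parse_watermarks(payload: str) -> Dict[str, int]:
    """Map subtask index -> current input watermark in epoch millis.

    Assumed input shape: [{"id": "<subtask>.currentInputWatermark", "value": "<ms>"}, ...]
    """
    result: Dict[str, int] = {}
    for metric in json.loads(payload):
        subtask, _, name = metric["id"].partition(".")
        if name == "currentInputWatermark":
            result[subtask] = int(metric["value"])
    return result


def suspicious_subtasks(watermarks: Dict[str, int], max_lag_ms: int) -> List[str]:
    """Hypothetical heuristic: subtasks with no watermark yet, or lagging
    the most advanced subtask by more than max_lag_ms."""
    valid = [w for w in watermarks.values() if w != LONG_MIN]
    newest = max(valid) if valid else LONG_MIN
    return sorted(s for s, w in watermarks.items()
                  if w == LONG_MIN or newest - w > max_lag_ms)


# Illustrative payload, not real output from a job.
sample = json.dumps([
    {"id": "0.currentInputWatermark", "value": "1609315200000"},
    {"id": "1.currentInputWatermark", "value": str(LONG_MIN)},
    {"id": "2.currentInputWatermark", "value": "1609228800000"},
])
print(suspicious_subtasks(parse_watermarks(sample), max_lag_ms=3_600_000))  # ['1', '2']
```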

(In reply to Timo Walther <tw...@apache.org>, Tue, Jan 5, 2021 at 4:47 AM.)

Re: Flink SQL, temporal joins and backfilling data

Posted by Timo Walther <tw...@apache.org>.
Hi Dan,

are you sure that your watermarks are still correct during reprocessing? 
As far as I know, idle state retention is not used for temporal joins. 
The watermark indicates when state can be removed in this case.

Maybe you can give us some more details about which kind of temporal
join you are using (event-time or processing-time?) and your checkpoint settings?

Regards,
Timo
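Timo's point can be illustrated with a deliberately simplified Python model of an event-time temporal join (a probe row joined against the version of a versioned table that was valid at the row's timestamp). In this sketch, buffered probe rows are only joined, and superseded versions only dropped, once the watermark passes them, so a watermark that never advances (for example, one held back by an idle or lagging input during a backfill) lets state grow without bound. The class and method names are hypothetical; late rows, multiple input channels, and processing-time joins are not modeled, and this is not Flink's operator code.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

NO_WATERMARK = -(2 ** 63)  # mirrors Java's Long.MIN_VALUE ("no watermark yet")


class ToyEventTimeTemporalJoin:
    """Hypothetical toy model of an event-time temporal join.

    Not Flink's operator code: one input per side, no late rows.
    """

    def __init__(self) -> None:
        # key -> versions of the versioned table, sorted by version time
        self.versions: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
        # probe rows (timestamp, key) buffered until the watermark passes them
        self.pending: List[Tuple[int, str]] = []
        self.watermark = NO_WATERMARK

    def add_version(self, key: str, ts: int, value: str) -> None:
        bisect.insort(self.versions[key], (ts, value))

    def add_probe(self, key: str, ts: int) -> None:
        self.pending.append((ts, key))

    def advance_watermark(self, wm: int) -> List[Tuple[str, int, str]]:
        """Join every buffered probe row at or before the watermark, then clean up."""
        self.watermark = max(self.watermark, wm)
        ready = sorted(p for p in self.pending if p[0] <= self.watermark)
        self.pending = [p for p in self.pending if p[0] > self.watermark]
        out = []
        for ts, key in ready:
            match = None
            # Pick the latest version whose time is <= the probe row's time.
            for v_ts, value in self.versions.get(key, []):
                if v_ts > ts:
                    break
                match = value
            if match is not None:
                out.append((key, ts, match))
        self._cleanup()
        return out

    def _cleanup(self) -> None:
        # Keep the newest version at or before the watermark (later probe rows
        # may still need it) plus everything after it; drop older versions.
        for key, vs in list(self.versions.items()):
            keep_from = 0
            for i, (v_ts, _) in enumerate(vs):
                if v_ts <= self.watermark:
                    keep_from = i
            self.versions[key] = vs[keep_from:]

    def state_size(self) -> int:
        return len(self.pending) + sum(len(vs) for vs in self.versions.values())


tj = ToyEventTimeTemporalJoin()
for t in range(0, 100, 10):
    tj.add_version("EUR", t, f"rate@{t}")
tj.add_probe("EUR", 35)
print(tj.state_size())           # 11: ten versions plus one buffered probe row
print(tj.advance_watermark(50))  # [('EUR', 35, 'rate@30')]
print(tj.state_size())           # 5: only the versions at 50..90 remain
```

If advance_watermark is never called, nothing is emitted and state_size only grows as rows arrive, which is the behavior Timo suggests checking for during reprocessing.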
