Posted to dev@flink.apache.org by Gyula Fóra <gy...@apache.org> on 2016/01/02 11:57:32 UTC

Checkpointing to S3

Hey,

I am trying to checkpoint my streaming job to S3, but the checkpoints never
complete, and I don't get any errors in the logs.

The state backend apparently connects to S3 properly, as it creates the
following file in the given S3 directory:

95560b1acf5307bc3096020071c83230_$folder$    (this is a file, not a folder)

The job id is 95560b1acf5307bc3096020071c83230, but that filename is odd
and might be causing the problem.
It seems that the backend doesn't properly create a folder for the job's
checkpoints.
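
For reference, this is roughly how the job is pointed at S3 (a sketch using the
filesystem state backend keys from flink-conf.yaml of that era; the bucket name
is illustrative):

```
state.backend: filesystem
state.backend.fs.checkpointdir: s3://my-bucket/flink-checkpoints
```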

Does anyone have any idea what might cause this problem?

Thanks,
Gyula
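
As background on the `_$folder$` naming above: S3 has a flat key space with no
real directories, and Hadoop's older S3 filesystems emulate an empty
"directory" by writing a zero-byte marker object suffixed `_$folder$`. A small
Python sketch of the idea (a plain dict stands in for the bucket; all names are
illustrative, not Flink or Hadoop code):

```python
# A dict standing in for an S3 bucket: keys are object names, values are bytes.
bucket = {}

def mkdir(path):
    """Emulate creating an empty 'directory' by writing a zero-byte marker key."""
    bucket[path + "_$folder$"] = b""

def put(path, data):
    """Write a real object; any object under the prefix makes the 'directory' non-empty."""
    bucket[path] = data

job_id = "95560b1acf5307bc3096020071c83230"
mkdir(job_id)

# Until a checkpoint actually writes data, the only key for the job is the
# marker, which a naive listing shows as a *file* named ..._$folder$:
print(sorted(bucket))  # ['95560b1acf5307bc3096020071c83230_$folder$']

# Once real checkpoint data is written, objects appear under the job prefix:
put(job_id + "/chk-1", b"checkpoint data")
```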

Re: Checkpointing to S3

Posted by Gyula Fóra <gy...@gmail.com>.
Yes, this gives much more information :)

Cheers,
Gyula

Stephan Ewen <se...@apache.org> wrote (on Mon, Jan 4, 2016, 16:24):

> Hey!
>
> Nice to hear that it works.
>
> A bit of info is now visible in the web dashboard, as of this PR:
> https://github.com/apache/flink/pull/1453
>
> Is that what you had in mind?
>
> Greetings,
> Stephan

Re: Checkpointing to S3

Posted by Stephan Ewen <se...@apache.org>.
Hey!

Nice to hear that it works.

A bit of info is now visible in the web dashboard, as of this PR:
https://github.com/apache/flink/pull/1453

Is that what you had in mind?

Greetings,
Stephan


On Sat, Jan 2, 2016 at 4:53 PM, Gyula Fóra <gy...@apache.org> wrote:

> Ok, I figured out the problem, and it was my fault :). The issue was that
> I was running a short test job and the sources finished before the
> checkpoint was triggered. So the folder was created for the job in S3, but
> since we didn't write anything to it, it shows up as a file in S3.
>
> Maybe it would be good to give the user some info when the sources have
> already finished by the time the checkpoint is triggered.
>
> On the bright side, it seems to work well, also with the savepoints :)
>
> Cheers
> Gyula

Re: Checkpointing to S3

Posted by Gyula Fóra <gy...@apache.org>.
Ok, I figured out the problem, and it was my fault :). The issue was that
I was running a short test job and the sources finished before the
checkpoint was triggered. So the folder was created for the job in S3, but
since we didn't write anything to it, it shows up as a file in S3.

Maybe it would be good to give the user some info when the sources have
already finished by the time the checkpoint is triggered.
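
The suggestion above could look something like this; a minimal Python sketch
(not actual Flink code; the function name and states are made up) of a
coordinator that reports a reason instead of silently doing nothing when all
sources have finished:

```python
# Sketch: a checkpoint trigger that logs a user-visible hint when it cannot
# fire because every source task has already finished.

def trigger_checkpoint(source_states, log):
    """Return True if a checkpoint was triggered, else log why it was not."""
    if all(state == "FINISHED" for state in source_states):
        log.append("Checkpoint not triggered: all sources already finished.")
        return False
    log.append("Checkpoint triggered.")
    return True

log = []
trigger_checkpoint(["RUNNING", "RUNNING"], log)   # fires normally
trigger_checkpoint(["FINISHED", "FINISHED"], log) # logs the hint instead
print(log[-1])  # Checkpoint not triggered: all sources already finished.
```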

On the bright side, it seems to work well, also with the savepoints :)

Cheers
Gyula
