Posted to user@flink.apache.org by Flavio Pompermaier <po...@okkam.it> on 2020/11/24 08:55:28 UTC

JobListener weird behaviour

Hello everybody,
over the last few days I have been trying to use the JobListener to implement
some simple logic in our platform: calling an external service to signal that
a job has ended and, in case of failure, saving the error cause.

After some problems getting it to work when starting a job with the
RestClusterClient on a standalone session cluster, I found that it behaves
strangely when the job fails:
onJobExecuted(JobExecutionResult, Throwable) is called with the Throwable
equal to null, and if I use the JobID contained in the JobExecutionResult
to fetch the job status / error (in a monitoring thread that I start
in onJobSubmitted(), as I explain at the end of this email) I continue to
see the job status as RUNNING for a while.
Shouldn't onJobExecuted() be called after the final job state transition
(as the very last callback)?

One last weird thing: I submit the job using the JarRunHandler of the
REST API, but the JobClient passed to onJobSubmitted() is a
WebSubmissionJobClient, a VERY basic implementation (it actually
provides only the job ID) that does not allow getting the job status. For
this reason (and because onJobExecuted() is not called on the final
state transition) I had to create a separate monitor thread in
onJobSubmitted() that creates a RestClusterClient to get the status of
the job every 10 seconds and, in case of failure, the exceptions associated
with it. This is very uncomfortable and I don't really like it. Is there
any effort to improve this?
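
The monitor-thread workaround described above can be sketched roughly as
follows. This is a minimal, Flink-free illustration and not the actual
code from our platform: the Status enum, the awaitTerminal helper and the
simulated responses are all hypothetical names. In a real setup the
Supplier would wrap whatever call fetches the job status (for example via
a RestClusterClient), and the interval would be something like the 10
seconds mentioned above.

```java
import java.util.Iterator;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class JobStatusPoller {

    /** Hypothetical stand-in for the job states of interest; not Flink's own enum. */
    public enum Status {
        RUNNING(false), FINISHED(true), FAILED(true), CANCELED(true);

        final boolean globallyTerminal;

        Status(boolean globallyTerminal) {
            this.globallyTerminal = globallyTerminal;
        }
    }

    /**
     * Polls statusFetcher until it reports a globally terminal state.
     * Returns Optional.empty() if maxAttempts is exhausted or the thread
     * is interrupted while waiting.
     */
    public static Optional<Status> awaitTerminal(
            Supplier<Status> statusFetcher, long intervalMillis, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Status status = statusFetcher.get();
            if (status.globallyTerminal) {
                return Optional.of(status);
            }
            // A non-terminal answer may simply be stale, so wait and ask again.
            if (attempt < maxAttempts - 1) {
                try {
                    TimeUnit.MILLISECONDS.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Simulated endpoint (assumption): answers RUNNING twice, then FAILED.
        Iterator<Status> responses =
                List.of(Status.RUNNING, Status.RUNNING, Status.FAILED).iterator();
        Optional<Status> result = awaitTerminal(responses::next, 10, 5);
        System.out.println(result); // prints "Optional[FAILED]"
    }
}
```

Bounding the number of attempts keeps the monitor thread from running
forever if the status endpoint never reports a terminal state.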

Best,
Flavio

Re: JobListener weird behaviour

Posted by Till Rohrmann <tr...@apache.org>.
Hi Flavio,

looking only at the code, the job should first transition into a
globally terminal state before the client is notified about it. The only
possible reason I can see for this behaviour is that the
RestServerEndpoint uses an ExecutionGraphCache (DefaultExecutionGraphCache
is the implementation) which caches `ArchivedExecutionGraphs` so that the
REST handlers don't flood the Dispatcher with `requestJob` requests. The
cache keeps its entries for 3 seconds (by default) before asking the cluster
again. So you might be querying a REST handler that responds based on cached,
and therefore outdated, results. At the moment, the only easy way to work
around this problem is to decrease the `web.refresh-interval`.
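
As a sketch of that workaround, the option could be set in the cluster's
flink-conf.yaml. The value is in milliseconds; 500 is only an illustrative
choice, not a recommendation. Since the same option also sets the refresh
interval of the web frontend, lowering it means more requests reach the
Dispatcher.

```yaml
# flink-conf.yaml
# Default is 3000 (ms), which matches the 3-second cache lifetime described above.
web.refresh-interval: 500
```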

For the JarRunHandler problem, I fear that this is a problem of the web
submission implementation which has accumulated a bit of technical debt. As
far as I know, nobody is actively working on it at the moment.

Cheers,
Till
