Posted to user@flink.apache.org by David Morin <mo...@gmail.com> on 2019/09/22 20:57:08 UTC

How to prevent from launching 2 jobs at the same time

Hi,

What is the best way to prevent two jobs with the same name from being launched concurrently?
Instead of doing a check in the script that starts the Flink job, I would prefer the job to stop if another one with the same name is already in progress (an exception or something like that).

David

Re: How to prevent from launching 2 jobs at the same time

Posted by Theo Diefenthal <th...@scoop-software.de>.
My simple workaround for this: I always start the applications from the same machine via the CLI and take a file-system lock around the check-whether-the-job-is-already-running and job-launching steps. Relying on one machine to start the jobs is of course a possible single point of failure, but it works in my current environment.
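A minimal sketch of such a file-system lock using only the JDK's java.nio file locking (the class name and lock-file path are illustrative, not from the thread):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class LaunchLock {

    // Try to take an exclusive lock on a well-known lock file.
    // Returns null when another process already holds the lock
    // (note: if the same JVM already holds it, tryLock throws
    // OverlappingFileLockException instead of returning null).
    public static FileLock tryAcquire(Path lockFile) throws IOException {
        FileChannel channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock = channel.tryLock();
        if (lock == null) {
            channel.close(); // lock held by another process; release the channel
        }
        return lock;
    }
}
```

The launcher script's JVM would call tryAcquire before the check-and-launch steps and release the lock afterwards; a null return means another launch is in progress.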

Best regards
Theo



Re: How to prevent from launching 2 jobs at the same time

Posted by David Morin <mo...@gmail.com>.
Thanks Till,

Perfect. I'm going to use the RestClusterClient with listJobs.
It should work perfectly for my needs.

Cheers
David


Re: How to prevent from launching 2 jobs at the same time

Posted by Till Rohrmann <tr...@apache.org>.
Hi David,

you could use Flink's RestClusterClient and call #listJobs to obtain the
list of jobs being executed on the cluster (note that it will also report
finished jobs). By providing a properly configured Configuration (e.g.
loading flink-conf.yaml via GlobalConfiguration#loadConfiguration) it will
automatically detect where the JobManager is running (e.g. via ZooKeeper if
HA is enabled or it picks up the configured JobManager address from the
configuration).

Of course, you could also provide the JobManager address as a parameter.
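As a rough sketch, the check described above might look like the following (the cluster-id string, class name, and exception type are my own choices, and constructor and cleanup details of the client vary between Flink versions):

```java
import java.util.Collection;

import org.apache.flink.client.program.rest.RestClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.runtime.client.JobStatusMessage;

public class DuplicateJobGuard {

    // Throws if a non-terminated job with the given name is already on the cluster.
    public static void ensureNotRunning(String jobName) throws Exception {
        // Reads flink-conf.yaml from FLINK_CONF_DIR, so the client can locate
        // the JobManager (directly, or via ZooKeeper when HA is enabled).
        Configuration config = GlobalConfiguration.loadConfiguration();
        RestClusterClient<String> client = new RestClusterClient<>(config, "duplicate-check");
        try {
            Collection<JobStatusMessage> jobs = client.listJobs().get();
            boolean running = jobs.stream()
                    // listJobs also reports finished jobs, so skip terminal states
                    .filter(j -> !j.getJobState().isGloballyTerminalState())
                    .anyMatch(j -> jobName.equals(j.getJobName()));
            if (running) {
                throw new IllegalStateException(
                        "A job named '" + jobName + "' is already running");
            }
        } finally {
            client.close();
        }
    }
}
```

Calling this at the top of the job's main method, before execute(), would make the duplicate submission fail with an exception, as requested.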

Cheers,
Till


Re: How to prevent from launching 2 jobs at the same time

Posted by David Morin <mo...@gmail.com>.
Hi,

Thanks for your replies.
Yes, it could be useful to have a way to define the job id. That way I would have been able to derive the job id from the job name, for example. At the moment we do not use the REST API but the CLI to submit our jobs on YARN.
Nevertheless, I can implement a little trick: at startup, query the REST API and throw an Exception if a job with the same name is running.
Question: is there a way to retrieve the JobManager URI from my code, or should I provide it as a parameter?
Thanks.
David


Re: How to prevent from launching 2 jobs at the same time

Posted by Zili Chen <wa...@gmail.com>.
The situation is as Dian said. Flink identifies jobs by job id, not by job name.

However, I think it is still a valid question whether Flink could alternatively identify jobs by job name and leave the work of distinguishing jobs by name to the users. The advantages of this approach include more readable display and interaction, as well as less hardcoding of job ids; for example, we always set the job id to new JobID(0, 0) in standalone per-job mode in order to get the same ZooKeeper path.

Best,
tison.


Dian Fu <di...@gmail.com> 于2019年9月23日周一 上午10:55写道:

> Hi David,
>
> The jobs are identified by job id, not by job name internally in Flink and
> so It will only check if there are two jobs with the same job id.
>
> If you submit the job via CLI[1], I'm afraid there are still no built-in
> ways provided as currently the job id is generated randomly when submitting
> a job via CLI and the generated job id has nothing to do with the job name.
> However, if you submit the job via REST API [2], it did provide an option
> to specify the job id when submitting a job. You can generate the job id by
> yourself.
>
> Regards,
> Dian
>
> [1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html
> [2]
> https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
>
> 在 2019年9月23日,上午4:57,David Morin <mo...@gmail.com> 写道:
>
> Hi,
>
> What is the best way to prevent from launching 2 jobs with the same name
> concurrently ?
> Instead of doing a check in the script that starts the Flink job, I would
> prefer to stop a job if another one with the same name is in progress
> (Exception or something like that).
>
> David
>
>
>

Re: How to prevent from launching 2 jobs at the same time

Posted by Dian Fu <di...@gmail.com>.
Hi David,

Jobs are identified by job id, not by job name, internally in Flink, so it only checks whether there are two jobs with the same job id.

If you submit the job via the CLI [1], I'm afraid there is still no built-in way, as the job id is currently generated randomly when submitting via the CLI and has nothing to do with the job name.
However, if you submit the job via the REST API [2], it does provide an option to specify the job id when submitting a job. You can generate the job id yourself.

Regards,
Dian

[1] https://ci.apache.org/projects/flink/flink-docs-master/ops/cli.html
[2] https://ci.apache.org/projects/flink/flink-docs-master/monitoring/rest_api.html#jars-jarid-run
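If one did want a deterministic job id derived from the job name (for the REST API's option to set it), one possible sketch is hashing the name with MD5, chosen here purely because it yields the 16 bytes / 32 hex characters a Flink JobID consists of (the helper name is my own, not a Flink API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class DeterministicJobId {

    // Derive a stable 32-hex-character id from the job name, so that
    // resubmitting the same name always produces the same id.
    public static String fromName(String jobName) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(jobName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(32);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }
}
```

The resulting hex string could then be passed as the job id when submitting through the REST run endpoint, so that a second submission under the same name collides with the already-running job.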