Posted to user@flink.apache.org by Pankaj Chand <pa...@gmail.com> on 2019/06/18 07:42:02 UTC

Role of Job Manager

I am trying to understand the role of Job Manager in Flink, and have come
across two possibly distinct interpretations.

1. The online documentation (v1.8) suggests that there is at least one Job
Manager in a cluster, and that it is tied to the cluster of machines: it
manages all jobs running in that cluster.

This suggests that Flink's Job Manager is much like Hadoop's Application
Manager.

2. The book, "Stream Processing with Apache Flink", writes that, "The Job
Manager is the master process that controls the execution of a single
application—each application is controlled by a different Job Manager."

This suggests that Flink uses one Job Manager per job by default, and that
the Job Manager is closely tied to that single job, much like Hadoop's
Application Master for each job.

Please let me know which one is correct.

Pankaj

Re: Role of Job Manager

Posted by Pankaj Chand <pa...@gmail.com>.

Hi Biao,

Thank you for your reply!

Please let me know the URL of the updated Flink documentation.

The URL of the outdated document is:
https://ci.apache.org/projects/flink/flink-docs-release-1.8/concepts/runtime.html

Another page which (tacitly) supports the outdated concept is:
https://ci.apache.org/projects/flink/flink-docs-stable/internals/job_scheduling.html

The website that hosts these pages is also the first result that comes up
when you search Google for "Flink documentation", and it claims to be the
stable version. The URL is:
https://ci.apache.org/projects/flink/flink-docs-stable/

Again, please let me know the URL of the updated Flink documentation.

Thank you Biao and Eduardo!

Pankaj

On Tue, Jun 18, 2019 at 11:49 PM Biao Liu <mm...@gmail.com> wrote:

> Hi Pankaj,
>
> That's a really good question. The architecture was refactored
> previously [1], so some descriptions may still use the outdated concept.
>
> Before the refactoring, the Job Manager was a centralized role. It
> controlled the whole cluster and all jobs, as described in your
> interpretation 1.
>
> After the refactoring, the old Job Manager was split into several roles:
> Resource Manager, Dispatcher, the new Job Manager, etc. The new Job Manager
> is responsible for only one job, as described in your interpretation 2.
>
> So the document you refer to is outdated. Would you mind telling us the
> URL of this document? I think we should update it to avoid misleading more
> people.
>
> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

Re: Role of Job Manager

Posted by Biao Liu <mm...@gmail.com>.

Hi Pankaj,

That's a really good question. The architecture was refactored
previously [1], so some descriptions may still use the outdated concept.

Before the refactoring, the Job Manager was a centralized role. It
controlled the whole cluster and all jobs, as described in your
interpretation 1.

After the refactoring, the old Job Manager was split into several roles:
Resource Manager, Dispatcher, the new Job Manager, etc. The new Job Manager
is responsible for only one job, as described in your interpretation 2.

So the document you refer to is outdated. Would you mind telling us the URL
of this document? I think we should update it to avoid misleading more
people.
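
As a rough illustration of the split described above, here is a toy model
in Python. It is only a sketch of the roles, not Flink code: the class and
method names (allocate, acquire_slots, submit) and the slot counts are
made up for this example. It assumes the Dispatcher and Resource Manager
exist once per cluster, while a separate Job Manager is started for each
submitted job.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ResourceManager:
    """Cluster-wide role: owns the pool of task slots (hypothetical model)."""
    free_slots: int

    def allocate(self, requested: int) -> int:
        # Grant as many slots as are available, up to the requested number.
        granted = min(requested, self.free_slots)
        self.free_slots -= granted
        return granted


@dataclass
class JobManager:
    """Post-refactoring Job Manager: coordinates exactly one job."""
    job_id: str
    slots: int = 0

    def acquire_slots(self, resource_manager: ResourceManager, parallelism: int) -> None:
        # The per-job manager asks the cluster-wide Resource Manager for slots.
        self.slots = resource_manager.allocate(parallelism)


@dataclass
class Dispatcher:
    """Cluster-wide role: accepts jobs and starts one Job Manager per job."""
    resource_manager: ResourceManager
    job_managers: Dict[str, JobManager] = field(default_factory=dict)

    def submit(self, job_id: str, parallelism: int) -> JobManager:
        if job_id in self.job_managers:
            raise ValueError(f"job {job_id!r} was already submitted")
        job_manager = JobManager(job_id)  # a new Job Manager for this job only
        job_manager.acquire_slots(self.resource_manager, parallelism)
        self.job_managers[job_id] = job_manager
        return job_manager


if __name__ == "__main__":
    dispatcher = Dispatcher(ResourceManager(free_slots=4))
    first = dispatcher.submit("job-a", parallelism=2)
    second = dispatcher.submit("job-b", parallelism=2)
    print(first, second, dispatcher.resource_manager.free_slots)
```

In the pre-refactoring picture (interpretation 1), all three classes would
collapse into a single object that tracks every job in the cluster.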

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

On Wed, Jun 19, 2019 at 1:12 AM Eduardo Winpenny Tejedor <ed...@gmail.com>
wrote:

> Hi Pankaj,
>
> I have no experience with Hadoop, but from the book I gathered that
> there's one Job Manager per application, i.e. per jar (as in the example
> in the first chapter). This is not to say there's one Job Manager per job.
> Actually, I don't think the word Job is defined in the book. I've seen
> Task defined, and those do have Task Managers.
>
> Hope this is along the right lines.
>
> Regards,
> Eduardo

Re: Role of Job Manager

Posted by Eduardo Winpenny Tejedor <ed...@gmail.com>.

Hi Pankaj,

I have no experience with Hadoop, but from the book I gathered that there's
one Job Manager per application, i.e. per jar (as in the example in the
first chapter). This is not to say there's one Job Manager per job.
Actually, I don't think the word Job is defined in the book. I've seen Task
defined, and those do have Task Managers.

Hope this is along the right lines.

Regards,
Eduardo
