Posted to user@flink.apache.org by Vitaliy Semochkin <vi...@gmail.com> on 2020/03/23 22:45:08 UTC

ClusterSpecification and Configuration questions

Hi,

I create a job with the following parameters:
org.apache.flink.configuration.Configuration{
yarn.containers.vcores=2
yarn.appmaster.vcores=1
}

ClusterSpecification{
taskManagerMemoryMB=1024
slotsPerTaskManager=1
}
After I launch the job programmatically, I see:
yarn node -list -showDetails
Configured Resources : <memory:8192, vCores:8>
Allocated Resources : <memory:1250, vCores:1> - I suppose this was allocated
for the JobManager

But in the logs I see 3 requests of the form "Requesting new TaskExecutor
container with resources <memory:2048, vCores:2>".

Here is a log fragment:
 JobManager successfully registered at ResourceManager, leader id: 00000000000000000000000000000000.
 org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources <memory:2048, vCores:2>. Number pending requests 1.
 org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{UNKNOWN} for job 64080d7889797133215e501e72b23a74 with allocation id a1c9ff2b7ec9ad662108b8a2b2301fcf.
 org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources <memory:2048, vCores:2>. Number pending requests 2.
 org.apache.flink.yarn.YarnResourceManager - Request slot with profile ResourceProfile{UNKNOWN} for job 64080d7889797133215e501e72b23a74 with allocation id 21f57b4324bdd50dd293547bc4b19ce2.
 org.apache.flink.yarn.YarnResourceManager - Requesting new TaskExecutor container with resources <memory:2048, vCores:2>. Number pending requests 3.
Close ResourceManager connection
Shut down cluster because application is in FAILED, diagnostics null.

Here are the things I would like to clarify:
Why are there 3 requests to create a TaskExecutor instead of 1?
Why is no TaskExecutor created even though I have 7 cores and 7 GB of free
RAM?
What is ResourceProfile{UNKNOWN}?
What does "diagnostics null" mean?
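
As a sanity check on the capacity question, the figures quoted above can be
compared directly. The following is only a minimal sketch: it assumes a
single YARN node, takes the numbers from the yarn output and the log lines in
this post, and ignores scheduler details such as minimum-allocation rounding
or queue limits. The class and method names are hypothetical.

```java
public class YarnCapacityCheck {
    // Figures copied from the post (yarn node -list -showDetails).
    static final int NODE_MEMORY_MB = 8192;
    static final int NODE_VCORES = 8;
    static final int ALLOCATED_MEMORY_MB = 1250;
    static final int ALLOCATED_VCORES = 1;

    // Size of each TaskExecutor container request, from the log lines.
    static final int CONTAINER_MEMORY_MB = 2048;
    static final int CONTAINER_VCORES = 2;

    /** Memory left on the node after the existing allocation. */
    static int freeMemoryMb() {
        return NODE_MEMORY_MB - ALLOCATED_MEMORY_MB;
    }

    /** vcores left on the node after the existing allocation. */
    static int freeVcores() {
        return NODE_VCORES - ALLOCATED_VCORES;
    }

    /** Number of requested containers that fit by raw memory and vcore counts. */
    static int maxContainers() {
        int byMemory = freeMemoryMb() / CONTAINER_MEMORY_MB;
        int byVcores = freeVcores() / CONTAINER_VCORES;
        return Math.min(byMemory, byVcores);
    }

    public static void main(String[] args) {
        System.out.println("free memory (MB): " + freeMemoryMb());
        System.out.println("free vcores:      " + freeVcores());
        System.out.println("containers that fit: " + maxContainers());
    }
}
```

By these raw counts, three containers of <memory:2048, vCores:2> would fit
into the remaining 6942 MB and 7 vcores, so node capacity alone does not
obviously explain why no TaskExecutor starts.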

When I change ClusterSpecification.slotsPerTaskManager to 1, I get:
"Cannot serve slot request, no ResourceManager connected"
"Could not allocate the required slot within slot request timeout. Please
make sure that the cluster has enough resources"
Why isn't the ResourceManager created even though I request even fewer
resources?


Regards,
Vitaliy

Re: ClusterSpecification and Configuration questions

Posted by Xintong Song <to...@gmail.com>.

Hi Vitaliy,

Do you mean you are modifying the code of ClusterSpecification? I believe
this is an internal class and is not meant to be modified by users.
Changing the internal code directly might lead to internal inconsistency
and unpredictable problems. If you want to modify JM/TM memory and slots
per TM, please use the configuration options.
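
As an illustration of what "configuration options" could look like here, the
following flink-conf.yaml fragment mirrors the values from the original post.
It is a sketch, not a verified setup: the memory key shown is the one used
from Flink 1.10 onward (earlier releases used taskmanager.heap.size), so
please check the option names against the documentation for your Flink
version.

```yaml
# Slots per TaskManager (value taken from the original post).
taskmanager.numberOfTaskSlots: 1

# vcores per TaskManager container and for the application master
# (values taken from the original post).
yarn.containers.vcores: 2
yarn.appmaster.vcores: 1

# TaskManager memory. Assumption: Flink 1.10 or later; older releases
# used taskmanager.heap.size instead.
taskmanager.memory.process.size: 1024m

# JobManager memory has its own option; its key name also varies by
# release, and the post gives no value for it, so it is omitted here.
```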

I think the major problem in your case is that the TaskExecutor cannot be
started. Would you mind posting the complete log file? That would help
people understand what caused the problem; the posted log fragments are
not sufficient for that.

In addition, would you be able to check the Yarn logs? They should show
whether the container requests are received and containers are allocated.

Thank you~

Xintong Song


