Posted to user@hawq.apache.org by Alexey Grishchenko <pr...@gmail.com> on 2015/11/27 15:10:59 UTC

HAWQ on YARN - continuous container allocation

Hi, guys

I've got an issue with running HAWQ 2.0 on YARN.
On startup, HAWQ successfully registers a YARN application and starts
allocating containers, but it never stops allocating them. Regardless of the
amount of vcores and memory you give YARN to manage, HAWQ allocates
containers until it eats all the available resources. After this, all
queries start to hang.

What I can see in the RM logs (full log is attached):
2015-11-27 05:34:59,214 WARN  resourcemanager.RMAuditLogger
(RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released
Container TARGET=Scheduler        RESULT=FAILURE  DESCRIPTION=Trying to
release container not owned by app or with invalid id.
 PERMISSIONS=Unauthorized access or invalid container
 APPID=application_1448630699339_0002
 CONTAINERID=container_1448630699339_0002_01_000008

Do you know what might cause this?
I'm using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0 (YARN
2.7.1).

-- 
Alexey Grishchenko, http://0x0fff.com

Re: HAWQ on YARN - continuous container allocation

Posted by Alexey Grishchenko <pr...@gmail.com>.
OK, now I see that the container name is formed incorrectly.

2015-11-27 06:18:02,991 INFO  rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(417)) -
*container_e08_1448630699339_0003_01_000003* Container Transitioned from
NEW to ALLOCATED
2015-11-27 06:18:02,991 INFO  resourcemanager.RMAuditLogger
(RMAuditLogger.java:logSuccess(106)) - USER=gpadmin OPERATION=AM Allocated
Container TARGET=SchedulerApp RESULT=SUCCESS
APPID=application_1448630699339_0003
CONTAINERID=container_e08_1448630699339_0003_01_000003
2015-11-27 06:18:02,991 INFO  scheduler.SchedulerNode
(SchedulerNode.java:allocateContainer(154)) - Assigned container
container_e08_1448630699339_0003_01_000003 of capacity <memory:512,
vCores:1> on host hawq20.pivotal.io:45454, which has 3 containers,
<memory:1536, vCores:3> used and <memory:4309, vCores:13> available after
allocation
2015-11-27 06:18:02,991 INFO  capacity.LeafQueue
(LeafQueue.java:assignContainer(1616)) - assignedContainer application
attempt=appattempt_1448630699339_0003_000001 container=Container:
[ContainerId: container_e08_1448630699339_0003_01_000003, NodeId:
hawq20.pivotal.io:45454, NodeHttpAddress: hawq20.pivotal.io:8042, Resource:
<memory:512, vCores:1>, Priority: 1, Token: null, ] queue=default:
capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1024, vCores:2>,
usedCapacity=0.17519248, absoluteUsedCapacity=0.17519248, numApps=1,
numContainers=2 clusterResource=<memory:5845, vCores:16>
2015-11-27 06:18:03,850 INFO  rmcontainer.RMContainerImpl
(RMContainerImpl.java:handle(417)) -
container_e08_1448630699339_0003_01_000003 Container Transitioned from
ALLOCATED to ACQUIRED
2015-11-27 06:18:03,877 WARN  resourcemanager.RMAuditLogger
(RMAuditLogger.java:logFailure(215)) - USER=gpadmin IP=192.168.220.128
OPERATION=AM
Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to
release container not owned by app or with invalid id. PERMISSIONS=Unauthorized
access or invalid container APPID=application_1448630699339_0003
CONTAINERID=*container_1448630699339_0003_01_000003*

The container is allocated with the name
container_e08_1448630699339_0003_01_000003, but when HAWQ tries to release
it, it uses the name container_1448630699339_0003_01_000003 (without the
"e08" part), which is wrong because no container with this name exists.
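
To make the mismatch concrete, here is a minimal Python sketch. It assumes
the container ID layout visible in the log lines above, with an optional
"e<NN>_" component (which appears to be an epoch prefix) after "container_".
The function names are hypothetical and are not part of HAWQ or YARN; the
sketch only shows that dropping that component from the allocated ID yields
exactly the string seen in the failed release.

```python
import re

# Layout assumed from the RM log lines above:
#   container_[e<epoch>_]<cluster timestamp>_<app>_<attempt>_<sequence>
CONTAINER_ID_RE = re.compile(
    r"^container_(?:e(?P<epoch>\d+)_)?"
    r"(?P<cluster_ts>\d+)_(?P<app>\d+)_(?P<attempt>\d+)_(?P<seq>\d+)$"
)


def parse_container_id(text):
    """Split a container ID string into its numeric parts (hypothetical helper)."""
    match = CONTAINER_ID_RE.match(text)
    if match is None:
        raise ValueError("not a container ID: %r" % text)
    parts = {name: int(value) for name, value in match.groupdict(default="0").items()}
    return parts


def strip_epoch(text):
    """Rebuild the ID without the e<NN> component, as the release request seems to do."""
    parts = parse_container_id(text)
    return "container_{cluster_ts}_{app:04d}_{attempt:02d}_{seq:06d}".format(**parts)


allocated = "container_e08_1448630699339_0003_01_000003"  # from the ALLOCATED log line
released = "container_1448630699339_0003_01_000003"       # from the failed release

# The two strings differ only by the e08 component.
print(parse_container_id(allocated)["epoch"], parse_container_id(released)["epoch"])
print(strip_epoch(allocated) == released)
```

If the second line printed is True, the ID in the release request is the
allocated ID with the "e08" component dropped, which would explain why the RM
cannot find a matching container.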


On Fri, Nov 27, 2015 at 2:29 PM, Alexey Grishchenko <pr...@gmail.com>
wrote:

> You can also find HAWQ log in attachment. As expected, HAWQ tries to
> releaseResources, but somehow it returns success
>
> On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <
> programmerag@gmail.com> wrote:
>
>> Hi, guys
>>
>> I've got an issue with running HAWQ 2.0 on YARN
>> On starting HAWQ successfully registers YARN application and starts
>> allocating containers. But it never stops allocating them. Regardless the
>> amount of vcores and memory you give YARN to manage, HAWQ would allocate
>> containers until it eat all the available resources. After this, all the
>> queries start to hang.
>>
>> What I can see in the RM logs (full log is attached):
>> 2015-11-27 05:34:59,214 WARN  resourcemanager.RMAuditLogger
>> (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released
>> Container TARGET=Scheduler        RESULT=FAILURE  DESCRIPTION=Trying to
>> release container not owned by app or with invalid id.
>>  PERMISSIONS=Unauthorized access or invalid container
>>  APPID=application_1448630699339_0002
>>  CONTAINERID=container_1448630699339_0002_01_000008
>>
>> Do you know the possible reason for this?
>> Using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
>> (YARN 2.7.1)
>>
>> --
>> Alexey Grishchenko, http://0x0fff.com
>>
>
>
>
> --
> Alexey Grishchenko, http://0x0fff.com
>



-- 
Alexey Grishchenko, http://0x0fff.com

Re: HAWQ on YARN - continuous container allocation

Posted by Alexey Grishchenko <pr...@gmail.com>.
You can also find the HAWQ log in the attachment. As expected, HAWQ tries to
releaseResources, but somehow the call returns success.

On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <pr...@gmail.com>
wrote:

> Hi, guys
>
> I've got an issue with running HAWQ 2.0 on YARN
> On starting HAWQ successfully registers YARN application and starts
> allocating containers. But it never stops allocating them. Regardless the
> amount of vcores and memory you give YARN to manage, HAWQ would allocate
> containers until it eat all the available resources. After this, all the
> queries start to hang.
>
> What I can see in the RM logs (full log is attached):
> 2015-11-27 05:34:59,214 WARN  resourcemanager.RMAuditLogger
> (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released
> Container TARGET=Scheduler        RESULT=FAILURE  DESCRIPTION=Trying to
> release container not owned by app or with invalid id.
>  PERMISSIONS=Unauthorized access or invalid container
>  APPID=application_1448630699339_0002
>  CONTAINERID=container_1448630699339_0002_01_000008
>
> Do you know the possible reason for this?
> Using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
> (YARN 2.7.1)
>
> --
> Alexey Grishchenko, http://0x0fff.com
>



-- 
Alexey Grishchenko, http://0x0fff.com
