Posted to user@hawq.apache.org by Alexey Grishchenko <pr...@gmail.com> on 2015/11/27 15:10:59 UTC
HAWQ on YARN - continuous container allocation
Hi, guys
I've got an issue running HAWQ 2.0 on YARN.
On startup, HAWQ successfully registers a YARN application and starts
allocating containers, but it never stops allocating them. Regardless of the
amount of vcores and memory you give YARN to manage, HAWQ keeps allocating
containers until it has consumed all available resources. After that, all
queries start to hang.
Here is what I see in the RM logs (the full log is attached):
2015-11-27 05:34:59,214 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1448630699339_0002 CONTAINERID=container_1448630699339_0002_01_000008
Do you know what could cause this?
I'm using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0 (YARN 2.7.1).
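In a large RM log, the failed releases can be pulled out mechanically. A minimal Python sketch, assuming each audit entry sits on a single line and uses only the KEY=value fields visible in the excerpt above (the helper names are hypothetical; real RMAuditLogger output may separate fields with tabs, which this parser tolerates because it only looks for the known keys):

```python
import re

# Keys that appear in the RMAuditLogger entries quoted in this thread.
AUDIT_KEYS = ("USER", "IP", "OPERATION", "TARGET", "RESULT",
              "DESCRIPTION", "PERMISSIONS", "APPID", "CONTAINERID")
# Matches "KEY=" at a word boundary; the capture group keeps the key in split().
KEY_RE = re.compile(r"\b(" + "|".join(AUDIT_KEYS) + r")=")


def parse_audit_fields(line):
    """Split one audit log line into a {KEY: value} dict."""
    parts = KEY_RE.split(line)
    # parts == [text before first key, key1, value1, key2, value2, ...]
    return {key: value.strip() for key, value in zip(parts[1::2], parts[2::2])}


def failed_releases(lines):
    """Yield the CONTAINERID of every failed 'AM Released Container' entry."""
    for line in lines:
        fields = parse_audit_fields(line)
        if (fields.get("OPERATION") == "AM Released Container"
                and fields.get("RESULT") == "FAILURE"):
            yield fields.get("CONTAINERID")


# The RM log entry from this message, joined onto a single line.
SAMPLE = (
    "2015-11-27 05:34:59,214 WARN resourcemanager.RMAuditLogger "
    "(RMAuditLogger.java:logFailure(215)) - USER=gpadmin "
    "OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE "
    "DESCRIPTION=Trying to release container not owned by app or with "
    "invalid id. PERMISSIONS=Unauthorized access or invalid container "
    "APPID=application_1448630699339_0002 "
    "CONTAINERID=container_1448630699339_0002_01_000008"
)

print(list(failed_releases([SAMPLE])))
# ['container_1448630699339_0002_01_000008']
```

Passing an open log file instead of the one-element list works the same way, since a file object yields lines.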
--
Alexey Grishchenko, http://0x0fff.com
Re: HAWQ on YARN - continuous container allocation
Posted by Alexey Grishchenko <pr...@gmail.com>.
OK, now I see that the container ID is formed incorrectly.
2015-11-27 06:18:02,991 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(417)) - *container_e08_1448630699339_0003_01_000003* Container Transitioned from NEW to ALLOCATED
2015-11-27 06:18:02,991 INFO resourcemanager.RMAuditLogger (RMAuditLogger.java:logSuccess(106)) - USER=gpadmin OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1448630699339_0003 CONTAINERID=container_e08_1448630699339_0003_01_000003
2015-11-27 06:18:02,991 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(154)) - Assigned container container_e08_1448630699339_0003_01_000003 of capacity <memory:512, vCores:1> on host hawq20.pivotal.io:45454, which has 3 containers, <memory:1536, vCores:3> used and <memory:4309, vCores:13> available after allocation
2015-11-27 06:18:02,991 INFO capacity.LeafQueue (LeafQueue.java:assignContainer(1616)) - assignedContainer application attempt=appattempt_1448630699339_0003_000001 container=Container: [ContainerId: container_e08_1448630699339_0003_01_000003, NodeId: hawq20.pivotal.io:45454, NodeHttpAddress: hawq20.pivotal.io:8042, Resource: <memory:512, vCores:1>, Priority: 1, Token: null, ] queue=default: capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:1024, vCores:2>, usedCapacity=0.17519248, absoluteUsedCapacity=0.17519248, numApps=1, numContainers=2 clusterResource=<memory:5845, vCores:16>
2015-11-27 06:18:03,850 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(417)) - container_e08_1448630699339_0003_01_000003 Container Transitioned from ALLOCATED to ACQUIRED
2015-11-27 06:18:03,877 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(215)) - USER=gpadmin IP=192.168.220.128 OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1448630699339_0003 CONTAINERID=*container_1448630699339_0003_01_000003*
The container is allocated with the ID
container_e08_1448630699339_0003_01_000003, but when HAWQ tries to release it,
it uses the ID container_1448630699339_0003_01_000003 (the "e08" part is
dropped). That ID is wrong, because no container with that ID exists.
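The "e08" component is the ResourceManager epoch that recent YARN releases (including the 2.7.1 used here) embed in container IDs; it typically becomes nonzero once the RM has been restarted with state recovery enabled. A minimal Python sketch, assuming the string layout of YARN's ContainerId.toString() (epoch stored above the low 40 bits of the 64-bit container number, printed as "e<NN>_" only when nonzero), shows how a client that keeps only the low 32 bits of that number would produce exactly the bad ID seen in the log. The class, the parser, and the 32-bit truncation are hypothetical illustrations, not HAWQ's actual code or a confirmed cause:

```python
from dataclasses import dataclass

# Assumption: mirrors the layout used by YARN's ContainerId.toString(), where
# the RM epoch sits above the low 40 bits of the 64-bit container number.
EPOCH_SHIFT = 40
SEQ_MASK = (1 << EPOCH_SHIFT) - 1   # low 40 bits: container sequence number


@dataclass(frozen=True)
class ContainerId:
    cluster_ts: int    # RM cluster timestamp, e.g. 1448630699339
    app_id: int        # application number, e.g. 3
    attempt_id: int    # application attempt number, e.g. 1
    number: int        # 64-bit value: (epoch << 40) | sequence

    def __str__(self):
        epoch = self.number >> EPOCH_SHIFT
        seq = self.number & SEQ_MASK
        # The "e<NN>_" part is printed only when the epoch is nonzero.
        prefix = f"e{epoch:02d}_" if epoch > 0 else ""
        return (f"container_{prefix}{self.cluster_ts}_{self.app_id:04d}_"
                f"{self.attempt_id:02d}_{seq:06d}")


def parse_container_id(text):
    """Parse 'container_[eNN_]<ts>_<app>_<attempt>_<seq>'."""
    parts = text.split("_")
    if parts[0] != "container":
        raise ValueError(f"not a container id: {text}")
    epoch = 0
    if parts[1].startswith("e"):
        epoch = int(parts[1][1:])
        del parts[1]
    ts, app, attempt, seq = (int(p) for p in parts[1:])
    return ContainerId(ts, app, attempt, (epoch << EPOCH_SHIFT) | seq)


allocated = parse_container_id("container_e08_1448630699339_0003_01_000003")
# Hypothetical bug: a client keeps only the low 32 bits of the 64-bit number,
# which silently drops the epoch.
truncated = ContainerId(allocated.cluster_ts, allocated.app_id,
                        allocated.attempt_id, allocated.number & 0xFFFFFFFF)
print(allocated)   # container_e08_1448630699339_0003_01_000003
print(truncated)   # container_1448630699339_0003_01_000003
```

If something like this is happening, every release would be rejected by the RM and the containers would stay allocated, which would be consistent with the continuous allocation described in the first message.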
On Fri, Nov 27, 2015 at 2:29 PM, Alexey Grishchenko <pr...@gmail.com>
wrote:
> The HAWQ log is also attached. As expected, HAWQ calls
> releaseResources, but somehow the call returns success even though the RM
> logs the release as a failure.
>
> On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <
> programmerag@gmail.com> wrote:
>
>> Hi, guys
>>
>> I've got an issue running HAWQ 2.0 on YARN.
>> On startup, HAWQ successfully registers a YARN application and starts
>> allocating containers, but it never stops allocating them. Regardless of the
>> amount of vcores and memory you give YARN to manage, HAWQ keeps allocating
>> containers until it has consumed all available resources. After that, all
>> queries start to hang.
>>
>> Here is what I see in the RM logs (the full log is attached):
>> 2015-11-27 05:34:59,214 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1448630699339_0002 CONTAINERID=container_1448630699339_0002_01_000008
>>
>> Do you know what could cause this?
>> I'm using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
>> (YARN 2.7.1).
>>
>> --
>> Alexey Grishchenko, http://0x0fff.com
>>
>
>
>
> --
> Alexey Grishchenko, http://0x0fff.com
>
--
Alexey Grishchenko, http://0x0fff.com
Re: HAWQ on YARN - continuous container allocation
Posted by Alexey Grishchenko <pr...@gmail.com>.
The HAWQ log is also attached. As expected, HAWQ calls
releaseResources, but somehow the call returns success even though the RM
logs the release as a failure.
On Fri, Nov 27, 2015 at 2:10 PM, Alexey Grishchenko <pr...@gmail.com>
wrote:
> Hi, guys
>
> I've got an issue running HAWQ 2.0 on YARN.
> On startup, HAWQ successfully registers a YARN application and starts
> allocating containers, but it never stops allocating them. Regardless of the
> amount of vcores and memory you give YARN to manage, HAWQ keeps allocating
> containers until it has consumed all available resources. After that, all
> queries start to hang.
>
> Here is what I see in the RM logs (the full log is attached):
> 2015-11-27 05:34:59,214 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(215)) - USER=gpadmin OPERATION=AM Released Container TARGET=Scheduler RESULT=FAILURE DESCRIPTION=Trying to release container not owned by app or with invalid id. PERMISSIONS=Unauthorized access or invalid container APPID=application_1448630699339_0002 CONTAINERID=container_1448630699339_0002_01_000008
>
> Do you know what could cause this?
> I'm using HAWQ 2.0.0.0_beta build 18453 on a single node with PHD 3.3.2.0
> (YARN 2.7.1).
>
> --
> Alexey Grishchenko, http://0x0fff.com
>
--
Alexey Grishchenko, http://0x0fff.com