Posted to dev@tajo.apache.org by "Hyunsik Choi (JIRA)" <ji...@apache.org> on 2013/05/08 08:29:15 UTC

[jira] [Commented] (TAJO-54) SubQuery::allocateContainers() may ask 0 containers

    [ https://issues.apache.org/jira/browse/TAJO-54?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13651656#comment-13651656 ] 

Hyunsik Choi commented on TAJO-54:
----------------------------------

Could anyone review this patch? It is critical because it prevents the hanging problem.
                
> SubQuery::allocateContainers() may ask 0 containers
> ---------------------------------------------------
>
>                 Key: TAJO-54
>                 URL: https://issues.apache.org/jira/browse/TAJO-54
>             Project: Tajo
>          Issue Type: Bug
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>            Priority: Critical
>              Labels: yarn
>             Fix For: 0.2-incubating
>
>         Attachments: TAJO-54_2.patch, TAJO-54.patch
>
>
> SubQuery::allocateContainers() calculates the number of containers to request for a subquery and then requests them as follows:
> {code:title=SubQuery.java}
>     public static void allocateContainers(SubQuery subQuery) {
>       ExecutionBlock execBlock = subQuery.getBlock();
>       QueryUnit [] tasks = subQuery.getQueryUnits();
>       int numRequest = Math.min(tasks.length,
>           subQuery.context.getNumClusterNode() * 4);
> {code}
> Inside allocateContainers(), subQuery.context.getNumClusterNode() internally invokes AMRMClient::getClusterNodeCount(). If AMRMClient::getClusterNodeCount() returns 0, allocateContainers() requests 0 containers from the RM. When that happens, AppSchedulingInfo regards the ApplicationMaster as inactive. As a result, the ApplicationMaster cannot acquire any containers.
> In the current Hadoop YARN, AMRMClient::getClusterNodeCount() temporarily returns 0 for an unknown reason even though cluster nodes are available. This causes the integration test (i.e., 'mvn verify') to hang. This patch solves the problem by making RMContainerAllocator wait until cluster nodes are available.
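
The attached patches are not reproduced in this message, so the following is only a rough sketch of the idea described above, not the actual fix. It shows how the quoted calculation yields 0 when the node count is 0, and one possible way to poll until a positive node count is reported. The class ContainerRequestSketch, the helper waitForClusterNodes(), and the timeout and poll parameters are all hypothetical; the node count is modeled with a plain IntSupplier rather than a real AMRMClient.

```java
import java.util.function.IntSupplier;

public class ContainerRequestSketch {

  // Mirrors the calculation quoted above: at most 4 containers per cluster node.
  // With numClusterNodes == 0 the result is 0, which is the problematic request.
  static int numRequest(int numTasks, int numClusterNodes) {
    return Math.min(numTasks, numClusterNodes * 4);
  }

  // Hypothetical helper: polls a node-count source until it reports a positive
  // value or the timeout elapses. Returns the last observed count, which may
  // still be 0 if the timeout expired or the thread was interrupted.
  static int waitForClusterNodes(IntSupplier nodeCount, long timeoutMillis, long pollMillis) {
    long deadline = System.currentTimeMillis() + timeoutMillis;
    int nodes = nodeCount.getAsInt();
    while (nodes <= 0 && System.currentTimeMillis() < deadline) {
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for callers
        break;
      }
      nodes = nodeCount.getAsInt();
    }
    return nodes;
  }

  public static void main(String[] args) {
    // Simulated source (an assumption for illustration): reports 0 on the
    // first two calls and 3 afterwards, imitating a temporary zero count.
    int[] calls = {0};
    IntSupplier flaky = () -> calls[0]++ < 2 ? 0 : 3;

    System.out.println("without waiting: " + numRequest(10, 0));
    int nodes = waitForClusterNodes(flaky, 1000, 10);
    System.out.println("after waiting:   " + numRequest(10, nodes));
  }
}
```

A caller would still need to decide what to do if the count remains 0 after the timeout, for example retrying later instead of sending a request for 0 containers.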

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira