Posted to dev@myriad.apache.org by "Yuliya Feldman (JIRA)" <ji...@apache.org> on 2015/08/26 18:13:46 UTC

[jira] [Commented] (MYRIAD-128) Issue with Flex down, Pending NMs stuck in staging and don't get to active task.

    [ https://issues.apache.org/jira/browse/MYRIAD-128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14714423#comment-14714423 ] 

Yuliya Feldman commented on MYRIAD-128:
---------------------------------------

That is only part of the problem. Another part is the order of operations: we update the state to running asynchronously and then, though not every time, change it to staging. We need some state transition scheme in addition to synchronization.
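
To illustrate the idea, here is a minimal sketch of a guarded state transition, not Myriad's actual scheduler code. The names NmTaskState and NmTaskStateMachine and the transition table are assumptions made for this example; the point is only that a late "staging" update cannot overwrite a task that has already been marked active.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical names; this is not Myriad's real scheduler state API.
enum NmTaskState { PENDING, STAGING, ACTIVE, KILLABLE }

final class NmTaskStateMachine {
    // Assumed transition table: ACTIVE -> STAGING is deliberately absent,
    // so an out-of-order staging update is rejected.
    private static final Map<NmTaskState, Set<NmTaskState>> ALLOWED =
            new EnumMap<>(NmTaskState.class);

    static {
        ALLOWED.put(NmTaskState.PENDING,
                EnumSet.of(NmTaskState.STAGING, NmTaskState.ACTIVE, NmTaskState.KILLABLE));
        ALLOWED.put(NmTaskState.STAGING,
                EnumSet.of(NmTaskState.ACTIVE, NmTaskState.KILLABLE));
        ALLOWED.put(NmTaskState.ACTIVE,
                EnumSet.of(NmTaskState.KILLABLE));
        ALLOWED.put(NmTaskState.KILLABLE,
                EnumSet.noneOf(NmTaskState.class));
    }

    private NmTaskState current = NmTaskState.PENDING;

    // Check and update happen under one lock, so concurrent callers
    // cannot interleave between the validity check and the assignment.
    synchronized boolean transitionTo(NmTaskState next) {
        if (!ALLOWED.get(current).contains(next)) {
            return false; // illegal or stale update; state is unchanged
        }
        current = next;
        return true;
    }

    synchronized NmTaskState current() {
        return current;
    }
}
```

With this table, if the asynchronous "running" update lands first, a later transitionTo(NmTaskState.STAGING) returns false and the task stays ACTIVE instead of being moved back to staging.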

> Issue with Flex down, Pending NMs stuck in staging and don't get to active task.
> --------------------------------------------------------------------------------
>
>                 Key: MYRIAD-128
>                 URL: https://issues.apache.org/jira/browse/MYRIAD-128
>             Project: Myriad
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: Myriad 0.1.0
>            Reporter: Sarjeet Singh
>         Attachments: Screen Shot 2015-08-24 at 5.51.38 PM.png
>
>
> I am seeing an issue when flexing NMs from the Myriad UI. On flexing down the active NM, the pending NMs don't go to the active state (they are not shown in 'Active Tasks') and no active NM is shown on the Myriad UI, although there is an NM running on the node (verified from jps):
> mapr     20528 20526  1 17:23 ?        00:00:26 /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64/bin/java -Dproc_nodemanager -Xmx1000m -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -Dyarn.policy.file=hadoop-policy.xml -server -Dnodemanager.resource.io-spindles=4.0 -Dyarn.resourcemanager.hostname=testrm.marathon.mesos -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor -Dnodemanager.resource.cpu-vcores=0 -Dnodemanager.resource.memory-mb=0 -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003 -Dhadoop.login=maprsasl -Dhttps.protocols=TLSv1.2 -Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf -Dzookeeper.sasl.clientconfig=Client_simple -Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log -Dyarn.log.file=yarn.log -Dyarn.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 -Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console -Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -classpath 
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/nm-config/log4j.properties:/opt/mapr/lib/JPam-1.1.jar org.apache.hadoop.yarn.server.nodemanager.NodeManager
> From myriad UI:
> Active Tasks
> Killable Tasks
> Pending Tasks
> Staging Tasks
>     nm.large.123badb1-57d8-4bd2-aa2e-de9fc1898c7f
>     nm.medium.f2c4126c-4cb2-46af-a1e0-690034b914b8
>     nm.medium.a9e9fd84-350a-48bc-bcd2-8712ecdc8c66
>     nm.medium.663f9c6e-f28e-4395-8540-70c306eb04c5
>     nm.medium.93f7cc91-9263-48a7-821e-3b0ffbe70e66
> This is the state even after waiting for about 30 minutes after flexing down the NM.
> I tried this only on a single-node cluster, but it looks like the problem can happen in any setup.
> I started the RM from Marathon and was able to get the RM and Myriad up and running. When the RM is launched, a CGS (medium profile) NM is launched along with it and is shown as an 'Active Task' on the Myriad UI. Then I launched some large-profile and zero-profile NMs, which are shown under 'Pending Tasks' since there is already a (CGS default) NM running on the single-node cluster.
> Then I tried flexing down the NM from the Myriad UI, which flexed down the active NM, and all pending NMs started moving to staging tasks, where they stayed stuck for a long time. After waiting for more than 30 minutes, I don't see any active NM task, and all of the pending NM tasks are shown under 'Staging Tasks' only. (See the screenshot.)


