Posted to dev@myriad.apache.org by Björn Hagemeier <b....@fz-juelich.de> on 2016/02/19 11:01:35 UTC

Pending Flex Up Tasks

Hi all,

I am very new to Myriad, and also to Mesos and YARN. I am having trouble
with pending flex-up tasks, for which I cannot see any further
information, so I do not even have the faintest idea where to start
debugging. I can easily run frameworks in my Mesos cluster, but YARN NMs
are a different issue.

My idea was to use package installations of the NM on the slave nodes.
Is that possible? The documentation mentions something about remote
distribution, but from the wording it seems to be more of an option than
a requirement. The packages have been installed and customized (Myriad
configuration) via Puppet, and I'd very much like to stay with this.

Anyhow, please let me know what could cause flex-up tasks to remain in
the pending state. I can easily flex them down and thus remove the
pending tasks, but I never get them into the active state.

Any hint is very much appreciated.


Best regards,
Björn
-- 
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hagemeier@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc

JSC is the coordinator of the
John von Neumann Institute for Computing
and member of the
Gauss Centre for Supercomputing

-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Dueren district court, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Board of Management: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------


Re: Pending Flex Up Tasks

Posted by Santosh Marella <sm...@maprtech.com>.
Hi Björn,

  Does the RM log indicate that it is receiving offers from Mesos? The RM
log should show something like the excerpt below. Does that happen on your cluster?

16/02/17 16:22:57 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:02 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:03 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:08 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:09 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:10 INFO api.ClustersResource: Received flexup request. Profile: zero, Instances: 1, Constraints: null
16/02/17 16:23:10 INFO scheduler.MyriadOperations: Adding 1 NM instances to cluster
16/02/17 16:23:10 INFO state.SchedulerState: Marked taskId nm.zero.f015999c-2f1b-493b-a74a-79a5af678e73 pending, size of pending queue for nm is: 0
16/02/17 16:23:14 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:15 INFO handlers.ResourceOffersEventHandler: Received offers 1
16/02/17 16:23:15 INFO handlers.ResourceOffersEventHandler: Launching task: nm.zero.f015999c-2f1b-493b-a74a-79a5af678e73 using offer: value: "37d65647-e66a-42d5-a7e5-b05a64f9dab0-O806"
16/02/17 16:23:18 INFO handlers.StatusUpdateEventHandler: Status Update for task: nm.zero.f015999c-2f1b-493b-a74a-79a5af678e73 | state: TASK_RUNNING


Santosh
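The check Santosh describes can be applied to a long RM log with a small scanner. This is a hypothetical helper, not part of Myriad: its regular expressions are modelled only on the RM log excerpt in his reply and assume one event per line (the archive wrapped those lines; real log files normally do not). It counts received offers and reports any NM task that was marked pending but never launched. The script name in the usage comment is made up.

```python
import re
import sys

# Patterns modelled on the RM log excerpt above; they assume one event per
# line (the mailing-list archive wrapped these lines, real logs do not).
OFFER_RE = re.compile(r"ResourceOffersEventHandler: Received offers (\d+)")
PENDING_RE = re.compile(r"Marked taskId (\S+) pending")
LAUNCH_RE = re.compile(r"Launching task: (\S+) using offer")
STATUS_RE = re.compile(r"Status Update for task: (\S+) \| state: (\w+)")


def summarize_rm_log(lines):
    """Return (number of offers received, {task id: last state seen})."""
    offers = 0
    tasks = {}  # dicts keep insertion order, so tasks stay in log order
    for line in lines:
        m = OFFER_RE.search(line)
        if m:
            offers += int(m.group(1))
            continue
        m = PENDING_RE.search(line)
        if m:
            tasks[m.group(1)] = "PENDING"
            continue
        m = LAUNCH_RE.search(line)
        if m:
            tasks[m.group(1)] = "LAUNCHED"
            continue
        m = STATUS_RE.search(line)
        if m:
            tasks[m.group(1)] = m.group(2)
    return offers, tasks


def diagnose(lines):
    """One-line verdict: are offers arriving, and is any NM task stuck?"""
    offers, tasks = summarize_rm_log(lines)
    if offers == 0:
        return "no offers received from Mesos"
    stuck = [task for task, state in tasks.items() if state == "PENDING"]
    if stuck:
        return "offers received, but still pending: " + ", ".join(stuck)
    return "offers received, no pending tasks"


if __name__ == "__main__":
    # Usage (hypothetical file name): python rm_log_check.py /path/to/rm.log
    with open(sys.argv[1]) as log:
        print(diagnose(log))
```

A log that shows flex-up requests but no "Received offers" lines at all would produce the first verdict, which matches the symptom described at the start of this thread.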

On Mon, Feb 22, 2016 at 10:27 AM, Björn Hagemeier <b.hagemeier@fz-juelich.de> wrote:

> Hi Yuliya,
>
> On 22.02.2016 at 17:54, yuliya Feldman wrote:
> > When you say:
> >>>> I do see all my expected 8 slaves with all their cores, RAM,
> >>>> and disk.
> > Do you see it on Mesos Master main UI?
> Yes, that is on the Mesos Master main UI.
> > What are those per slave -
> > meaning how many CPUs and how much RAM?
> On the slaves page I see 8 slaves, each with 24 CPUs, 93.2GB RAM, and
> 224.1GB disk, registered 4 days ago. No re-registration.
>
>
> Best regards,
> Björn

Re: Pending Flex Up Tasks

Posted by Björn Hagemeier <b....@fz-juelich.de>.
Hi Yuliya,

On 22.02.2016 at 17:54, yuliya Feldman wrote:
> When you say:
>>>> I do see all my expected 8 slaves with all their cores, RAM,
>>>> and disk.
> Do you see it on Mesos Master main UI?
Yes, that is on the Mesos Master main UI.
> What are those per slave -
> meaning how many CPUs and how much RAM?
On the slaves page I see 8 slaves, each with 24 CPUs, 93.2GB RAM, and
224.1GB disk, registered 4 days ago. No re-registration.


Best regards,
Björn


Re: Pending Flex Up Tasks

Posted by yuliya Feldman <yu...@yahoo.com.INVALID>.
When you say:
>>> I do see all my expected 8 slaves with all their cores, RAM, and disk.
Do you see it on the Mesos Master main UI? What are those per slave, i.e. how many CPUs and how much RAM?

      From: Björn Hagemeier <b....@fz-juelich.de>
 To: dev@myriad.incubator.apache.org 
 Sent: Monday, February 22, 2016 12:35 AM
 Subject: Re: Pending Flex Up Tasks
   

Re: Pending Flex Up Tasks

Posted by Björn Hagemeier <b....@fz-juelich.de>.
Dear Yuliya,

thank you for the warm welcome.

On 19.02.2016 at 18:22, yuliya Feldman wrote:
> Hello Bjorn,
> Welcome to Myriad.
> Few questions that could help to help you.
> 1. Do you just have pending tasks and you never get any active
> ones?
There are only pending tasks, never any active ones. The RM log does not
seem very helpful to me. It records the flexup/flexdown requests and
also the actual killing of flexdown tasks, including the fact that I
requested to flex down more instances than are pending. This is all in
line with what I'd expect from my actions.

> 2. Do you have some active and some pending tasks?
> 3. Are your pending tasks all NMs?
> If [1] - could you look at RM log (should be accessible from Mesos
> console) whether there is enough resources to start the tasks - because
> if not they will remain pending.
I do not see any available resources in the Mesos log. I noticed the
same in the Mesos web interface, which does not list any outstanding
offers. Is this related?

I do see all my expected 8 slaves with all their cores, RAM, and disk.

> If [2] - we start one NM per hostname
> per Myriad framework, so second NM will not start unless first one goes away
> In any case it is best to look at RM log for clues.
Thank you for this hint. It may be worth knowing once I've made it past [1].


Best regards,
Björn
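The observation that the Mesos web UI lists no outstanding offers can also be checked from a script. The sketch below assumes the master's JSON state endpoint (`/master/state.json` on Mesos releases of this period, default master port 5050) and the field names `frameworks[].name`, `frameworks[].offers`, `slaves[].hostname` and `slaves[].resources`; verify these against your Mesos version before relying on it. It lists how many offers each registered framework currently holds and what each agent registered with.

```python
import json
import sys
import urllib.request


def fetch_master_state(master="http://localhost:5050"):
    """Fetch the master's state JSON (endpoint path is an assumption, see above)."""
    with urllib.request.urlopen(master + "/master/state.json", timeout=10) as resp:
        return json.load(resp)


def summarize_offers(state):
    """Map framework name -> number of outstanding offers it currently holds."""
    return {fw.get("name", fw.get("id")): len(fw.get("offers", []))
            for fw in state.get("frameworks", [])}


def summarize_agents(state):
    """Map agent hostname -> (cpus, mem in MB) as registered with the master."""
    return {s["hostname"]: (s["resources"]["cpus"], s["resources"]["mem"])
            for s in state.get("slaves", [])}


if __name__ == "__main__":
    # Usage: python mesos_offers.py http://<master-host>:5050
    state = fetch_master_state(sys.argv[1] if len(sys.argv) > 1 else "http://localhost:5050")
    for name, count in summarize_offers(state).items():
        print(f"framework {name}: {count} outstanding offer(s)")
    for host, (cpus, mem) in summarize_agents(state).items():
        print(f"agent {host}: {cpus} cpus, {mem} MB")
```

The offer list is only a snapshot, since a framework may accept or decline offers quickly. Still, if the Myriad framework does not appear at all, that would suggest it never registered with the master.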


Re: Pending Flex Up Tasks

Posted by yuliya Feldman <yu...@yahoo.com.INVALID>.
Hello Bjorn,
Welcome to Myriad.
A few questions that could help us help you:
1. Do you just have pending tasks and never get any active ones?
2. Do you have some active and some pending tasks?
3. Are your pending tasks all NMs?
If [1], could you look at the RM log (it should be accessible from the Mesos console) to see whether there are enough resources to start the tasks? If not, they will remain pending.
If [2], we start one NM per hostname per Myriad framework, so a second NM will not start unless the first one goes away.
In any case, it is best to look at the RM log for clues.
Thanks,
Yuliya
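The two causes Yuliya lists can be modelled roughly as follows. This is a simplified sketch: `Offer`, `Profile` and `can_launch_nm` are hypothetical names, the sizes in the example are made up, and the real Myriad scheduler may check more than this (for example constraints, ports or executor overhead). It captures only the two rules from her reply: the offer must cover the NM profile, and a host that already runs an NM for this Myriad framework is skipped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    """Simplified view of a Mesos resource offer (hypothetical type)."""
    hostname: str
    cpus: float
    mem_mb: float


@dataclass(frozen=True)
class Profile:
    """Simplified NM profile (hypothetical); real sizes come from the Myriad configuration."""
    name: str
    cpus: float
    mem_mb: float


def can_launch_nm(offer, profile, hosts_with_nm):
    """Apply the two rules from the reply above; return (ok, reason)."""
    # Rule from [2]: at most one NM per hostname per Myriad framework.
    if offer.hostname in hosts_with_nm:
        return False, "host already runs an NM for this Myriad framework"
    # Rule from [1]: the offer must be large enough for the requested profile.
    if offer.cpus < profile.cpus or offer.mem_mb < profile.mem_mb:
        return False, "offer too small for profile " + profile.name
    return True, "ok"
```

For example, `can_launch_nm(Offer("node1", 24, 95436), Profile("medium", 2, 2048), set())` returns `(True, "ok")`, while the same call with `{"node1"}` as the third argument is refused because of the one-NM-per-host rule. Neither rule can apply, of course, if no offers reach the framework in the first place.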

      From: Björn Hagemeier <b....@fz-juelich.de>
 To: Myriad Dev <de...@myriad.incubator.apache.org> 
 Sent: Friday, February 19, 2016 2:01 AM
 Subject: Pending Flex Up Tasks
   