Posted to dev@twill.apache.org by Mike Walch <mw...@gmail.com> on 2015/04/30 21:28:41 UTC

Virtual cores in Twill/YARN

I am trying to start a Twill Runnable with 2 cores in YARN.  When I set the
number of virtual cores to 2 in my ResourceSpecification, the container
that is started in YARN ends up having only 1 core (according to its logs
and the NodeManager).  It looks like Twill is asking for 2 cores in the
ApplicationMaster but YARN only returns a container with 1 core.
Therefore, I am not sure if this is a Twill or YARN problem.  I am running
Twill 0.5 and Hadoop 2.6.0.  In my yarn-site.xml, I am not changing any of
the configuration for virtual cores from the default.  Any ideas of what
could be causing this?

Below are the logs from my Twill ApplicationMaster:

17:56:00.610 [ApplicationMasterService] INFO o.a.t.i.a.ApplicationMasterService - Request 1 container with capability <memory:1024, vCores:2> for runnable FluoWorker

17:56:02.616 [ApplicationMasterService] INFO o.a.t.i.a.ApplicationMasterService - Starting runnable FluoWorker with RunnableProcessLauncher{container=org.apache.twill.internal.yarn.Hadoop21YarnContainerInfo@74cff77f}

I added the logging below to my Twill 0.5 branch which shows that the
container returned by YARN only has 1 core even though the request was for
2:

17:56:02.616 [ApplicationMasterService] INFO o.a.t.i.a.ApplicationMasterService - processLauncher.getContainerInfo().getVirtualCores() = 1

17:56:02.617 [ApplicationMasterService] INFO o.a.t.i.a.ApplicationMasterService - provisionRequest.getRuntimeSpec().getResourceSpecification().getVirtualCores() = 2

Re: Virtual cores in Twill/YARN

Posted by Nitin Motgi <ni...@cask.co>.

###
Random auto-corrects and typos are my special gift to you. When I forward they are from others. 


Re: Virtual cores in Twill/YARN

Posted by Mike Walch <mw...@gmail.com>.
I found the problem.  CPU scheduling is not enabled by default for the
capacity scheduler.  The following needs to be set in
capacity-scheduler.xml to enable it:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

If the above is set, containers can have multiple vcores.  However, cgroups
should also be enabled.  Otherwise, CPU scheduling will be best-effort and
unpredictable.
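
For anyone who wants to verify this setting before launching an app, here
is a minimal sketch in plain Java (no Hadoop or Twill dependencies).  The
class and method names are made up for illustration; only the property
name and calculator class come from the snippet above.  It treats a
missing property as "CPU scheduling not enabled", since that is the
default behavior described above.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Hadoop or Twill.
class ResourceCalculatorCheck {
  static final String KEY = "yarn.scheduler.capacity.resource-calculator";
  static final String DOMINANT =
      "org.apache.hadoop.yarn.util.resource.DominantResourceCalculator";

  /** Returns the value of the named property, or null if the XML does not set it. */
  static String findProperty(String xml, String name) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList props = doc.getElementsByTagName("property");
      for (int i = 0; i < props.getLength(); i++) {
        Element prop = (Element) props.item(i);
        NodeList names = prop.getElementsByTagName("name");
        NodeList values = prop.getElementsByTagName("value");
        if (names.getLength() > 0 && values.getLength() > 0
            && name.equals(names.item(0).getTextContent().trim())) {
          return values.item(0).getTextContent().trim();
        }
      }
      return null;
    } catch (Exception e) {
      throw new IllegalArgumentException("Could not parse configuration XML", e);
    }
  }

  /** True only when the DominantResourceCalculator is explicitly configured. */
  static boolean cpuSchedulingEnabled(String xml) {
    return DOMINANT.equals(findProperty(xml, KEY));
  }

  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("usage: ResourceCalculatorCheck <path to capacity-scheduler.xml>");
      return;
    }
    String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
    System.out.println("CPU scheduling enabled: " + cpuSchedulingEnabled(xml));
  }
}
```

This only inspects the file it is given; it does not confirm that the
running ResourceManager has loaded it, and it says nothing about whether
cgroups are enabled.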

I am not sure whether this already exists, but it would be nice to be able
to configure Twill to throw an exception or fail when an application does
not get all of its requested resources, or when you specify a number of
cores or an amount of memory and the configured scheduler does not support
that resource type.
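
To make the request concrete, here is a rough sketch of the kind of check
I have in mind.  It is plain Java with hypothetical names (GrantCheck,
Resources, requireFullGrant); none of this is existing Twill or YARN API.
It just compares what was requested with what the container reports and
fails loudly on a shortfall.

```java
// Hypothetical illustration only; not part of Twill or YARN.
class GrantCheck {

  /** Simple value holder for the two resource types discussed in this thread. */
  static final class Resources {
    final int memoryMb;
    final int virtualCores;

    Resources(int memoryMb, int virtualCores) {
      this.memoryMb = memoryMb;
      this.virtualCores = virtualCores;
    }

    @Override
    public String toString() {
      return "<memory:" + memoryMb + ", vCores:" + virtualCores + ">";
    }
  }

  /** Throws if the granted container has less memory or fewer vcores than requested. */
  static void requireFullGrant(String runnable, Resources requested, Resources granted) {
    if (granted.memoryMb < requested.memoryMb
        || granted.virtualCores < requested.virtualCores) {
      throw new IllegalStateException(
          "Runnable " + runnable + " requested " + requested + " but was granted " + granted);
    }
  }

  public static void main(String[] args) {
    // Values taken from the logs earlier in this thread: 2 vcores requested, 1 granted.
    Resources requested = new Resources(1024, 2);
    Resources granted = new Resources(1024, 1);
    try {
      requireFullGrant("FluoWorker", requested, granted);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

In Twill itself, a check like this would presumably sit where the
ApplicationMaster receives the container and before the runnable is
started, behind a configuration flag so that the current behavior stays
the default.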



Re: Virtual cores in Twill/YARN

Posted by Mike Walch <mw...@gmail.com>.
Hi Terence,

I don't see anything out of the ordinary in the RM log.  The RM logs that it's
returning a container with 1 core, but there are no errors or warnings.

After doing a little more research, this may be caused by not having
cgroups enabled in YARN.  I am working on enabling them now and will let
everyone know if that fixes the issue.

-Mike


Re: Virtual cores in Twill/YARN

Posted by Terence Yim <ch...@gmail.com>.
Hi Mike,

Does the RM log show any hints?

Terence


Re: Virtual cores in Twill/YARN

Posted by Mike Walch <mw...@gmail.com>.
I am using the CapacityScheduler.  My Hadoop configuration files can be
viewed at: https://github.com/fluo-io/fluo-dev/tree/master/conf/hadoop

While this could be a configuration issue, I don't think it's a lack of
resources as my ResourceManager has several vcores remaining after starting
my YARN app.


Re: Virtual cores in Twill/YARN

Posted by Poorna Chandra <po...@cask.co>.
Hi Mike,

What YARN scheduler are you using?

Poorna.

