Posted to user@hadoop.apache.org by Harsh J <ha...@cloudera.com> on 2013/09/04 08:05:29 UTC

Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd rather try to implement such a thing than have my
containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
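The bookkeeping in steps 3 and 5 of the scenario above, plus the cleanup on AM exit that the thread leaves open, could be sketched roughly as follows. This is only an in-memory illustration under stated assumptions: the class OutputRegistrySketch and its methods are hypothetical names, application and task IDs are plain strings, and the network transport, authentication and actual file deletion are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustrative only: bookkeeping a per-node service might keep for task output files. */
public class OutputRegistrySketch {
    // appId -> (taskId -> output file paths written by that task)
    private final Map<String, Map<String, List<String>>> outputsByApp = new ConcurrentHashMap<>();

    /** Step 3: a mapper-like task reports an output file it has written. */
    public void registerOutput(String appId, String taskId, String path) {
        outputsByApp
            .computeIfAbsent(appId, id -> new ConcurrentHashMap<>())
            .computeIfAbsent(taskId, id -> new CopyOnWriteArrayList<>())
            .add(path);
    }

    /** Step 5: a reducer-like task asks which files this node holds for its application. */
    public List<String> listOutputs(String appId) {
        return flatten(outputsByApp.get(appId));
    }

    /** On AM exit: forget the application and return the files the caller should delete. */
    public List<String> releaseApplication(String appId) {
        return flatten(outputsByApp.remove(appId));
    }

    // Collects every path of one application into a single list (empty if unknown).
    private static List<String> flatten(Map<String, List<String>> byTask) {
        List<String> all = new ArrayList<>();
        if (byTask != null) {
            for (List<String> paths : byTask.values()) {
                all.addAll(paths);
            }
        }
        return all;
    }

    public static void main(String[] args) {
        OutputRegistrySketch registry = new OutputRegistrySketch();
        registry.registerOutput("app_0001", "task_01", "/tmp/app_0001/task_01.out");
        registry.registerOutput("app_0001", "task_02", "/tmp/app_0001/task_02.out");
        System.out.println(registry.listOutputs("app_0001"));
        System.out.println(registry.releaseApplication("app_0001"));
    }
}
```

Keying everything by application ID is what lets the files outlive the individual mapper-like task while still being released in one step when the application ends.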
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions to /opt/john-jars/ across the cluster though.
>
> I think it may be a neat idea to have jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the very next thing people will ask for with such an ability is hot-code-upgrades...)
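As a rough illustration of the "location plus class to load" idea, the following Java sketch loads a named class from a list of jar URLs using the JDK's URLClassLoader. It is not how YARN or HBase implements this; the class RemoteJarLoadingSketch and the method loadServiceClass are hypothetical, and the sketch assumes URL schemes the JVM already understands (such as file: or http:). An hdfs: URL would additionally need a URL stream handler to be registered, which is not shown here.

```java
import java.net.URL;
import java.net.URLClassLoader;

/** Illustrative only: load a configured class name from configured jar locations. */
public class RemoteJarLoadingSketch {

    /**
     * Loads className using the given jar URLs, falling back to this class's own
     * loader first (standard parent-first delegation).
     */
    public static Class<?> loadServiceClass(URL[] jarUrls, String className)
            throws ClassNotFoundException {
        // The loader is intentionally left open: closing it would prevent the
        // service from lazily loading further classes out of the same jars.
        URLClassLoader loader =
            new URLClassLoader(jarUrls, RemoteJarLoadingSketch.class.getClassLoader());
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        // With no jar URLs, only classes visible to the parent loader resolve.
        Class<?> c = loadServiceClass(new URL[0], "java.util.ArrayList");
        System.out.println(c.getName());
    }
}
```

Because the loader stays alive for as long as the service runs, replacing the jar does not replace already loaded classes, which is one reason hot code upgrades are a separate problem.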
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are used
>> to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>>
>>
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>>
>>
>>
>>
>> Auxiliary services are essentially administrator-configured services. So,
>> they have to be set up at install time - before NM is started.
>>
>>
>>
>> +Vinod
>>
>>
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for
>> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is
>> just sitting there in the right place, but if one wanted to make a
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I hate
>> to waste y'alls time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID for the service, like the string "foo". This is
>> the ID that applications may look up in the container responses map we
>> discussed in another thread (around the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would look like
>> the following (comma-separated IDs and individual configs):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
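To illustrate the ID-plus-sub-config convention described above, here is a minimal Java sketch that resolves a comma-separated ID list to implementation class names. It uses a plain java.util.Properties object as a stand-in for the parsed yarn-site.xml. The class AuxServiceConfigSketch and the method resolveServiceClasses are hypothetical names for illustration only; they are not part of Hadoop, and this is not a description of how the NodeManager itself reads the configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Illustrative only: resolves aux-service IDs to class names from key/value config. */
public class AuxServiceConfigSketch {
    // Property names as given in the thread.
    static final String AUX_SERVICES = "yarn.nodemanager.aux-services";
    static final String CLASS_KEY_FORMAT = "yarn.nodemanager.aux-services.%s.class";

    /** Returns service ID -> implementation class name, in declaration order. */
    public static Map<String, String> resolveServiceClasses(Properties conf) {
        Map<String, String> result = new LinkedHashMap<>();
        String ids = conf.getProperty(AUX_SERVICES, "");
        for (String rawId : ids.split(",")) {
            String id = rawId.trim();
            if (id.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            String className = conf.getProperty(String.format(CLASS_KEY_FORMAT, id));
            if (className == null) {
                throw new IllegalStateException("No class configured for aux-service ID: " + id);
            }
            result.put(id, className.trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(AUX_SERVICES, "foo,bar");
        conf.setProperty("yarn.nodemanager.aux-services.foo.class",
                "com.mypack.MyAuxServiceClassForFoo");
        conf.setProperty("yarn.nodemanager.aux-services.bar.class",
                "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolveServiceClasses(conf));
    }
}
```

The point of the sketch is that the aux-services value is a single list of IDs, and each ID then selects its own ".class" key; repeating the aux-services property once per service would not express that.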
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case.  But what are the
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with NodeManager
>>> via RPC?  Is there an interface to implement?  How are they opened
>>> and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and NM has to be restarted to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> I notice that yarn-site.xml contains:
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>   </property>
>>>>
>>>>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework) to
>>>> run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>>
>>>>
>>>> John Lilley
>>>>
>>>> Chief Architect, RedPoint Global Inc.
>>>>
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>> www.redpoint.net
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>
>
> --
> Harsh J



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in the spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd pick trying to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
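
The bookkeeping in steps 3 and 5 above, plus the cleanup on AM exit, could be sketched as a small per-node registry. This is an illustration under stated assumptions only: it is in-memory, has no networking and uses no YARN API, it does not model step 2 (launching the service only if none is running), and NodeOutputRegistrySketch with its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical per-node registry, for illustration only.
final class NodeOutputRegistrySketch {

    // Application ID -> "taskId:path" entries registered on this node.
    private final Map<String, List<String>> outputsByApp = new ConcurrentHashMap<>();

    // Step 3: a mapper-like task informs the service of a finished output file.
    void register(String appId, String taskId, String path) {
        outputsByApp
                .computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
                .add(taskId + ":" + path);
    }

    // Step 5: a reducer-like task asks which outputs exist for its application.
    List<String> outputsFor(String appId) {
        List<String> found = outputsByApp.get(appId);
        return found == null ? Collections.<String>emptyList() : new ArrayList<>(found);
    }

    // AM exit: forget the application and return the entries whose files should be deleted.
    List<String> releaseApp(String appId) {
        List<String> removed = outputsByApp.remove(appId);
        return removed == null ? Collections.<String>emptyList() : new ArrayList<>(removed);
    }
}
```

Keying everything by application ID is what lets the files outlive an individual mapper-like task while still being released in one call when the AM finishes.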
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions to /opt/john-jars/ across the cluster, though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, with the yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a 
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are 
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is 
>>> just sitting there in the right place, but if one wanted to make a 
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID for the service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed in another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like 
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
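
To make the ID-plus-sub-config scheme concrete, here is a minimal sketch of how the list of IDs relates to the per-ID class keys. It is an illustration only and not NodeManager code: a plain Map stands in for the parsed yarn-site.xml, and AuxServiceConfigSketch, classKey() and resolve() are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class AuxServiceConfigSketch {

    static final String AUX_SERVICES = "yarn.nodemanager.aux-services";

    // Builds the per-service key, e.g. "yarn.nodemanager.aux-services.foo.class".
    static String classKey(String serviceId) {
        return AUX_SERVICES + "." + serviceId + ".class";
    }

    // Maps each configured service ID to its implementation class name,
    // preserving the order of the comma-separated ID list.
    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> result = new LinkedHashMap<>();
        String ids = conf.getOrDefault(AUX_SERVICES, "");
        for (String raw : ids.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            String className = conf.get(classKey(id));
            if (className == null) {
                throw new IllegalStateException(
                        "No class configured for aux-service '" + id
                                + "' (expected property " + classKey(id) + ")");
            }
            result.put(id, className);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(AUX_SERVICES, "foo,bar");
        conf.put(classKey("foo"), "com.mypack.MyAuxServiceClassForFoo");
        conf.put(classKey("bar"), "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolve(conf));
    }
}
```

The point of the sketch is that there is exactly one aux-services property holding all IDs, and one .class property per ID; an ID listed without a matching .class property is treated as a configuration error here.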
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with 
>>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to 
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice this in the yarn-site.xml:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes by finessing the service, spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions to /opt/john-jars/ across the cluster, though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, with the yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a 
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are 
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is 
>>> just sitting there in the right place, but if one wanted to make a 
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID for the service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed in another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like 
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with 
>>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to 
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice this in the yarn-site.xml:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do
let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes by finessing the service, spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the right jar versions to /opt/john-jars/ across the cluster, though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, with the yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time, before the NM is started.
>>>
>>>
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map,
>>> which we discussed in another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
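
To make the ID-plus-sub-config format concrete, here is a small sketch that reads a yarn-site.xml-style document and resolves each aux-service ID to its configured class name. It only mimics the lookup convention described above; it is not the NodeManager's actual code, and the function name `resolve_aux_services` and the `EXAMPLE_SITE_XML` constant are made up for illustration.

```python
import xml.etree.ElementTree as ET

AUX_SERVICES_KEY = "yarn.nodemanager.aux-services"

# Same shape as the two-service example quoted above.
EXAMPLE_SITE_XML = """<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
  </property>
</configuration>"""


def resolve_aux_services(site_xml_text):
    """Map each listed aux-service ID to its configured class name."""
    root = ET.fromstring(site_xml_text)

    # Collect name/value pairs; a repeated name overwrites the earlier one.
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()

    # The top-level key holds a comma-separated list of service IDs.
    raw_ids = props.get(AUX_SERVICES_KEY, "")
    service_ids = [s.strip() for s in raw_ids.split(",") if s.strip()]

    resolved = {}
    for service_id in service_ids:
        class_key = f"{AUX_SERVICES_KEY}.{service_id}.class"
        if class_key not in props:
            raise KeyError(f"no class configured for aux-service '{service_id}'")
        resolved[service_id] = props[class_key]
    return resolved
```

Calling `resolve_aux_services(EXAMPLE_SITE_XML)` gives a mapping from `foo` and `bar` to their class names, in the order the IDs are listed. In this sketch a repeated `<name>` simply overwrites the earlier value, so listing `yarn.nodemanager.aux-services` twice would not register two services here.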
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC?  Is there an interface to implement?  How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use your
>>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>>> (and the NM has to be restarted for the change to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice this in the yarn-site.xml:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>>
>>>>>
>>>>> John Lilley
>>>>>
>>>>> Chief Architect, RedPoint Global Inc.
>>>>>
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>>
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>>
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>>> www.redpoint.net
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (and do
let the thread know the ID as well, in the spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and to finesse the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your dependencies into a custom location such as /opt/john-jars and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You do need to take care of deploying the jar versions under /opt/john-jars/ across the cluster yourself, though.
>>
>> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot code upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time, before the NM is started.
>>>
>>>
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map,
>>> which we discussed in another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC?  Is there an interface to implement?  How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use your
>>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>>> (and the NM has to be restarted for the change to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice this in the yarn-site.xml:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>>
>>>>>
>>>>> John Lilley
>>>>>
>>>>> Chief Architect, RedPoint Global Inc.
>>>>>
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>>
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>>
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>>> www.redpoint.net
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and to finesse the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.

FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, September 04, 2013 12:05 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your dependencies into a custom location such as /opt/john-jars and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You do need to take care of deploying the jar versions under /opt/john-jars/ across the cluster yourself, though.
>
> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the very next thing people will ask for with such an ability
> is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a 
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are 
>> used to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time, before the NM is started.
>>
>>
>>
>> +Vinod
>>
>>
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for 
>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is 
>> just sitting there in the right place, but if one wanted to make a 
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>> hate to waste y'alls time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID as a service, like the string "foo". This is
>> the ID the applications may look up in their container responses map,
>> which we discussed in another thread (around the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like 
>> the below (comma separated IDs and individual configs):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case.  But what are the 
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with 
>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>> they opened and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your 
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run 
>>> (and the NM has to be restarted for the change to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> I notice this in yarn-site.xml:
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework) 
>>>> to run a per-node service that exists during the lifetime of the 
>>>> NodeManager?
>>>>
>>>>
>>>>
>>>> John Lilley
>>>>
>>>> Chief Architect, RedPoint Global Inc.
>>>>
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | 
>>>> www.redpoint.net
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or 
>> entity to which it is addressed and may contain information that is 
>> confidential, privileged and exempt from disclosure under applicable 
>> law. If the reader of this message is not the intended recipient, you 
>> are hereby notified that any printing, copying, dissemination, 
>> distribution, disclosure or forwarding of this communication is 
>> strictly prohibited. If you have received this communication in 
>> error, please contact the sender immediately and delete it from your system. Thank You.
>
>
>
> --
> Harsh J



--
Harsh J
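
As a rough illustration of the ID-plus-sub-config format described in the quoted message above, here is a minimal Python sketch that resolves each ID listed under yarn.nodemanager.aux-services to the class named by its yarn.nodemanager.aux-services.<ID>.class key. It is not the NodeManager's own code: the function name resolve_aux_services and the use of a plain dict in place of a parsed yarn-site.xml are assumptions made for illustration only.

```python
AUX_SERVICES_KEY = "yarn.nodemanager.aux-services"


def resolve_aux_services(props):
    """Map each configured aux-service ID to its implementation class name.

    `props` is a plain dict standing in for the parsed yarn-site.xml
    (an assumption of this sketch, not a Hadoop API).
    """
    raw = props.get(AUX_SERVICES_KEY, "")
    # The value is a comma-separated list of service IDs, e.g. "foo,bar".
    ids = [s.strip() for s in raw.split(",") if s.strip()]

    resolved = {}
    for service_id in ids:
        # Each ID gets its own sub-config key naming the class to load.
        class_key = f"{AUX_SERVICES_KEY}.{service_id}.class"
        if class_key not in props:
            raise KeyError(f"no class configured for {service_id!r}: missing {class_key}")
        resolved[service_id] = props[class_key]
    return resolved


if __name__ == "__main__":
    example = {
        "yarn.nodemanager.aux-services": "foo,bar",
        "yarn.nodemanager.aux-services.foo.class": "com.mypack.MyAuxServiceClassForFoo",
        "yarn.nodemanager.aux-services.bar.class": "com.mypack.MyAuxServiceClassForBar",
    }
    for service_id, class_name in resolve_aux_services(example).items():
        print(service_id, "->", class_name)
```

The point of the sketch is that the value of yarn.nodemanager.aux-services is a list of IDs rather than class names, which is why repeating the same property twice, as in the question, does not register a second service.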

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.

FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, September 04, 2013 12:05 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather implement such a thing than have my containers implement more logic.
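
On the process-group question raised in the message quoted below, the OS-level mechanism can be sketched in a few lines of Python. This is only an illustration of how a child process leaves its parent's process group on a POSIX system; the helper names are hypothetical, and whether the NodeManager would still find and clean up such a process by other means is not settled anywhere in this thread.

```python
import os
import subprocess
import sys


def spawn_in_new_session(argv):
    """Start argv as a child placed in its own session and process group.

    start_new_session=True makes the child call setsid() before exec, so a
    signal sent to the parent's process group does not reach the child.
    """
    return subprocess.Popen(argv, start_new_session=True)


def same_process_group(pid):
    """Return True if pid is in the same process group as the caller."""
    return os.getpgid(pid) == os.getpgid(0)


if __name__ == "__main__":
    child = spawn_in_new_session(
        [sys.executable, "-c", "import time; time.sleep(5)"]
    )
    print("child shares our process group:", same_process_group(child.pid))
    child.kill()
    child.wait()
```

Even where this works mechanically, the detached child keeps the environment and permissions of the task that started it, which is the second concern raised in the quoted message.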

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You do need to take care of deploying consistent jar versions to /opt/john-jars/ across the cluster, though.
>
> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the very next thing people will ask for with such an ability 
> is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a 
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are 
>> used to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services. 
>> So, they have to be set up at install time - before NM is started.
>>
>>
>>
>> +Vinod
>>
>>
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for 
>> an auxiliary service?  Can they be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is 
>> just sitting there in the right place, but if one wanted to make a 
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>> hate to waste y'alls time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID for the service, like the string "foo". This is 
>> the ID that applications may look up in the container response map 
>> we discussed in another thread (about the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, the configuration would look 
>> like the following (comma-separated IDs and individual configs):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case.  But what are the 
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with 
>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>> they opened and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your 
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run 
>>> (and the NM has to be restarted for the change to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> I notice this in yarn-site.xml:
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run</description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework) 
>>>> to run a per-node service that exists during the lifetime of the 
>>>> NodeManager?
>>>>
>>>>
>>>>
>>>> John Lilley
>>>>
>>>> Chief Architect, RedPoint Global Inc.
>>>>
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | 
>>>> www.redpoint.net
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>
>
> --
> Harsh J



--
Harsh J
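
The five-step scenario quoted above leaves open how the per-node service would track output files so that they outlive the mapper-like task but are still cleaned up on AM exit. A minimal Python sketch of that bookkeeping follows, assuming an in-memory registry keyed by application ID and task ID. The class OutputRegistry and its method names are hypothetical, and nothing here covers the network serving, the reducer-side node lookup, or how the real MR shuffle handler does it.

```python
from collections import defaultdict


class OutputRegistry:
    """In-memory stand-in for the per-node service's bookkeeping (hypothetical)."""

    def __init__(self):
        # app_id -> {task_id: [output file paths]}
        self._files = defaultdict(dict)

    def register(self, app_id, task_id, paths):
        """Step 3: a mapper-like task reports the files it has written."""
        self._files[app_id].setdefault(task_id, []).extend(paths)

    def lookup(self, app_id, task_id):
        """Step 5: a reducer-like task asks which files a given task produced."""
        return list(self._files.get(app_id, {}).get(task_id, []))

    def release_app(self, app_id):
        """On AM exit: forget the application and return the paths to delete."""
        tasks = self._files.pop(app_id, {})
        return [path for paths in tasks.values() for path in paths]


if __name__ == "__main__":
    registry = OutputRegistry()
    registry.register("app_1", "task_a", ["/tmp/app_1/a.out"])
    registry.register("app_1", "task_b", ["/tmp/app_1/b.out"])
    print(registry.lookup("app_1", "task_a"))
    print(registry.release_app("app_1"))
```

Keying everything by application ID is what lets cleanup be tied to the AM's lifetime rather than to any single task's lifetime.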

