
Posted to user@hadoop.apache.org by John Lilley <jo...@redpoint.net> on 2013/06/05 00:30:30 UTC

yarn-site.xml and aux-services

I notice the following in yarn-site.xml:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run </description>
  </property>

Is this a general-purpose hook?
Can I tell yarn to run *my* per-node service?
Is there some other way (within the recommended Hadoop framework) to run a per-node service that exists during the lifetime of the NodeManager?

John Lilley
Chief Architect, RedPoint Global Inc.
1515 Walnut Street | Suite 200 | Boulder, CO 80302
T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
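Step 2 of the scenario above hinges on a task being able to tell whether the per-node service is already running. A minimal sketch, assuming (hypothetically) that all tasks agree on one fixed local port: the first task to bind it wins and would start the service, while every other task sees the bind fail. The class name `SingletonServiceCheck`, the port 45678 and the loopback-only binding are illustrative assumptions. The sketch does nothing about the NM process-group cleanup or the permission questions raised above, and Harsh's reply argues that a dedicated service is simpler overall.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;

/**
 * Hypothetical sketch of step 2: a task tries to claim a well-known local port.
 * Success means no service is running yet on this node, so the caller should
 * start one; a failed bind is taken to mean another task already started it.
 */
public class SingletonServiceCheck {

    /** Returns the bound socket if this caller won the race, or null if the bind failed. */
    static ServerSocket tryClaim(int port) {
        try {
            // Bind only on loopback; backlog of 50 is the JDK default.
            return new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
        } catch (IOException bindFailed) {
            // Simplification: any bind failure is treated as "already running".
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        int port = 45678; // arbitrary example port agreed on by all tasks
        // try-with-resources skips close() when the resource is null
        try (ServerSocket first = tryClaim(port)) {
            System.out.println(first != null);          // true unless something else holds the port
            System.out.println(tryClaim(port) == null); // true: the port is already claimed
        }
    }
}
```

Treating every bind failure as "already running" is a deliberate shortcut; a permission error on the port would be misread the same way.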
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying consistent jar versions to /opt/john-jars/ across the cluster, though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the next thing people will ask for with such an ability is
>> hot code upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a 
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are 
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time, before the NM is started.
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for an
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed over another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services, foo and bar, then it would look like
>>> this (comma-separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
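The ID-plus-sub-config scheme above can be illustrated with a small, self-contained Java sketch that resolves each listed ID to its `.class` entry. The class `AuxServiceConfig` and its `resolve` method are hypothetical helpers written for this example, not NodeManager code, and a plain `Map` stands in for the Hadoop configuration object.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not NodeManager code): resolves the IDs listed in
 * yarn.nodemanager.aux-services to the class names configured under
 * yarn.nodemanager.aux-services.<id>.class.
 */
public class AuxServiceConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> result = new LinkedHashMap<>(); // keeps the listed order
        String ids = conf.getOrDefault(SERVICES_KEY, "");
        for (String raw : ids.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate an empty list or stray commas
            }
            String classKey = SERVICES_KEY + "." + id + ".class";
            String className = conf.get(classKey);
            if (className == null) {
                throw new IllegalArgumentException(
                    "No class configured for aux-service ID '" + id + "' (expected " + classKey + ")");
            }
            result.put(id, className);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SERVICES_KEY, "foo,bar");
        conf.put(SERVICES_KEY + ".foo.class", "com.mypack.MyAuxServiceClassForFoo");
        conf.put(SERVICES_KEY + ".bar.class", "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolve(conf));
    }
}
```

Running `main` prints `{foo=com.mypack.MyAuxServiceClassForFoo, bar=com.mypack.MyAuxServiceClassForBar}`. In a key/value map like this one a key holds a single value, which is why several services go into one comma-separated value rather than repeating the `yarn.nodemanager.aux-services` property, as in John's earlier guess quoted below.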
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> For example, something like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with 
>>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to
>>>> run (and the NM has to be restarted to apply it).
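Harsh's reply does not spell out the interface itself. As a hedged illustration: in later Hadoop 2.x releases, aux services extend `org.apache.hadoop.yarn.server.api.AuxiliaryService`, whose callbacks include `initializeApplication`, `stopApplication` and `getMetaData`. The sketch below does not use that API. It defines a simplified, hypothetical stand-in interface (plain String application IDs instead of Hadoop's context objects) so that it runs without Hadoop on the classpath, and shows the general shape: per-application setup and cleanup hooks, plus an opaque metadata buffer that an application looks up by service ID. The buffer is assumed here to hold the service's port as a 4-byte int; the names, port and encoding are all illustrative.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Simplified, hypothetical stand-in for a per-node auxiliary service.
 * It only mirrors the shape of a few callbacks; it is NOT Hadoop's API.
 */
public class AuxLifecycleSketch {

    interface PerNodeService {
        void initializeApplication(String appId); // an application starts using this node
        void stopApplication(String appId);       // the application has finished
        ByteBuffer getMetaData();                 // opaque bytes handed back to applications
    }

    static class PortAdvertisingService implements PerNodeService {
        private final int port;
        final Set<String> activeApps = new HashSet<>();

        PortAdvertisingService(int port) { this.port = port; }

        public void initializeApplication(String appId) { activeApps.add(appId); }

        public void stopApplication(String appId) {
            // A real service would delete the application's intermediate files here.
            activeApps.remove(appId);
        }

        public ByteBuffer getMetaData() {
            // Hypothetical encoding: the listening port as a 4-byte big-endian int.
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
            buf.putInt(port);
            buf.flip(); // make the int readable from position 0
            return buf;
        }
    }

    /** Application side: look up the metadata by service ID and decode the port. */
    static int portFor(Map<String, ByteBuffer> serviceData, String serviceId) {
        ByteBuffer buf = serviceData.get(serviceId);
        if (buf == null) {
            throw new IllegalStateException("aux-service '" + serviceId + "' is not configured on this node");
        }
        return buf.duplicate().getInt(); // duplicate() leaves the shared buffer's position untouched
    }

    public static void main(String[] args) {
        PortAdvertisingService svc = new PortAdvertisingService(12345);
        svc.initializeApplication("app_1");
        Map<String, ByteBuffer> serviceData = new HashMap<>();
        serviceData.put("foo", svc.getMetaData());
        System.out.println(portFor(serviceData, "foo")); // 12345
        svc.stopApplication("app_1");
        System.out.println(svc.activeApps.isEmpty());    // true
    }
}
```

Running `main` prints `12345` and then `true`. Decoding through `duplicate()` lets several readers share one metadata buffer without consuming it.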
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>>>>> I notice the following in yarn-site.xml:
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>> Can I tell yarn to run *my* per-node service?
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or 
>>> entity to which it is addressed and may contain information that is 
>>> confidential, privileged and exempt from disclosure under applicable 
>>> law. If the reader of this message is not the intended recipient, 
>>> you are hereby notified that any printing, copying, dissemination, 
>>> distribution, disclosure or forwarding of this communication is 
>>> strictly prohibited. If you have received this communication in 
>>> error, please contact the sender immediately and delete it from your system. Thank You.
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd pick trying to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying jar versions to /opt/john-jars/ contents across the cluster though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on if this would be a good thing to do.
>>
>> (I know the right next thing with such an ability people will ask for 
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a 
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are 
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>>
>>>
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>>
>>>
>>>
>>>
>>> Auxiliary services are essentially administer-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>>
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for 
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is 
>>> just sitting there in the right place, but if one wanted to make a 
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is 
>>> the ID the applications may lookup in their container responses map 
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.foo.class</name>
>>> <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like 
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the 
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "madreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value> </property> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with 
>>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, thats what this is for. You can implement, pass in and use 
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to 
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>>
>>>>>
>>>>>   <property>
>>>>>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>
>>>>>     <description>shuffle service that needs to be set for Map 
>>>>> Reduce to run </description>
>>>>>
>>>>>   </property>
>>>>>
>>>>>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework) 
>>>>> to run a per-node service that exists during the lifetime of the 
>>>>> NodeManager?
>>>>>
>>>>>
>>>>>
>>>>> John Lilley
>>>>>
>>>>> Chief Architect, RedPoint Global Inc.
>>>>>
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>>
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>>
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | 
>>>>> www.redpoint.net
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or 
>>> entity to which it is addressed and may contain information that is 
>>> confidential, privileged and exempt from disclosure under applicable 
>>> law. If the reader of this message is not the intended recipient, 
>>> you are hereby notified that any printing, copying, dissemination, 
>>> distribution, disclosure or forwarding of this communication is 
>>> strictly prohibited. If you have received this communication in 
>>> error, please contact the sender immediately and delete it from your system. Thank You.
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

https://issues.apache.org/jira/browse/YARN-1151
--john

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Thursday, September 05, 2013 12:14 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd pick trying to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain cache those to local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying jar versions to /opt/john-jars/ contents across the cluster though.
>>
>> I think it may be a neat idea to have jars be placed on HDFS or any other DFS, and the yarn-site.xml indicating the location plus class to load. Similar to HBase co-processors. But I'll defer to Vinod on if this would be a good thing to do.
>>
>> (I know the right next thing with such an ability people will ask for 
>> is hot-code-upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a 
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are 
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>>
>>>
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>>
>>>
>>>
>>>
>>> Auxiliary services are essentially administer-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>>
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for 
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is 
>>> just sitting there in the right place, but if one wanted to make a 
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is 
>>> the ID the applications may lookup in their container responses map 
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.foo.class</name>
>>> <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like 
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the 
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "madreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value> </property> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with 
>>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, thats what this is for. You can implement, pass in and use 
>>>> your own AuxService. It needs to be on the NodeManager CLASSPATH to 
>>>> run (and NM has to be restarted to apply).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>>
>>>>>
>>>>>   <property>
>>>>>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>
>>>>>     <description>shuffle service that needs to be set for Map 
>>>>> Reduce to run </description>
>>>>>
>>>>>   </property>
>>>>>
>>>>>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework) 
>>>>> to run a per-node service that exists during the lifetime of the 
>>>>> NodeManager?
>>>>>
>>>>>
>>>>>
>>>>> John Lilley
>>>>>
>>>>> Chief Architect, RedPoint Global Inc.
>>>>>
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>>
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>>
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | 
>>>>> www.redpoint.net
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or 
>>> entity to which it is addressed and may contain information that is 
>>> confidential, privileged and exempt from disclosure under applicable 
>>> law. If the reader of this message is not the intended recipient, 
>>> you are hereby notified that any printing, copying, dissemination, 
>>> distribution, disclosure or forwarding of this communication is 
>>> strictly prohibited. If you have received this communication in 
>>> error, please contact the sender immediately and delete it from your system. Thank You.
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



--
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do
let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> We could class-load directly from HDFS, like HBase coprocessors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with the AM ID, task ID, etc.).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
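For step 2 above ("launch a persistent service child, but only if one is not already running"), one common pattern is an OS file lock on a node-wide path. The Java sketch below uses FileChannel.tryLock; the class name and lock path are hypothetical, and the lock file has to live outside the per-container working directory so that every task on the node sees the same file. It only answers "is one already running?": it does nothing about the NM killing the launching container's process group, which is the concern raised above.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: "start the per-node service only if none is running yet".
final class NodeSingleton {
    /**
     * Tries to claim the node-wide lock file. Returns the lock (hold it for the
     * service's whole lifetime) or null if another holder already has it.
     * The OS drops the lock when the holding process exits, so a crashed
     * service does not leave a stale claim behind (unlike a plain marker file).
     * Meant to be called once per process: per the FileLock docs, on some
     * platforms closing any channel on the file releases all of this JVM's locks on it.
     */
    static FileLock tryClaim(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        try {
            FileLock lock = ch.tryLock();
            if (lock == null) {
                ch.close(); // another process holds it
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            ch.close(); // already held inside this JVM
            return null;
        } catch (IOException e) {
            ch.close();
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Temp path for the demo only; a real deployment needs a fixed node-wide path.
        Path lockFile = Files.createTempDirectory("auxdemo").resolve("my-node-service.lock");
        FileLock first = tryClaim(lockFile);
        FileLock second = tryClaim(lockFile);
        System.out.println("first claim: " + (first != null) + ", second claim: " + (second != null));
        first.release();
        first.channel().close();
    }
}
```

The task that wins the claim would then spawn the service and hand it the lock (or let the service claim it itself); losers just connect to the existing instance.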
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the jars under /opt/john-jars/ to every node in the cluster yourself, though.
>>
>> I think it may be a neat idea to have the jars placed on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase coprocessors. But I'll defer to Vinod on whether this would be a good thing to do.
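A rough illustration of the "indicate the location plus class to load" idea: the Java sketch below builds a URLClassLoader over every jar in one directory and instantiates a named class from it, so a service's jars need not sit on the NM's own classpath. It shows the general technique only and is not NodeManager code; AuxJarLoader is a hypothetical name. It reads from a local directory (/opt/john-jars, the example path from this message); loading from HDFS would additionally require copying the jars down or registering an hdfs:// URL handler, which it does not attempt. With no arguments it falls back to a JDK class so it runs standalone.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: loads a service class from the jars in one directory.
final class AuxJarLoader {
    /** Returns a URL for every *.jar directly inside dir; empty if dir does not exist. */
    static URL[] jarUrls(Path dir) throws IOException {
        List<URL> urls = new ArrayList<>();
        if (Files.isDirectory(dir)) {
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : jars) {
                    urls.add(jar.toUri().toURL());
                }
            }
        }
        return urls.toArray(new URL[0]);
    }

    /** Loads className through a parent-first loader over dir's jars and calls its no-arg constructor. */
    static Object newInstance(Path dir, String className) throws Exception {
        // The loader is intentionally left open: the returned object's class
        // keeps needing it for as long as the service is alive.
        URLClassLoader loader = new URLClassLoader(jarUrls(dir), AuxJarLoader.class.getClassLoader());
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/john-jars");
        String className = args.length > 1 ? args[1] : "java.util.ArrayList";
        Object service = newInstance(dir, className);
        System.out.println("loaded " + service.getClass().getName());
    }
}
```

A per-service loader like this also keeps one service's dependency versions from clashing with the NM's own, at the cost of the usual class-identity pitfalls when objects cross loaders. Whether your Hadoop version supports anything like this natively should be checked against its yarn-default.xml.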
>>
>> (I know the very next thing people will ask for with such an ability
>> is hot code upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute Hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time - before NM is started.
>>>
>>>
>>>
>>> +Vinod
>>>
>>>
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for an
>>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I
>>> hate to waste y'all's time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID that applications may look up in the container responses map
>>> we discussed in another thread (around the shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>> <name>yarn.nodemanager.aux-services.foo.class</name>
>>> <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would look like
>>> the following (comma-separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
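To make the ID/class split concrete, here is a rough Java sketch of the lookup described above: read the comma-separated IDs from yarn.nodemanager.aux-services, then find each ID's yarn.nodemanager.aux-services.<id>.class entry. It is not NodeManager code; java.util.Properties stands in for Hadoop's Configuration, and AuxServiceConfig/resolve are made-up names. The ID pattern check reflects a constraint that, as far as I know, arrived in Hadoop 2.2 (YARN-1229), which is also when the shuffle ID changed from mapreduce.shuffle to mapreduce_shuffle; under that rule the dotted ID used earlier in this thread is rejected, so check your own version's behavior.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Pattern;

// Sketch only: Properties stands in for org.apache.hadoop.conf.Configuration.
final class AuxServiceConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";
    static final String CLASS_KEY_FMT = "yarn.nodemanager.aux-services.%s.class";
    // Assumed naming rule (later Hadoop 2.x): letters, digits and '_', not starting with a digit.
    static final Pattern VALID_ID = Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

    /** Returns service ID -> implementation class name, in declaration order. */
    static Map<String, String> resolve(Properties conf) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String raw : conf.getProperty(SERVICES_KEY, "").split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerates an unset key or "foo,,bar"
            }
            if (!VALID_ID.matcher(id).matches()) {
                throw new IllegalArgumentException("invalid aux-service ID: " + id);
            }
            String cls = conf.getProperty(String.format(CLASS_KEY_FMT, id));
            if (cls == null) {
                throw new IllegalArgumentException("no class configured for aux-service: " + id);
            }
            result.put(id, cls);
        }
        return result;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(SERVICES_KEY, "foo,bar");
        conf.setProperty("yarn.nodemanager.aux-services.foo.class", "com.mypack.MyAuxServiceClassForFoo");
        conf.setProperty("yarn.nodemanager.aux-services.bar.class", "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolve(conf));
    }
}
```

This is also why a second <property> with the same yarn.nodemanager.aux-services name does not add a service: there is only one key holding the ID list, so extra services go into that one comma-separated value.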
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC?  Is there an interface to implement?  How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use your
>>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>>> (and NM has to be restarted to apply).
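On the lifecycle questions above: as I understand it, an aux service is not a separate process talking to the NM over RPC; the NM instantiates the configured class inside its own JVM and drives it through callbacks. In Hadoop 2.1+ the class to extend is org.apache.hadoop.yarn.server.api.AuxiliaryService (initializeApplication, stopApplication, getMetaData, plus the usual service init/start/stop). The sketch below models that shape with plain-JDK stand-ins so it runs without Hadoop on the classpath; AuxServiceSketch, ToyDataServer, the app ID and the port are all hypothetical. The map at the end imitates an AM looking a service up by its configured ID, which is how the MapReduce shuffle advertises its port.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the callbacks an aux service receives from the NM. */
abstract class AuxServiceSketch {
    private final String name;

    protected AuxServiceSketch(String name) { this.name = name; }

    String getName() { return name; }

    abstract void start();                              // NM start-up: open sockets, threads
    abstract void stop();                               // NM shutdown: release everything
    abstract void initializeApplication(String appId);  // an app's first container lands on this node
    abstract void stopApplication(String appId);        // the app finished: clean up its state
    abstract ByteBuffer getMetaData();                  // returned to AMs under the service ID
}

/** Toy per-node server that tracks apps and advertises its port as metadata. */
final class ToyDataServer extends AuxServiceSketch {
    private final int port;
    private final Set<String> activeApps = new HashSet<>();
    private boolean running;

    ToyDataServer(int port) {
        super("toy_data");
        this.port = port;
    }

    @Override void start() { running = true; }
    @Override void stop() { activeApps.clear(); running = false; }
    @Override void initializeApplication(String appId) { activeApps.add(appId); }
    @Override void stopApplication(String appId) { activeApps.remove(appId); }

    @Override ByteBuffer getMetaData() {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(port).flip();
        return buf;
    }

    boolean isRunning() { return running; }
    int activeAppCount() { return activeApps.size(); }
}

final class AuxLifecycleDemo {
    /** AM side: find a service's metadata by ID and decode the advertised port. */
    static int decodePort(Map<String, ByteBuffer> metaData, String serviceId) {
        ByteBuffer buf = metaData.get(serviceId);
        if (buf == null) {
            throw new IllegalStateException("aux service not configured: " + serviceId);
        }
        return buf.duplicate().getInt(); // duplicate() leaves the shared buffer unread
    }

    public static void main(String[] args) {
        ToyDataServer svc = new ToyDataServer(40123);
        svc.start();
        svc.initializeApplication("application_0001");
        Map<String, ByteBuffer> metaData = new HashMap<>();
        metaData.put(svc.getName(), svc.getMetaData());
        System.out.println("AM sees toy_data on port " + decodePort(metaData, "toy_data"));
        svc.stopApplication("application_0001");
        svc.stop();
    }
}
```

The stopApplication callback is the natural place for the "clean up on AM exit" step in the mapper/reducer scenario discussed later in this thread, since it fires per application rather than per container.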
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>>
>>>>>
>>>>>   <property>
>>>>>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>
>>>>>     <description>shuffle service that needs to be set for Map
>>>>> Reduce to run </description>
>>>>>
>>>>>   </property>
>>>>>
>>>>>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>>
>>>>> Can I tell yarn to run *my* per-node service?
>>>>>
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>>
>>>>>
>>>>> John Lilley
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do
let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> We could class-load directly from HDFS, like HBase coprocessors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
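The bookkeeping in steps 3 and 5, plus the file-lifetime question John lists as missing, can be modeled as a small per-node registry. This is a minimal Java sketch under stated assumptions: the class NodeOutputRegistry, its methods and the /tmp paths are hypothetical, networking is left out, and applicationFinished stands in for whatever per-application "stop" signal the service receives (with the auxiliary-service approach discussed in this thread, that signal would come from the NodeManager rather than from a task's child process).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NodeOutputRegistry {
    // appId -> (taskId -> output files that task wrote on this node)
    private final Map<String, Map<String, List<String>>> byApp = new HashMap<>();

    /** Step 3: a mapper-like task reports the files it wrote. */
    public synchronized void register(String appId, String taskId, List<String> files) {
        byApp.computeIfAbsent(appId, k -> new HashMap<>())
             .computeIfAbsent(taskId, k -> new ArrayList<>())
             .addAll(files);
    }

    /** Step 5: a reducer-like task asks which files a given task left on this node. */
    public synchronized List<String> lookup(String appId, String taskId) {
        Map<String, List<String>> tasks = byApp.getOrDefault(appId, Map.of());
        return List.copyOf(tasks.getOrDefault(taskId, List.of()));
    }

    /**
     * Application exit: files outlive the task that wrote them, but not the application.
     * Returns the paths a real service would delete.
     */
    public synchronized List<String> applicationFinished(String appId) {
        List<String> toDelete = new ArrayList<>();
        Map<String, List<String>> tasks = byApp.remove(appId);
        if (tasks != null) {
            tasks.values().forEach(toDelete::addAll);
        }
        return toDelete;
    }

    public static void main(String[] args) {
        NodeOutputRegistry reg = new NodeOutputRegistry();
        reg.register("app_1", "map_0", List.of("/tmp/app_1/map_0.out"));
        System.out.println(reg.lookup("app_1", "map_0"));       // [/tmp/app_1/map_0.out]
        System.out.println(reg.applicationFinished("app_1"));  // [/tmp/app_1/map_0.out]
        System.out.println(reg.lookup("app_1", "map_0"));       // []
    }
}
```

Keying everything by application ID is what lets the files outlive the mapper-like task that wrote them while still being dropped when the application ends.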
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the same jar versions to /opt/john-jars/ on every node in the cluster, though.
>>
>> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the next thing people will ask for once such an ability exists
>> is hot code upgrades...)
>>
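With the approach Harsh describes, the NodeManager needs nothing special: the jars are simply on its classpath. Loading from a configured location instead (the HDFS, coprocessor-style idea) comes down to a dedicated class loader. The following is a hedged Java sketch using only the JDK's URLClassLoader over a local directory such as the hypothetical /opt/john-jars; JarDirLoader and loadFrom are made-up names, and this is not how the NodeManager loads aux services. An HDFS-backed variant would first have to copy the jar to local disk (or register a URL handler for hdfs: URLs), which is omitted here.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class JarDirLoader {
    /**
     * Loads className using every *.jar found directly in jarDir. The parent loader is
     * consulted first (standard delegation), so classes already visible to the JVM win.
     */
    static Class<?> loadFrom(Path jarDir, String className)
            throws IOException, ClassNotFoundException {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(jarDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        // Left open on purpose: the loaded class may need this loader for its dependencies later.
        URLClassLoader loader = new URLClassLoader(
                urls.toArray(new URL[0]), JarDirLoader.class.getClassLoader());
        return Class.forName(className, true, loader);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical usage: args[0] = jar directory, args[1] = class name taken from yarn-site.xml.
        Class<?> cls = loadFrom(Path.of(args[0]), args[1]);
        System.out.println("loaded " + cls.getName());
    }
}
```

Because delegation is parent-first, a class that is already on the JVM classpath takes precedence over the copy in the jar directory.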
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time, before the NM is started.
>>>
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for an
>>> auxiliary service?  Can they be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
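The ID-plus-sub-config format can be illustrated by resolving the configuration above into an ID-to-class map. This is a sketch, not Hadoop code: a plain Map stands in for Hadoop's Configuration, AuxServiceConfig and its methods are hypothetical, and the ID pattern reflects my understanding of later Hadoop 2.x releases, which restrict service IDs to letters, digits and underscores (the reason the shuffle ID became mapreduce_shuffle there); the 2013 releases discussed in this thread still accepted mapreduce.shuffle.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AuxServiceConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    // Assumed naming rule (later Hadoop 2.x): letters, digits, '_', not starting with a digit.
    static final Pattern VALID_ID = Pattern.compile("^[A-Za-z_][A-Za-z0-9_]*$");

    /** Collects name/value pairs from a yarn-site.xml-style document; a repeated name overwrites. */
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    /** Maps each comma-separated service ID to its "aux-services.<ID>.class" implementation. */
    static Map<String, String> resolveServices(Map<String, String> props) {
        Map<String, String> services = new LinkedHashMap<>();
        for (String raw : props.getOrDefault(SERVICES_KEY, "").split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) continue;
            if (!VALID_ID.matcher(id).matches()) {
                throw new IllegalArgumentException("invalid aux-service ID: " + id);
            }
            String cls = props.get(SERVICES_KEY + "." + id + ".class");
            if (cls == null) {
                throw new IllegalArgumentException("no class configured for ID: " + id);
            }
            services.put(id, cls);
        }
        return services;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<configuration>"
            + "<property><name>yarn.nodemanager.aux-services</name><value>foo,bar</value></property>"
            + "<property><name>yarn.nodemanager.aux-services.foo.class</name>"
            + "<value>com.mypack.MyAuxServiceClassForFoo</value></property>"
            + "<property><name>yarn.nodemanager.aux-services.bar.class</name>"
            + "<value>com.mypack.MyAuxServiceClassForBar</value></property>"
            + "</configuration>";
        System.out.println(resolveServices(readProperties(xml)));
        // {foo=com.mypack.MyAuxServiceClassForFoo, bar=com.mypack.MyAuxServiceClassForBar}
    }
}
```

In this sketch a repeated name overwrites the earlier one, so the two-<property> form in John's question below would leave only myauxserviceclassname; multiple IDs belong in one comma-separated value, as Harsh shows.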
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC?  Is there an interface to implement?  How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use your
>>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>>> (and the NM has to be restarted for the change to take effect).
>>>>
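Harsh's answer above and Vinod's note describe a lifecycle: as I understand it, the NodeManager instantiates the configured services inside its own JVM when it starts (no RPC between them), notifies them as applications come and go on the node, and hands each service's metadata back to the AM keyed by service ID. The Java below is a self-contained model of that flow, not the real API: PerNodeService, PortService and ToyNodeManager are hypothetical, and the port is made up. As far as I know, in Hadoop 2.x the real base class is org.apache.hadoop.yarn.server.api.AuxiliaryService, with callbacks including initializeApplication, stopApplication and getMetaData, and the AM receives the metadata through StartContainersResponse#getAllServicesMetaData(); check the javadoc for your release.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AuxLifecycleDemo {

    /** Hypothetical callback set, loosely modeled on Hadoop 2.x's AuxiliaryService (not the real API). */
    interface PerNodeService {
        void start();                             // called once when the host starts
        void initializeApplication(String appId); // an application begins running on this node
        void stopApplication(String appId);       // that application has finished
        ByteBuffer getMetaData();                 // opaque bytes returned to the AM
        void stop();                              // called once when the host stops
    }

    /** Example service that records its callbacks and advertises a (made-up) port. */
    static class PortService implements PerNodeService {
        final List<String> events = new ArrayList<>();
        private final int port;
        PortService(int port) { this.port = port; }
        public void start() { events.add("start"); }
        public void initializeApplication(String appId) { events.add("init " + appId); }
        public void stopApplication(String appId) { events.add("stop " + appId); }
        public ByteBuffer getMetaData() {
            return ByteBuffer.wrap(Integer.toString(port).getBytes(StandardCharsets.UTF_8));
        }
        public void stop() { events.add("shutdown"); }
    }

    /** Toy host: the service set is fixed when the host is constructed, mirroring NM startup. */
    static class ToyNodeManager {
        private final Map<String, PerNodeService> services;
        ToyNodeManager(Map<String, ? extends PerNodeService> configured) {
            services = new LinkedHashMap<>(configured);
            services.values().forEach(PerNodeService::start);
        }
        /** Notifies every service and returns their metadata keyed by service ID. */
        Map<String, ByteBuffer> startApplication(String appId) {
            Map<String, ByteBuffer> meta = new LinkedHashMap<>();
            for (Map.Entry<String, PerNodeService> e : services.entrySet()) {
                e.getValue().initializeApplication(appId);
                meta.put(e.getKey(), e.getValue().getMetaData());
            }
            return meta;
        }
        void finishApplication(String appId) {
            services.values().forEach(s -> s.stopApplication(appId));
        }
        void shutdown() { services.values().forEach(PerNodeService::stop); }
    }

    static String decode(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf.duplicate()).toString();
    }

    public static void main(String[] args) {
        PortService foo = new PortService(12345);
        ToyNodeManager nm = new ToyNodeManager(Map.of("foo", foo));
        Map<String, ByteBuffer> meta = nm.startApplication("app_1");
        System.out.println("foo -> " + decode(meta.get("foo"))); // foo -> 12345
        nm.finishApplication("app_1");
        nm.shutdown();
        System.out.println(foo.events); // [start, init app_1, stop app_1, shutdown]
    }
}
```

Because the service set is fixed in the constructor, adding or changing a service means building a new host, which mirrors why the NM has to be restarted.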
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>> Can I tell yarn to run *my* per-node service?
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>>> www.redpoint.net
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> NOTICE: This message is intended for the use of the individual or
>>> entity to which it is addressed and may contain information that is
>>> confidential, privileged and exempt from disclosure under applicable
>>> law. If the reader of this message is not the intended recipient, you
>>> are hereby notified that any printing, copying, dissemination,
>>> distribution, disclosure or forwarding of this communication is
>>> strictly prohibited. If you have received this communication in
>>> error, please contact the sender immediately and delete it from your system. Thank You.
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Please log a JIRA on https://issues.apache.org/jira/browse/YARN (do
let the thread know the ID as well, in spirit of http://xkcd.com/979/)
:)

On Thu, Sep 5, 2013 at 11:41 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.
>
> FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, September 04, 2013 12:05 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> We could class-load directly from HDFS, like HBase Co-Processors do.
>
>> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>
> Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.
>
> On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
>> Harsh,
>>
>> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>>
>> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>>
>> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
>> 1) AM spawns "mapper-like" tasks around the cluster
>> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
>> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
>> 4) AM spawns "reducer-like" tasks around the cluster.
>> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>>
>> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>>
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Friday, August 23, 2013 11:00 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the same jar versions to /opt/john-jars/ on every node in the cluster, though.
>>
>> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>>
>> (I know the next thing people will ask for once such an ability exists
>> is hot code upgrades...)
>>
>> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Are there recommended conventions for adding additional code to a
>>> stock Hadoop install?
>>>
>>> It would be nice if we could piggyback on whatever mechanisms are
>>> used to distribute hadoop itself around the cluster.
>>>
>>> john
>>>
>>>
>>>
>>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>>> Sent: Thursday, August 22, 2013 6:25 PM
>>> To: user@hadoop.apache.org
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Auxiliary services are essentially administrator-configured services.
>>> So, they have to be set up at install time, before the NM is started.
>>>
>>> +Vinod
>>>
>>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>
>>> Following up on this, how exactly does one *install* the jar(s) for an
>>> auxiliary service?  Can they be shipped out with the LocalResources of an AM?
>>> MapReduce's aux-service is presumably installed with Hadoop and is
>>> just sitting there in the right place, but if one wanted to make a
>>> whole new aux-service that belonged with an AM, how would one do it?
>>>
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: John Lilley [mailto:john.lilley@redpoint.net]
>>> Sent: Wednesday, June 05, 2013 11:41 AM
>>> To: user@hadoop.apache.org
>>> Subject: RE: yarn-site.xml and aux-services
>>>
>>> Wow, thanks.  Is this documented anywhere other than the code?  I
>>> hate to waste y'alls time on things that can be RTFMed.
>>> John
>>>
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Wednesday, June 05, 2013 9:35 AM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> John,
>>>
>>> The format is ID and sub-config based:
>>>
>>> First, you define an ID as a service, like the string "foo". This is
>>> the ID the applications may look up in their container responses map
>>> we discussed over another thread (around shuffle handler).
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo</value>
>>> </property>
>>>
>>> Then you define an actual implementation class for that ID "foo", like so:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>>
>>> If you have multiple services foo and bar, then it would appear like
>>> the below (comma separated IDs and individual configs):
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>foo,bar</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>>> </property>
>>>
>>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> Good, I was hoping that would be the case.  But what are the
>>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>>> A scoped class name?  Or a key string into some map elsewhere?
>>>>
>>>> e.g. like:
>>>>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>> </property>
>>>> <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>myauxserviceclassname</value>
>>>> </property>
>>>>
>>>> Concerning auxiliary services -- do they communicate with
>>>> NodeManager via RPC?  Is there an interface to implement?  How are
>>>> they opened and closed with NodeManager?
>>>>
>>>> Thanks
>>>> John
>>>>
>>>> -----Original Message-----
>>>> From: Harsh J [mailto:harsh@cloudera.com]
>>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>>> To: <us...@hadoop.apache.org>
>>>> Subject: Re: yarn-site.xml and aux-services
>>>>
>>>> Yes, that's what this is for. You can implement, pass in and use your
>>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>>> (and the NM has to be restarted for the change to take effect).
>>>>
>>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>>> <jo...@redpoint.net>
>>>> wrote:
>>>>> I notice the yarn-site.xml
>>>>>
>>>>>   <property>
>>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>>     <value>mapreduce.shuffle</value>
>>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>>   </property>
>>>>>
>>>>> Is this a general-purpose hook?
>>>>> Can I tell yarn to run *my* per-node service?
>>>>> Is there some other way (within the recommended Hadoop framework)
>>>>> to run a per-node service that exists during the lifetime of the
>>>>> NodeManager?
>>>>>
>>>>> John Lilley
>>>>> Chief Architect, RedPoint Global Inc.
>>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>>> www.redpoint.net
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Harsh J
>>>
>>>
>>>
>>> --
>>> Harsh J
>>>
>>>
>>>
>>>
>>> --
>>> +Vinod
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.

FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, September 04, 2013 12:05 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the same jar versions to /opt/john-jars/ on every node in the cluster, though.
>
> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for once such an ability exists
> is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a 
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are 
>> used to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time, before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service?  Can they be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is 
>> just sitting there in the right place, but if one wanted to make a 
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>> hate to waste y'alls time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID as a service, like the string "foo". This is 
>> the ID the applications may look up in their container responses map
>> we discussed over another thread (around shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like 
>> the below (comma separated IDs and individual configs):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case.  But what are the 
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with 
>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>> they opened and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and the NM has to be restarted for the change to take effect).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>>> <jo...@redpoint.net>
>>> wrote:
>>>> I notice the yarn-site.xml
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>> Can I tell yarn to run *my* per-node service?
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>> www.redpoint.net
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.

FWIW, I would vote to be able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, September 04, 2013 12:05 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain and cache those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your deps into a custom location such as /opt/john-jars, and extend YARN_CLASSPATH to include the jars, while also configuring the classes under the aux-services list. You need to take care of deploying the same jar versions to /opt/john-jars/ on every node in the cluster, though.
>
> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for once such an ability exists
> is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a 
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are 
>> used to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time, before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service?  Can they be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is 
>> just sitting there in the right place, but if one wanted to make a 
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>> hate to waste y'all's time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID for the service, such as the string "foo". This is
>> the ID that applications may look up in the container responses map
>> we discussed in another thread (around the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
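On the application side, the service ID is the lookup key. The following is a minimal Java sketch of that lookup, assuming only what the thread describes: the NodeManager hands the application a map from aux-service ID to an opaque `ByteBuffer`, and the service and the application agree on how the bytes are encoded. In the Hadoop 2.x releases this map appears as `StartContainersResponse#getAllServicesMetaData()`. Encoding a single listening port as a 4-byte big-endian int appears to be what MapReduce's ShuffleHandler does. The class name and the example port are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Encodes/decodes a per-node service port keyed by aux-service ID (hypothetical helper). */
final class AuxServiceMetaData {
    private AuxServiceMetaData() {}

    /** Service side: the bytes the service would hand back as its metadata. */
    static ByteBuffer encodePort(int port) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES); // big-endian by default
        buf.putInt(port);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    /** Application side: look up the entry for serviceId and decode the port. */
    static int lookupPort(Map<String, ByteBuffer> metaDataById, String serviceId) {
        ByteBuffer meta = metaDataById.get(serviceId);
        if (meta == null) {
            throw new IllegalStateException("no aux-service with ID '" + serviceId + "' on this node");
        }
        // duplicate() leaves the shared buffer's position untouched.
        return meta.duplicate().getInt();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> fromNodeManager = new HashMap<>();
        fromNodeManager.put("foo", encodePort(13562)); // example port only
        System.out.println(lookupPort(fromNodeManager, "foo"));
    }
}
```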
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services, foo and bar, it would look like
>> the following (comma-separated IDs, each with its own class config):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
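Harsh's resolution rule is to split the ID list on commas, then read `yarn.nodemanager.aux-services.<ID>.class` for each ID. It can be sketched in plain Java over a property map. This illustrates the naming convention only and is not the NodeManager's own code. `AuxServicesConfig` is a hypothetical name, and a real deployment would read the values through Hadoop's `Configuration`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Resolves aux-service IDs to implementation class names (hypothetical helper). */
final class AuxServicesConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    private AuxServicesConfig() {}

    /**
     * Splits the comma-separated ID list and looks up
     * yarn.nodemanager.aux-services.<ID>.class for each ID.
     * The result preserves the order in which IDs are listed.
     */
    static Map<String, String> resolve(Map<String, String> props) {
        Map<String, String> idToClass = new LinkedHashMap<>();
        for (String raw : props.getOrDefault(SERVICES_KEY, "").split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate "foo,,bar" and an unset/empty list
            }
            String classKey = SERVICES_KEY + "." + id + ".class";
            String className = props.get(classKey);
            if (className == null || className.trim().isEmpty()) {
                throw new IllegalArgumentException("no class configured under " + classKey);
            }
            idToClass.put(id, className.trim());
        }
        return idToClass;
    }
}
```

In later Hadoop 2.x releases, service IDs may no longer contain dots, which is why the shuffle ID is spelled `mapreduce_shuffle` there.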
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Good, I was hoping that would be the case.  But what are the 
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
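John's example repeats the property name `yarn.nodemanager.aux-services`, so it does not register a second service. A configuration key holds only one value, so only one of the two entries can take effect. Multiple IDs belong in one comma-separated value, as in Harsh's reply. The Java sketch below flags repeated `<name>` entries using the JDK's DOM parser. It assumes the snippet is wrapped in the usual `<configuration>` root element, and `DuplicatePropertyCheck` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Reports property names defined more than once in a *-site.xml (hypothetical helper). */
final class DuplicatePropertyCheck {
    private DuplicatePropertyCheck() {}

    static Set<String> duplicateNames(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        NodeList properties = root.getElementsByTagName("property");
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new TreeSet<>();
        for (int i = 0; i < properties.getLength(); i++) {
            NodeList names = ((Element) properties.item(i)).getElementsByTagName("name");
            if (names.getLength() == 0) {
                continue; // malformed entry without <name>; ignored here
            }
            String name = names.item(0).getTextContent().trim();
            if (!seen.add(name)) {
                duplicates.add(name); // second and later occurrences
            }
        }
        return duplicates;
    }
}
```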
>>>
>>> Concerning auxiliary services -- do they communicate with 
>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>> they opened and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, plug in, and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and the NM has to be restarted for the change to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>>>> I notice the yarn-site.xml
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>> CONFIDENTIALITY NOTICE
>> NOTICE: This message is intended for the use of the individual or 
>> entity to which it is addressed and may contain information that is 
>> confidential, privileged and exempt from disclosure under applicable 
>> law. If the reader of this message is not the intended recipient, you 
>> are hereby notified that any printing, copying, dissemination, 
>> distribution, disclosure or forwarding of this communication is 
>> strictly prohibited. If you have received this communication in 
>> error, please contact the sender immediately and delete it from your system. Thank You.
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks as usual for your sage advice.  I was hoping to avoid actually installing anything on individual Hadoop nodes and finessing the service by spawning it from a task using LocalResources, but this is probably fraught with trouble.

FWIW, I would vote for being able to load YARN services from HDFS.  What is the appropriate forum to file a request like that?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, September 04, 2013 12:05 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the time, and/or implementing a way to spawn/end a dedicated service temporarily? I'd rather try to implement such a thing than have my containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have the proper environment and permissions to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
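Step 2 ("only if one is not already running") needs some form of per-node mutual exclusion. One common approach is to let the bind itself decide: whoever binds the port first becomes the service, and every later caller sees a `BindException`. The Java sketch below shows this, assuming the service listens on a fixed, agreed-upon port. It does not answer the thread's open questions about whether such a child survives the NodeManager's container cleanup or has suitable permissions. `SingleInstanceGuard` is a hypothetical name, and an unrelated process holding the same port would also read as "already running".

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Lets the first process on a node to bind a fixed port become the service (hypothetical helper). */
final class SingleInstanceGuard {
    private SingleInstanceGuard() {}

    /**
     * Returns a listening socket if this process won the port, or null if some
     * other process already holds it (taken to mean the service is running).
     */
    static ServerSocket tryClaim(int port) throws IOException {
        ServerSocket socket = new ServerSocket();
        try {
            socket.bind(new InetSocketAddress(port)); // all interfaces, so remote readers can connect
            return socket;
        } catch (BindException alreadyInUse) {
            socket.close();
            return null;
        } catch (IOException other) {
            socket.close();
            throw other;
        }
    }
}
```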
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
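The missing lifetime detail is essentially per-application bookkeeping. Files are registered under the owning application, outlive the task that wrote them, and are removed when the application ends. Below is a minimal in-memory Java sketch under those assumptions; all names are hypothetical. For reference, Hadoop 2.x's `AuxiliaryService` exposes per-application callbacks (`initializeApplication` / `stopApplication`) that a real aux service could use to drive the same cleanup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

/** In-memory registry of task output files, grouped by application (hypothetical helper). */
final class AppFileRegistry {
    // application ID -> task ID -> files that task asked the service to keep
    private final Map<String, Map<String, List<Path>>> files = new ConcurrentHashMap<>();

    /** A mapper-like task reports a finished output file. */
    void register(String appId, String taskId, Path file) {
        files.computeIfAbsent(appId, a -> new ConcurrentHashMap<>())
             .computeIfAbsent(taskId, t -> new CopyOnWriteArrayList<>())
             .add(file);
    }

    /** A reducer-like task asks what a given task produced; returns a snapshot. */
    List<Path> filesFor(String appId, String taskId) {
        Map<String, List<Path>> byTask = files.get(appId);
        List<Path> list = byTask == null ? null : byTask.get(taskId);
        return list == null ? Collections.emptyList() : new ArrayList<>(list);
    }

    /**
     * Called once the application has finished: forgets it and deletes its files.
     * Returns the number of files actually deleted. A register() racing with this
     * call could re-create the entry; a real service would need to reject those.
     */
    int cleanupApplication(String appId) throws IOException {
        Map<String, List<Path>> byTask = files.remove(appId);
        if (byTask == null) {
            return 0;
        }
        int deleted = 0;
        for (List<Path> list : byTask.values()) {
            for (Path file : list) {
                if (Files.deleteIfExists(file)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```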
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your dependencies into a custom location such as /opt/john-jars, extend YARN_CLASSPATH to include those jars, and configure the classes under the aux-services list. You do need to take care of deploying the /opt/john-jars/ contents (and keeping the jar versions consistent) across the cluster yourself, though.
>
> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for with such an ability is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a 
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are 
>> used to distribute Hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time, before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is 
>> just sitting there in the right place, but if one wanted to make a 
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I 
>> hate to waste y'all's time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID for the service, such as the string "foo". This is
>> the ID that applications may look up in the container responses map
>> we discussed in another thread (around the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services, foo and bar, it would look like
>> the following (comma-separated IDs, each with its own class config):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Good, I was hoping that would be the case.  But what are the 
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with 
>>> NodeManager via RPC?  Is there an interface to implement?  How are 
>>> they opened and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, plug in, and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and the NM has to be restarted for the change to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>>>> I notice the yarn-site.xml
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>
>
> --
> Harsh J



--
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd rather try to implement such a thing than have my
containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to maintain a cache of those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have the proper environment and permissions to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id etc).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your dependencies into a custom location such as /opt/john-jars, extend YARN_CLASSPATH to include those jars, and configure the classes under the aux-services list. You do need to take care of deploying the /opt/john-jars/ contents (and keeping the jar versions consistent) across the cluster yourself, though.
>
> I think it may be a neat idea to place the jars on HDFS (or any other DFS) and have yarn-site.xml indicate the location plus the class to load, similar to HBase co-processors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for with such an ability is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are used
>> to distribute Hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services.
>> So, they have to be set up at install time, before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for an
>> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is
>> just sitting there in the right place, but if one wanted to make a
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I hate
>> to waste y'all's time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID for the service, such as the string "foo". This is
>> the ID that applications may look up in the container responses map we
>> discussed in another thread (around the shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services, foo and bar, it would look like
>> the following (comma-separated IDs, each with its own class config):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
>>> Good, I was hoping that would be the case.  But what are the
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with NodeManager
>>> via RPC?  Is there an interface to implement?  How are they opened
>>> and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, plug in, and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and the NM has to be restarted for the change to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>>>> I notice the yarn-site.xml
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>>
>>>> Can I tell yarn to run *my* per-node service?
>>>>
>>>> Is there some other way (within the recommended Hadoop framework)
>>>> to run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.

We could class-load directly from HDFS, like HBase Co-Processors do.
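A rough Java sketch of the "load the class from a jar location named in config" idea, under stated assumptions: the jar locations are plain java.net.URL values (reading a jar straight out of HDFS would need Hadoop's FileSystem API, which is not shown), and the service class has a public no-argument constructor. JarServiceLoader and load() are hypothetical names; this is not how the NodeManager loads aux services, which come from its own CLASSPATH.

```java
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical sketch: load a service class from jar URLs given in config,
 * check its type, and instantiate it. Not NodeManager or HBase code.
 */
public class JarServiceLoader {

    /**
     * Loads className from jarUrls (falling back to the parent class loader),
     * verifies it extends or implements expectedType, and calls its public
     * no-argument constructor.
     */
    static <T> T load(URL[] jarUrls, String className, Class<T> expectedType)
            throws ReflectiveOperationException {
        // Deliberately left open: the loaded class may need to pull further
        // classes from these jars for as long as the instance is in use.
        URLClassLoader loader =
                new URLClassLoader(jarUrls, JarServiceLoader.class.getClassLoader());
        // asSubclass throws ClassCastException if the type does not match.
        Class<? extends T> cls =
                Class.forName(className, true, loader).asSubclass(expectedType);
        return cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // With no jar URLs, lookup falls through to the parent loader, so a
        // JDK class stands in for a real service class here.
        Runnable service = load(new URL[0], "java.lang.Thread", Runnable.class);
        System.out.println(service.getClass().getName());
    }
}
```

The type check up front matters in this design: a misconfigured class name fails at startup with a clear error instead of later when the service is first used.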

> Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:

Isn't this more complex than just running a dedicated service all the
time, and/or implementing a way to spawn/end a dedicated service
temporarily? I'd rather try to implement such a thing than have my
containers implement more logic.

On Fri, Aug 23, 2013 at 11:17 PM, John Lilley <jo...@redpoint.net> wrote:
> Harsh,
>
> Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.
>
> What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?
>
> Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
> 1) AM spawns "mapper-like" tasks around the cluster
> 2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
> 3) Each mapper-like task writes one or more output files and informs the service of those files (along with AM-id, Task-id, etc.).
> 4) AM spawns "reducer-like" tasks around the cluster.
> 5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.
>
> There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
>
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Friday, August 23, 2013 11:00 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> The general practice is to install your dependencies into a custom location such as /opt/john-jars, extend YARN_CLASSPATH to include those jars, and configure the service classes under the aux-services list. You need to take care of deploying consistent jar versions to /opt/john-jars/ across the cluster yourself, though.
>
> I think it may be a neat idea to place the jars on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase coprocessors. But I'll defer to Vinod on whether this would be a good thing to do.
>
> (I know the next thing people will ask for with such an ability is hot code upgrades...)
>
> On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
>> Are there recommended conventions for adding additional code to a
>> stock Hadoop install?
>>
>> It would be nice if we could piggyback on whatever mechanisms are used
>> to distribute hadoop itself around the cluster.
>>
>> john
>>
>>
>>
>> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
>> Sent: Thursday, August 22, 2013 6:25 PM
>> To: user@hadoop.apache.org
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Auxiliary services are essentially administrator-configured services. So,
>> they have to be set up at install time, before the NM is started.
>>
>> +Vinod
>>
>> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley
>> <jo...@redpoint.net>
>> wrote:
>>
>> Following up on this, how exactly does one *install* the jar(s) for
>> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
>> MapReduce's aux-service is presumably installed with Hadoop and is
>> just sitting there in the right place, but if one wanted to make a
>> whole new aux-service that belonged with an AM, how would one do it?
>>
>> John
>>
>>
>> -----Original Message-----
>> From: John Lilley [mailto:john.lilley@redpoint.net]
>> Sent: Wednesday, June 05, 2013 11:41 AM
>> To: user@hadoop.apache.org
>> Subject: RE: yarn-site.xml and aux-services
>>
>> Wow, thanks.  Is this documented anywhere other than the code?  I hate
>> to waste y'all's time on things that can be RTFMed.
>> John
>>
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Wednesday, June 05, 2013 9:35 AM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> John,
>>
>> The format is ID and sub-config based:
>>
>> First, you define an ID as a service, like the string "foo". This is
>> the ID the applications may look up in their container responses map we
>> discussed over another thread (around shuffle handler).
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo</value>
>> </property>
>>
>> Then you define an actual implementation class for that ID "foo", like so:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>>
>> If you have multiple services foo and bar, then it would appear like
>> the below (comma-separated IDs and individual configs):
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>foo,bar</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.foo.class</name>
>>     <value>com.mypack.MyAuxServiceClassForFoo</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services.bar.class</name>
>>     <value>com.mypack.MyAuxServiceClassForBar</value>
>> </property>
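To make the ID-plus-class scheme above concrete, here is a minimal Java sketch (not NodeManager code) of how the two kinds of properties relate: each comma-separated ID in yarn.nodemanager.aux-services is looked up under yarn.nodemanager.aux-services.<ID>.class. A plain Map stands in for Hadoop's Configuration object, and AuxServicesConfig is a hypothetical name. Because the Map holds one value per key, it also shows why a second service is added to the comma-separated list rather than as a second property with the same name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: resolve aux-service IDs to implementation class
 * names using the "yarn.nodemanager.aux-services.<ID>.class" convention.
 */
public class AuxServicesConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    /** Returns serviceId -> class name, in the order the IDs are listed. */
    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> result = new LinkedHashMap<>();
        String ids = conf.getOrDefault(SERVICES_KEY, "");
        for (String raw : ids.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate an empty value or stray commas
            }
            String classKey = SERVICES_KEY + "." + id + ".class";
            String className = conf.get(classKey);
            if (className == null) {
                throw new IllegalArgumentException(
                        "No " + classKey + " configured for service ID " + id);
            }
            result.put(id, className);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SERVICES_KEY, "foo,bar");
        conf.put(SERVICES_KEY + ".foo.class", "com.mypack.MyAuxServiceClassForFoo");
        conf.put(SERVICES_KEY + ".bar.class", "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolve(conf));
    }
}
```

Failing fast on a listed ID with no class property is a design choice of this sketch; it keeps a typo in either property from silently disabling a service.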
>>
>> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
>> wrote:
>>> Good, I was hoping that would be the case.  But what are the
>>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>>> A scoped class name?  Or a key string into some map elsewhere?
>>>
>>> e.g. like:
>>>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>> </property>
>>> <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>myauxserviceclassname</value>
>>> </property>
>>>
>>> Concerning auxiliary services -- do they communicate with NodeManager
>>> via RPC?  Is there an interface to implement?  How are they opened
>>> and closed with NodeManager?
>>>
>>> Thanks
>>> John
>>>
>>> -----Original Message-----
>>> From: Harsh J [mailto:harsh@cloudera.com]
>>> Sent: Tuesday, June 04, 2013 11:58 PM
>>> To: <us...@hadoop.apache.org>
>>> Subject: Re: yarn-site.xml and aux-services
>>>
>>> Yes, that's what this is for. You can implement, pass in and use your
>>> own AuxService. It needs to be on the NodeManager CLASSPATH to run
>>> (and NM has to be restarted to apply).
>>>
>>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley
>>> <jo...@redpoint.net>
>>> wrote:
>>>> I notice the yarn-site.xml
>>>>
>>>>   <property>
>>>>     <name>yarn.nodemanager.aux-services</name>
>>>>     <value>mapreduce.shuffle</value>
>>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>>   </property>
>>>>
>>>> Is this a general-purpose hook?
>>>> Can I tell yarn to run *my* per-node service?
>>>> Is there some other way (within the recommended Hadoop framework) to
>>>> run a per-node service that exists during the lifetime of the
>>>> NodeManager?
>>>>
>>>> John Lilley
>>>> Chief Architect, RedPoint Global Inc.
>>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>>> Skype: jlilley.redpoint | john.lilley@redpoint.net |
>>>> www.redpoint.net
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>>
>> --
>> Harsh J
>>
>>
>>
>>
>> --
>> +Vinod
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>
>
>
> --
> Harsh J



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to cache those on local disk.

What about having the tasks themselves start the per-node service as a child process?  I've been told that the NM kills the process group, but won't setpgrp() circumvent that?

Even given that, would the child process of one task have proper environment and permission to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
1) AM spawns "mapper-like" tasks around the cluster
2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
3) Each mapper-like task writes one or more output files and informs the service of those files (along with AM-id, Task-id, etc.).
4) AM spawns "reducer-like" tasks around the cluster.
5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.

There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.
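One way to picture the bookkeeping that the per-node service in steps 1-5 would need is the hypothetical Java sketch below: an in-memory registry keyed by application ID, so output files outlive the mapper-like tasks that wrote them but are released when the application finishes. Who calls applicationFinished() (the AM, or the NodeManager in the case of a real aux service) is left open, as are networking and actually deleting the files; all names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory registry for a per-node, shuffle-like service.
 * Files are grouped by application ID so they survive the tasks that wrote
 * them but are dropped when the application ends.
 */
public class NodeOutputRegistry {

    /** One output file written by a mapper-like task. */
    static final class OutputFile {
        final String taskId;
        final String path;

        OutputFile(String taskId, String path) {
            this.taskId = taskId;
            this.path = path;
        }
    }

    private final Map<String, List<OutputFile>> filesByApp = new ConcurrentHashMap<>();

    /** Step 3: a mapper-like task reports a finished output file. */
    void register(String appId, String taskId, String path) {
        filesByApp
            .computeIfAbsent(appId, id -> Collections.synchronizedList(new ArrayList<OutputFile>()))
            .add(new OutputFile(taskId, path));
    }

    /** Step 5: list the files this node holds for an application. */
    List<OutputFile> filesFor(String appId) {
        List<OutputFile> files = filesByApp.get(appId);
        if (files == null) {
            return Collections.emptyList();
        }
        synchronized (files) { // copy under the list's lock
            return new ArrayList<>(files);
        }
    }

    /** Cleanup on application exit: forget the files and return their paths for deletion. */
    List<String> applicationFinished(String appId) {
        List<OutputFile> files = filesByApp.remove(appId);
        List<String> paths = new ArrayList<>();
        if (files != null) {
            synchronized (files) {
                for (OutputFile f : files) {
                    paths.add(f.path);
                }
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        NodeOutputRegistry registry = new NodeOutputRegistry();
        registry.register("app_1", "task_0", "/tmp/app_1/task_0.out");
        registry.register("app_1", "task_1", "/tmp/app_1/task_1.out");
        System.out.println(registry.filesFor("app_1").size());
        System.out.println(registry.applicationFinished("app_1"));
        System.out.println(registry.filesFor("app_1").isEmpty());
    }
}
```

Keying everything by application ID is what makes the cleanup question tractable: the files' lifetime is tied to the application rather than to any single task.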

John


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Friday, August 23, 2013 11:00 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

The general practice is to install your dependencies into a custom location such as /opt/john-jars, extend YARN_CLASSPATH to include those jars, and configure the service classes under the aux-services list. You need to take care of deploying consistent jar versions to /opt/john-jars/ across the cluster yourself, though.

I think it may be a neat idea to place the jars on HDFS or any other DFS, with yarn-site.xml indicating the location plus the class to load, similar to HBase coprocessors. But I'll defer to Vinod on whether this would be a good thing to do.

(I know the next thing people will ask for with such an ability is hot code upgrades...)

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
> Are there recommended conventions for adding additional code to a 
> stock Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used 
> to distribute hadoop itself around the cluster.
>
> john
>
>
>
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
> Sent: Thursday, August 22, 2013 6:25 PM
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
> Auxiliary services are essentially administrator-configured services. So,
> they have to be set up at install time, before the NM is started.
>
> +Vinod
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
> <jo...@redpoint.net>
> wrote:
>
> Following up on this, how exactly does one *install* the jar(s) for 
> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is 
> just sitting there in the right place, but if one wanted to make a 
> whole new aux-service that belonged with an AM, how would one do it?
>
> John
>
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate 
> to waste y'all's time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID as a service, like the string "foo". This is 
> the ID the applications may look up in their container responses map we
> discussed over another thread (around shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would appear like 
> the below (comma-separated IDs and individual configs):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
> wrote:
>> Good, I was hoping that would be the case.  But what are the 
>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>> A scoped class name?  Or a key string into some map elsewhere?
>>
>> e.g. like:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>myauxserviceclassname</value>
>> </property>
>>
>> Concerning auxiliary services -- do they communicate with NodeManager 
>> via RPC?  Is there an interface to implement?  How are they opened 
>> and closed with NodeManager?
>>
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Tuesday, June 04, 2013 11:58 PM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Yes, that's what this is for. You can implement, pass in and use your 
>> own AuxService. It needs to be on the NodeManager CLASSPATH to run 
>> (and NM has to be restarted to apply).
>>
>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>> I notice the yarn-site.xml
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>   </property>
>>>
>>> Is this a general-purpose hook?
>>>
>>> Can I tell yarn to run *my* per-node service?
>>>
>>> Is there some other way (within the recommended Hadoop framework) to 
>>> run a per-node service that exists during the lifetime of the 
>>> NodeManager?
>>>
>>> John Lilley
>>> Chief Architect, RedPoint Global Inc.
>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J
>
>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or 
> entity to which it is addressed and may contain information that is 
> confidential, privileged and exempt from disclosure under applicable 
> law. If the reader of this message is not the intended recipient, you 
> are hereby notified that any printing, copying, dissemination, 
> distribution, disclosure or forwarding of this communication is 
> strictly prohibited. If you have received this communication in error, 
> please contact the sender immediately and delete it from your system. Thank You.



--
Harsh J
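A note on the ID-and-class scheme Harsh describes in the June 5 reply quoted above: yarn.nodemanager.aux-services holds one comma-separated list, so declaring that property twice (as in John's first attempt) does not register a second service; as far as I know, the later definition simply overrides the earlier one. The Java sketch below models the lookup with a plain java.util.Properties object. AuxServicesConfig and resolve() are hypothetical names, not the NodeManager's own code, which reads the same keys through Hadoop's Configuration class. On later Hadoop 2.x releases the built-in shuffle ID is spelled mapreduce_shuffle, with yarn.nodemanager.aux-services.mapreduce_shuffle.class set to org.apache.hadoop.mapred.ShuffleHandler.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper (not part of Hadoop): maps each aux-service ID listed in
// yarn.nodemanager.aux-services to the class configured under
// yarn.nodemanager.aux-services.<id>.class.
final class AuxServicesConfig {
    static final String LIST_KEY = "yarn.nodemanager.aux-services";

    static Map<String, String> resolve(Properties conf) {
        Map<String, String> idToClass = new LinkedHashMap<>(); // keeps list order
        for (String raw : conf.getProperty(LIST_KEY, "").split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate empty lists, "foo,,bar" and trailing commas
            }
            String classKey = LIST_KEY + "." + id + ".class";
            String className = conf.getProperty(classKey);
            if (className == null || className.trim().isEmpty()) {
                throw new IllegalArgumentException(
                        "No " + classKey + " configured for aux-service '" + id + "'");
            }
            idToClass.put(id, className.trim());
        }
        return idToClass;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(LIST_KEY, "foo,bar");
        conf.setProperty(LIST_KEY + ".foo.class", "com.mypack.MyAuxServiceClassForFoo");
        conf.setProperty(LIST_KEY + ".bar.class", "com.mypack.MyAuxServiceClassForBar");
        System.out.println(resolve(conf));
    }
}
```

A missing per-ID class entry is treated as a configuration error here rather than silently skipped, so a typo in an ID surfaces immediately.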

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Harsh,

Thanks for the clarification.  I would find it very convenient in this case to have my custom jars available in HDFS, but I can see the added complexity needed for YARN to fetch and cache those on local disk.

What about having the tasks themselves start the per-node service as a child process?   I've been told that the NM kills the process group, but won't setpgrp() circumvent that?

Even given that, would the child process of one task have the proper environment and permissions to act on behalf of other tasks?  Consider a scenario analogous to the MR shuffle, where the persistent service serves up mapper output files to the reducers across the network:
1) AM spawns "mapper-like" tasks around the cluster.
2) Each mapper-like task on a given node launches a "persistent service" child, but only if one is not already running.
3) Each mapper-like task writes one or more output files, and informs the service of those files (along with AM-id, Task-id, etc.).
4) AM spawns "reducer-like" tasks around the cluster.
5) Each reducer-like task is told which nodes contain "mapper" result data, and connects to services on those nodes to read the data.

There are some details missing, like how the lifetime of the temporary files is controlled to extend beyond the mapper-like task lifetime but still be cleaned up on AM exit, and how the reducer-like tasks are informed of which nodes have data.

John
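John's scenario leaves open how output files outlive the task that wrote them yet disappear when the application ends. If the per-node service is written as an aux-service rather than a task-spawned child, my understanding is that the NodeManager notifies every aux-service when an application finishes on that node (stopApplication in the Hadoop 2.x API), which is a natural place for that cleanup. The sketch below models only the bookkeeping in plain Java; AppOutputRegistry and its methods are hypothetical, and the IDs are plain strings rather than YARN's own ID classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical bookkeeping for a per-node, shuffle-like service. Tasks register
// output files under (application ID, task ID); reader tasks look them up; all
// files for an application are deleted when that application stops.
final class AppOutputRegistry {
    private final Map<String, Map<String, List<Path>>> outputs = new ConcurrentHashMap<>();

    // Step 3 of the scenario: a mapper-like task reports a finished output file.
    void register(String appId, String taskId, Path file) {
        outputs.computeIfAbsent(appId, a -> new ConcurrentHashMap<>())
               .computeIfAbsent(taskId, t -> new CopyOnWriteArrayList<>())
               .add(file);
    }

    // Step 5: a reducer-like task asks which files a given mapper-like task produced.
    List<Path> lookup(String appId, String taskId) {
        Map<String, List<Path>> byTask = outputs.getOrDefault(appId, Collections.emptyMap());
        return new ArrayList<>(byTask.getOrDefault(taskId, Collections.emptyList()));
    }

    // Cleanup when the application ends (the role an aux-service's
    // stopApplication hook could play). Returns how many files were deleted.
    int onApplicationStop(String appId) {
        Map<String, List<Path>> byTask = outputs.remove(appId);
        if (byTask == null) {
            return 0;
        }
        int deleted = 0;
        for (List<Path> files : byTask.values()) {
            for (Path file : files) {
                try {
                    if (Files.deleteIfExists(file)) {
                        deleted++;
                    }
                } catch (IOException e) {
                    // Best effort: report and keep cleaning up the remaining files.
                    System.err.println("could not delete " + file + ": " + e);
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        AppOutputRegistry registry = new AppOutputRegistry();
        Path out = Files.createTempFile("map-output-", ".bin");
        registry.register("app_0001", "map_000000", out);
        System.out.println("files: " + registry.lookup("app_0001", "map_000000"));
        System.out.println("deleted on stop: " + registry.onApplicationStop("app_0001"));
    }
}
```

Tying file lifetime to the application rather than to the writing task is what lets outputs survive until the reducer-like tasks have read them.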


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Friday, August 23, 2013 11:00 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

The general practice is to install your dependencies into a custom location such as /opt/john-jars, extend YARN_CLASSPATH to include those jars, and configure the classes under the aux-services list. You need to take care of deploying the jars to /opt/john-jars/ on every node, and of keeping their versions consistent across the cluster, though.

I think it may be a neat idea to have the jars placed on HDFS (or any other DFS), with yarn-site.xml indicating the location plus the class to load, similar to HBase coprocessors. But I'll defer to Vinod on whether this would be a good thing to do.

(I know the next thing people will ask for, given such an ability, is hot code upgrades...)

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
> Are there recommended conventions for adding additional code to a 
> stock Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used 
> to distribute Hadoop itself around the cluster.
>
> john
>
>
>
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
> Sent: Thursday, August 22, 2013 6:25 PM
>
>
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
>
>
>
>
> Auxiliary services are essentially administrator-configured services. So, 
> they have to be set up at install time - before NM is started.
>
>
>
> +Vinod
>
>
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley 
> <jo...@redpoint.net>
> wrote:
>
> Following up on this, how exactly does one *install* the jar(s) for 
> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is 
> just sitting there in the right place, but if one wanted to make a 
> whole new aux-service that belonged with an AM, how would one do it?
>
> John
>
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate 
> to waste y'alls time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID as a service, like the string "foo". This is 
> the ID the applications may look up in their container responses map we 
> discussed over another thread (around shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would appear like 
> the below (comma separated IDs and individual configs):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
> wrote:
>> Good, I was hoping that would be the case.  But what are the 
>> mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>> A scoped class name?  Or a key string into some map elsewhere?
>>
>> e.g. like:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>myauxserviceclassname</value>
>> </property>
>>
>> Concerning auxiliary services -- do they communicate with NodeManager 
>> via RPC?  Is there an interface to implement?  How are they opened 
>> and closed with NodeManager?
>>
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Tuesday, June 04, 2013 11:58 PM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Yes, that's what this is for. You can implement, pass in and use your 
>> own AuxService. It needs to be on the NodeManager CLASSPATH to run 
>> (and NM has to be restarted to apply).
>>
>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley 
>> <jo...@redpoint.net>
>> wrote:
>>> I notice the yarn-site.xml
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>   </property>
>>>
>>> Is this a general-purpose hook?
>>>
>>> Can I tell yarn to run *my* per-node service?
>>>
>>> Is there some other way (within the recommended Hadoop framework) to 
>>> run a per-node service that exists during the lifetime of the 
>>> NodeManager?
>>>
>>> John Lilley
>>> Chief Architect, RedPoint Global Inc.
>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J
>
>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
>



--
Harsh J
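John's earlier questions (RPC? an interface? how services are opened and closed?) are only partly answered in the quoted replies. As I understand the stable Hadoop 2.x API, an aux-service does not talk to the NodeManager over RPC: it is loaded into the NodeManager's own JVM as a subclass of org.apache.hadoop.yarn.server.api.AuxiliaryService, initialized and started when the NodeManager starts, told about applications through initializeApplication() and stopApplication(), asked for getMetaData() (the bytes handed back to AMs under its ID), and stopped when the NodeManager shuts down. The sketch below is a standalone model that reuses those method names so it runs without Hadoop jars; ModelAuxService, ModelNodeManager, and ActiveAppTracker are hypothetical, and the real signatures take Hadoop context objects rather than strings.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Standalone model (no Hadoop dependency) of the calls a NodeManager makes on
// an auxiliary service. Method names echo Hadoop 2.x's AuxiliaryService, but
// these types are hypothetical stand-ins, not the Hadoop API.
final class AuxLifecycleModel {

    abstract static class ModelAuxService {
        final String name;
        ModelAuxService(String name) { this.name = name; }
        void init(Map<String, String> conf) {}   // NM startup: read configuration
        void start() {}                           // NM startup: bind ports, start threads
        abstract void initializeApplication(String user, String appId, ByteBuffer data);
        abstract void stopApplication(String appId);
        abstract ByteBuffer getMetaData();        // returned to AMs under the service ID
        void stop() {}                            // NM shutdown
    }

    // Toy service that tracks which applications are active on this node.
    static final class ActiveAppTracker extends ModelAuxService {
        final Set<String> active = new HashSet<>();
        final List<String> log = new ArrayList<>();
        ActiveAppTracker(String name) { super(name); }
        @Override void start() { log.add("start"); }
        @Override void initializeApplication(String user, String appId, ByteBuffer data) {
            if (active.add(appId)) {              // repeated container starts are idempotent
                log.add("init " + appId + " for " + user);
            }
        }
        @Override void stopApplication(String appId) {
            if (active.remove(appId)) {
                log.add("stop " + appId);
            }
        }
        @Override ByteBuffer getMetaData() { return ByteBuffer.allocate(0); }
        @Override void stop() { log.add("stop-service"); }
    }

    // Hypothetical NM-side driver holding services keyed by aux-service ID.
    static final class ModelNodeManager {
        final Map<String, ModelAuxService> services = new LinkedHashMap<>();

        void addService(String id, ModelAuxService s, Map<String, String> conf) {
            s.init(conf);
            services.put(id, s);
        }
        void startAll() { services.values().forEach(ModelAuxService::start); }

        // A container whose launch context carries data for service `id`.
        void containerStarted(String user, String appId, String id, ByteBuffer data) {
            ModelAuxService s = services.get(id);
            if (s == null) {
                throw new IllegalArgumentException("unknown aux-service " + id);
            }
            s.initializeApplication(user, appId, data);
        }
        void applicationFinished(String appId) {
            services.values().forEach(s -> s.stopApplication(appId));
        }
        Map<String, ByteBuffer> allServicesMetaData() {
            Map<String, ByteBuffer> meta = new LinkedHashMap<>();
            services.forEach((id, s) -> meta.put(id, s.getMetaData()));
            return meta;
        }
        void stopAll() { services.values().forEach(ModelAuxService::stop); }
    }

    public static void main(String[] args) {
        ModelNodeManager nm = new ModelNodeManager();
        ActiveAppTracker tracker = new ActiveAppTracker("foo");
        nm.addService("foo", tracker, new HashMap<>());
        nm.startAll();
        nm.containerStarted("john", "app_0001", "foo", ByteBuffer.allocate(0));
        System.out.println(nm.allServicesMetaData().keySet());
        nm.applicationFinished("app_0001");
        nm.stopAll();
        System.out.println(tracker.log);
    }
}
```

Because the service lives inside the NodeManager process, its lifetime matches the NodeManager's, which is the "per-node service for the lifetime of the NodeManager" John originally asked about.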

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

The general practice is to install your dependencies into a custom
location such as /opt/john-jars, extend YARN_CLASSPATH to include those
jars, and configure the classes under the aux-services list. You need
to take care of deploying the jars to /opt/john-jars/ on every node,
and of keeping their versions consistent across the cluster, though.

I think it may be a neat idea to have the jars placed on HDFS (or any
other DFS), with yarn-site.xml indicating the location plus the class
to load, similar to HBase coprocessors. But I'll defer to Vinod on
whether this would be a good thing to do.

(I know the next thing people will ask for, given such an ability, is
hot code upgrades…)
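Since aux-services have to be in place before the NodeManager starts, a cheap safeguard for the /opt/john-jars approach is to confirm on each node that every configured class resolves from the deployed jars before restarting the NodeManager. The sketch below is a hypothetical preflight tool, not part of Hadoop: it builds a class loader over the *.jar files in one directory (on top of the JVM's own classpath), so it approximates, rather than reproduces, the classpath the NodeManager's startup scripts assemble, and it does not check that a class actually extends AuxiliaryService.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical preflight check (not a Hadoop tool): before restarting a
// NodeManager, confirm that every configured aux-service class can be loaded
// from the jars in a local directory such as /opt/john-jars.
final class AuxClasspathCheck {

    // Returns the class names that could NOT be resolved.
    static List<String> missingClasses(Path jarDir, Collection<String> classNames) throws IOException {
        List<URL> urls = new ArrayList<>();
        if (Files.isDirectory(jarDir)) {
            try (DirectoryStream<Path> jars = Files.newDirectoryStream(jarDir, "*.jar")) {
                for (Path jar : jars) {
                    urls.add(jar.toUri().toURL());
                }
            }
        }
        List<String> missing = new ArrayList<>();
        try (URLClassLoader loader = new URLClassLoader(
                urls.toArray(new URL[0]), AuxClasspathCheck.class.getClassLoader())) {
            for (String name : classNames) {
                try {
                    Class.forName(name, false, loader); // resolve without running static initializers
                } catch (ClassNotFoundException | LinkageError e) {
                    missing.add(name);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "/opt/john-jars");
        List<String> missing = missingClasses(dir, Arrays.asList(
                "com.mypack.MyAuxServiceClassForFoo", "com.mypack.MyAuxServiceClassForBar"));
        System.out.println(missing.isEmpty() ? "all aux-service classes resolve" : "missing: " + missing);
    }
}
```

Running it with the same class list as yarn-site.xml on every node catches a jar that was never copied out before the NodeManager fails at startup.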

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
> Are there recommended conventions for adding additional code to a stock
> Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used to
> distribute Hadoop itself around the cluster.
>
> john
>
>
>
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
> Sent: Thursday, August 22, 2013 6:25 PM
>
>
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
>
>
>
>
> Auxiliary services are essentially administrator-configured services. So, they
> have to be set up at install time - before NM is started.
>
>
>
> +Vinod
>
>
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net>
> wrote:
>
> Following up on this, how exactly does one *install* the jar(s) for
> an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is just
> sitting there in the right place, but if one wanted to make a whole new
> aux-service that belonged with an AM, how would one do it?
>
> John
>
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate to
> waste y'alls time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID as a service, like the string "foo". This is the ID
> the applications may look up in their container responses map we discussed
> over another thread (around shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would appear like the
> below (comma separated IDs and individual configs):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
> wrote:
>> Good, I was hoping that would be the case.  But what are the mechanics of
>> it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>> A scoped class name?  Or a key string into some map elsewhere?
>>
>> e.g. like:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>myauxserviceclassname</value>
>> </property>
>>
>> Concerning auxiliary services -- do they communicate with NodeManager via
>> RPC?  Is there an interface to implement?  How are they opened and closed
>> with NodeManager?
>>
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Tuesday, June 04, 2013 11:58 PM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Yes, that's what this is for. You can implement, pass in and use your own
>> AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has
>> to be restarted to apply).
>>
>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net>
>> wrote:
>>> I notice the yarn-site.xml
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>   </property>
>>>
>>> Is this a general-purpose hook?
>>>
>>> Can I tell yarn to run *my* per-node service?
>>>
>>> Is there some other way (within the recommended Hadoop framework) to
>>> run a per-node service that exists during the lifetime of the
>>> NodeManager?
>>>
>>> John Lilley
>>> Chief Architect, RedPoint Global Inc.
>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J
>
>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

The general practice is to install your dependencies into a custom
location such as /opt/john-jars, extend YARN_CLASSPATH to include those
jars, and configure the classes under the aux-services list. You do
need to take care of deploying consistent jar versions to
/opt/john-jars/ across the cluster yourself, though.
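Because the NodeManager only picks the classes up from its classpath at startup, a quick check that every configured aux-service class actually resolves can save a failed NM restart. A minimal sketch, assuming the class names have already been read out of yarn-site.xml; `AuxClasspathCheck` is a hypothetical helper, not part of Hadoop:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reports which configured
// aux-service class names cannot be loaded from the current classpath.
final class AuxClasspathCheck {
    private AuxClasspathCheck() {}

    static List<String> missingClasses(List<String> classNames) {
        List<String> missing = new ArrayList<>();
        ClassLoader loader = AuxClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only check that the class (and the classes
                // it directly extends) can be found, without running static code.
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingClasses(List.of(args));
        if (missing.isEmpty()) {
            System.out.println("All aux-service classes resolved.");
        } else {
            System.out.println("Not on classpath: " + missing);
            System.exit(1);
        }
    }
}
```

To be meaningful it should be run with the same classpath the NodeManager will use, for example the output of `yarn classpath` plus the extra jar directory.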

I think it may be a neat idea to allow the jars to be placed on HDFS
(or any other DFS), with yarn-site.xml indicating the location plus the
class to load, similar to HBase co-processors. But I'll defer to Vinod
on whether this would be a good thing to do.

(I know the next thing people will ask for, once such an ability exists,
is hot code upgrades…)

On Fri, Aug 23, 2013 at 10:11 PM, John Lilley <jo...@redpoint.net> wrote:
> Are there recommended conventions for adding additional code to a stock
> Hadoop install?
>
> It would be nice if we could piggyback on whatever mechanisms are used to
> distribute hadoop itself around the cluster.
>
> john
>
> From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
> Sent: Thursday, August 22, 2013 6:25 PM
> To: user@hadoop.apache.org
> Subject: Re: yarn-site.xml and aux-services
>
> Auxiliary services are essentially administrator-configured services, so
> they have to be set up at install time, before the NM is started.
>
> +Vinod
>
> On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net>
> wrote:
>
> Following up on this, how exactly does one *install* the jar(s) for
> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is just
> sitting there in the right place, but if one wanted to make a whole new
> aux-service that belonged with an AM, how would one do it?
>
> John
>
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate to
> waste y'alls time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID as a service, like the string "foo". This is the ID
> that applications may look up in the container response map we discussed
> in another thread (around the shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would appear as below
> (comma-separated IDs and individual configs):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
> wrote:
>> Good, I was hoping that would be the case.  But what are the mechanics of
>> it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?
>> A scoped class name?  Or a key string into some map elsewhere?
>>
>> e.g. like:
>>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>> </property>
>> <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>myauxserviceclassname</value>
>> </property>
>>
>> Concerning auxiliary services -- do they communicate with NodeManager via
>> RPC?  Is there an interface to implement?  How are they opened and closed
>> with NodeManager?
>>
>> Thanks
>> John
>>
>> -----Original Message-----
>> From: Harsh J [mailto:harsh@cloudera.com]
>> Sent: Tuesday, June 04, 2013 11:58 PM
>> To: <us...@hadoop.apache.org>
>> Subject: Re: yarn-site.xml and aux-services
>>
>> Yes, that's what this is for. You can implement, pass in and use your own
>> AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has
>> to be restarted to apply).
>>
>> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net>
>> wrote:
>>> I notice the yarn-site.xml
>>>
>>>   <property>
>>>     <name>yarn.nodemanager.aux-services</name>
>>>     <value>mapreduce.shuffle</value>
>>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>>   </property>
>>>
>>> Is this a general-purpose hook?
>>>
>>> Can I tell yarn to run *my* per-node service?
>>>
>>> Is there some other way (within the recommended Hadoop framework) to
>>> run a per-node service that exists during the lifetime of the
>>> NodeManager?
>>>
>>> John Lilley
>>> Chief Architect, RedPoint Global Inc.
>>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>>
>>
>>
>>
>> --
>> Harsh J
>
>
>
> --
> Harsh J
>
>
>
>
> --
> +Vinod
> Hortonworks Inc.
> http://hortonworks.com/
>
>



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Are there recommended conventions for adding additional code to a stock Hadoop install?
It would be nice if we could piggyback on whatever mechanisms are used to distribute hadoop itself around the cluster.
john

From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Thursday, August 22, 2013 6:25 PM
To: user@hadoop.apache.org
Subject: Re: yarn-site.xml and aux-services

Auxiliary services are essentially administrator-configured services, so they have to be set up at install time, before the NM is started.
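In other words, an application cannot install an aux service; it can only address one the administrator has already configured, by its ID. On the AM side the YARN API passes per-service payloads through `ContainerLaunchContext.setServiceData(Map<String, ByteBuffer>)`. The sketch below models only that map with plain JDK types so it compiles without Hadoop; `ServiceDataSketch` and its check against the configured IDs are hypothetical illustrations, not Hadoop code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: builds the ID -> payload map an AM would hand to
// ContainerLaunchContext.setServiceData(...). Plain JDK types only.
final class ServiceDataSketch {
    private ServiceDataSketch() {}

    static Map<String, ByteBuffer> buildServiceData(
            Set<String> configuredIds, Map<String, String> payloads) {
        Map<String, ByteBuffer> serviceData = new HashMap<>();
        for (Map.Entry<String, String> e : payloads.entrySet()) {
            String id = e.getKey();
            // The app can only target IDs already listed in
            // yarn.nodemanager.aux-services; it cannot add new services.
            if (!configuredIds.contains(id)) {
                throw new IllegalArgumentException("aux service not configured: " + id);
            }
            serviceData.put(id, ByteBuffer.wrap(
                    e.getValue().getBytes(StandardCharsets.UTF_8)));
        }
        return serviceData;
    }
}
```

The early rejection of unknown IDs is a client-side sanity check in this sketch; it says nothing about how a given NodeManager version reacts to an unknown ID.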

+Vinod

On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net> wrote:
Following up on this, how exactly does one *install* the jar(s) for auxiliary service?  Can it be shipped out with the LocalResources of an AM?
MapReduce's aux-service is presumably installed with Hadoop and is just sitting there in the right place, but if one wanted to make a whole new aux-service that belonged with an AM, how would one do it?

John

-----Original Message-----
From: John Lilley [mailto:john.lilley@redpoint.net]
Sent: Wednesday, June 05, 2013 11:41 AM
To: user@hadoop.apache.org
Subject: RE: yarn-site.xml and aux-services

Wow, thanks.  Is this documented anywhere other than the code?  I hate to waste y'alls time on things that can be RTFMed.
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Wednesday, June 05, 2013 9:35 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is the ID that applications may look up in the container response map we discussed in another thread (around the shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>
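This also answers what `mapreduce.shuffle` is: an ID (a key), not a class name. One caveat, as I understand it: around the Hadoop 2.2 release (YARN-1229) the NodeManager began validating IDs against the pattern `^[A-Za-z_]+[A-Za-z0-9_]*$`, which is why the stock shuffle ID became `mapreduce_shuffle` and a dotted ID is rejected. A minimal sketch of that check; the class name is hypothetical:

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the aux-service ID rule described above:
// letters, digits and underscores only, and not starting with a digit.
final class AuxServiceIdCheck {
    private static final Pattern VALID_ID =
            Pattern.compile("^[A-Za-z_]+[A-Za-z0-9_]*$");

    private AuxServiceIdCheck() {}

    static boolean isValidId(String id) {
        return id != null && VALID_ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"foo", "mapreduce_shuffle", "mapreduce.shuffle", "2fast"};
        for (String id : ids) {
            System.out.println(id + " -> " + (isValidId(id) ? "ok" : "rejected"));
        }
    }
}
```

Running `main` prints `ok` for the first two IDs and `rejected` for the dotted name and the one starting with a digit.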

Then you define an actual implementation class for that ID "foo", like so:

<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
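The value of the `.class` key is resolved by reflection, which is why the class has to be on the NM's classpath and, as far as I can tell, needs a no-argument constructor (Hadoop's own `ShuffleHandler` has one). The sketch below shows the general lookup-then-instantiate pattern with plain JDK reflection; it is a simplified stand-in for what Hadoop does internally, and `AuxServiceLoader` is a hypothetical name:

```java
import java.lang.reflect.Constructor;
import java.util.Map;

// Hypothetical, simplified stand-in: look up the class configured for an
// aux-service ID and instantiate it via its no-argument constructor.
final class AuxServiceLoader {
    // Per-ID key format used in yarn-site.xml.
    static final String CLASS_KEY_FMT = "yarn.nodemanager.aux-services.%s.class";

    private AuxServiceLoader() {}

    static <T> T instantiate(Map<String, String> conf, String id, Class<T> expectedType)
            throws ReflectiveOperationException {
        String key = String.format(CLASS_KEY_FMT, id);
        String className = conf.get(key);
        if (className == null) {
            throw new IllegalArgumentException("no class configured under " + key);
        }
        // ClassNotFoundException here means the jar is not on the classpath.
        Class<?> raw = Class.forName(className, false, AuxServiceLoader.class.getClassLoader());
        // ClassCastException if the class is not of the expected service type.
        Class<? extends T> cls = raw.asSubclass(expectedType);
        Constructor<? extends T> ctor = cls.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }
}
```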

If you have multiple services foo and bar, then it would appear as below (comma-separated IDs and individual configs):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>
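Putting the two pieces together, the comma-separated ID list plus one `.class` key per ID can be resolved into an ID-to-class map in a few lines. A hypothetical sketch that uses a plain `Map` in place of Hadoop's `Configuration` (the NodeManager reads these keys through its own configuration classes):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: resolves "yarn.nodemanager.aux-services" (a comma-
// separated list of IDs) into an ordered map of ID -> implementation class.
final class AuxServicesConfig {
    static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    private AuxServicesConfig() {}

    static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> idToClass = new LinkedHashMap<>();
        String list = conf.getOrDefault(SERVICES_KEY, "");
        for (String raw : list.split(",")) {
            String id = raw.trim();
            if (id.isEmpty()) {
                continue; // tolerate "foo,,bar", trailing commas or an empty list
            }
            String classKey = SERVICES_KEY + "." + id + ".class";
            String className = conf.get(classKey);
            if (className == null) {
                throw new IllegalArgumentException("missing " + classKey);
            }
            idToClass.put(id, className.trim());
        }
        return idToClass;
    }
}
```

Trimming whitespace around IDs is a choice made for this sketch; keeping the list free of spaces in yarn-site.xml avoids depending on how a particular Hadoop version splits it.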

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has to be restarted to apply).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>>
>> Can I tell yarn to run *my* per-node service?
>>
>> Is there some other way (within the recommended Hadoop framework) to
>> run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>>
>
>
>
> --
> Harsh J

--
Harsh J
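On the lifecycle questions asked earlier (is there an interface, is it RPC, how is it opened and closed): aux services run inside the NodeManager's own JVM, so the NM calls them directly rather than over RPC. In later Hadoop 2.x releases the public base class is, as far as I know, `org.apache.hadoop.yarn.server.api.AuxiliaryService`, a Hadoop service that the NM initializes, starts and stops along with itself, and that receives per-application callbacks (`initializeApplication`, `stopApplication`) plus a `getMetaData()` hook whose result is returned to AMs (`ShuffleHandler`, for instance, returns its port). The self-contained sketch below only mimics that shape with JDK types so it compiles without Hadoop; every type in it is a hypothetical stand-in:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in mimicking the callback shape of YARN's aux services
// with JDK types only. This is NOT the Hadoop API.
abstract class ToyAuxService {
    private final String name;

    protected ToyAuxService(String name) { this.name = name; }

    String getName() { return name; }

    // Service lifecycle: started and stopped together with the NodeManager.
    abstract void start();
    abstract void stop();

    // Per-application callbacks; appData is the payload the AM supplied
    // for this service's ID.
    abstract void initializeApplication(String user, String appId, ByteBuffer appData);
    abstract void stopApplication(String appId);

    // Metadata handed back to AMs when their containers start.
    abstract ByteBuffer getMetaData();
}

// Example service that just remembers which applications it is serving.
final class EchoAuxService extends ToyAuxService {
    private final Map<String, String> apps = new ConcurrentHashMap<>();
    private volatile boolean running;

    EchoAuxService() { super("echo"); }

    @Override void start() { running = true; }

    @Override void stop() { running = false; apps.clear(); }

    @Override
    void initializeApplication(String user, String appId, ByteBuffer appData) {
        // duplicate() so decoding does not move the caller's buffer position.
        String payload = StandardCharsets.UTF_8.decode(appData.duplicate()).toString();
        apps.put(appId, user + ":" + payload);
    }

    @Override void stopApplication(String appId) { apps.remove(appId); }

    @Override
    ByteBuffer getMetaData() {
        return ByteBuffer.wrap("echo-v1".getBytes(StandardCharsets.UTF_8));
    }

    boolean isRunning() { return running; }

    int activeApps() { return apps.size(); }
}
```

Keeping per-application state in a concurrent map reflects that callbacks for different applications may arrive from different NM threads; the real base class's threading guarantees should be checked against the Hadoop version in use.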

--
+Vinod
Hortonworks Inc.
http://hortonworks.com/


RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Are there recommended conventions for adding additional code to a stock Hadoop install?
It would be nice if we could piggyback on whatever mechanisms are used to distribute hadoop itself around the cluster.
john

From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Thursday, August 22, 2013 6:25 PM
To: user@hadoop.apache.org
Subject: Re: yarn-site.xml and aux-services

Auxiliary services are essentially administer-configured services. So, they have to be set up at install time - before NM is started.

+Vinod

On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net>> wrote:
Following up on this, how exactly does one *install* the jar(s) for auxiliary service?  Can it be shipped out with the LocalResources of an AM?
MapReduce's aux-service is presumably installed with Hadoop and is just sitting there in the right place, but if one wanted to make a whole new aux-service that belonged with an AM, how would one do it?

John

-----Original Message-----
From: John Lilley [mailto:john.lilley@redpoint.net<ma...@redpoint.net>]
Sent: Wednesday, June 05, 2013 11:41 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: yarn-site.xml and aux-services

Wow, thanks.  Is this documented anywhere other than the code?  I hate to waste y'alls time on things that can be RTFMed.
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com<ma...@cloudera.com>]
Sent: Wednesday, June 05, 2013 9:35 AM
To: <us...@hadoop.apache.org>>
Subject: Re: yarn-site.xml and aux-services

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is the ID the applications may lookup in their container responses map we discussed over another thread (around shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
<name>yarn.nodemanager.aux-services.foo.class</name>
<value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would appear like the below (comma separated IDs and individual configs):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "madreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com<ma...@cloudera.com>]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, thats what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has to be restarted to apply).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net>> wrote:
>> I notice the yarn-site.xml
>>
>>
>>
>>   <property>
>>
>>     <name>yarn.nodemanager.aux-services</name>
>>
>>     <value>mapreduce.shuffle</value>
>>
>>     <description>shuffle service that needs to be set for Map Reduce
>> to run </description>
>>
>>   </property>
>>
>>
>>
>> Is this a general-purpose hook?
>>
>> Can I tell yarn to run *my* per-node service?
>>
>> Is there some other way (within the recommended Hadoop framework) to
>> run a per-node service that exists during the lifetime of the NodeManager?
>>
>>
>>
>> John Lilley
>>
>> Chief Architect, RedPoint Global Inc.
>>
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>>
>> T: +1 303 541 1516<tel:%2B1%20303%20541%201516>  | M: +1 720 938 5761<tel:%2B1%20720%20938%205761> | F: +1 781-705-2077<tel:%2B1%20781-705-2077>
>>
>> Skype: jlilley.redpoint | john.lilley@redpoint.net<ma...@redpoint.net> | www.redpoint.net<http://www.redpoint.net>
>>
>>
>
>
>
> --
> Harsh J

--
Harsh J

--
+Vinod
Hortonworks Inc.
http://hortonworks.com/

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Are there recommended conventions for adding additional code to a stock Hadoop install?
It would be nice if we could piggyback on whatever mechanisms are used to distribute hadoop itself around the cluster.
john

From: Vinod Kumar Vavilapalli [mailto:vinodkv@hortonworks.com]
Sent: Thursday, August 22, 2013 6:25 PM
To: user@hadoop.apache.org
Subject: Re: yarn-site.xml and aux-services

Auxiliary services are essentially administer-configured services. So, they have to be set up at install time - before NM is started.

+Vinod

On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net>> wrote:
Following up on this, how exactly does one *install* the jar(s) for auxiliary service?  Can it be shipped out with the LocalResources of an AM?
MapReduce's aux-service is presumably installed with Hadoop and is just sitting there in the right place, but if one wanted to make a whole new aux-service that belonged with an AM, how would one do it?

John

-----Original Message-----
From: John Lilley [mailto:john.lilley@redpoint.net<ma...@redpoint.net>]
Sent: Wednesday, June 05, 2013 11:41 AM
To: user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: RE: yarn-site.xml and aux-services

Wow, thanks.  Is this documented anywhere other than the code?  I hate to waste y'alls time on things that can be RTFMed.
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com<ma...@cloudera.com>]
Sent: Wednesday, June 05, 2013 9:35 AM
To: <us...@hadoop.apache.org>>
Subject: Re: yarn-site.xml and aux-services

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is the ID the applications may lookup in their container responses map we discussed over another thread (around shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would look like this (comma-separated IDs, plus an individual class config for each):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>
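A minimal sketch of how the ID-plus-sub-config scheme above fits together, written in plain Java so it runs on its own. A Map<String, String> stands in for Hadoop's Configuration object (an assumption for self-containment), and the class name AuxServiceConfigSketch is hypothetical. It reads the comma-separated IDs from yarn.nodemanager.aux-services and, for each ID, the class named under yarn.nodemanager.aux-services.<ID>.class. The real NodeManager then loads and instantiates each class, which this sketch leaves out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the two-level aux-service configuration.
 * A plain Map stands in for Hadoop's Configuration; the real NodeManager
 * additionally loads and instantiates each class, which is omitted here.
 */
public class AuxServiceConfigSketch {
    public static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    /** Returns service ID -> implementation class name, in the order the IDs are listed. */
    public static Map<String, String> resolve(Map<String, String> conf) {
        Map<String, String> services = new LinkedHashMap<>();
        String ids = conf.getOrDefault(SERVICES_KEY, "");
        for (String rawId : ids.split(",")) {
            String id = rawId.trim();
            if (id.isEmpty()) {
                continue; // skip empty entries, e.g. an unset property or "foo,,bar"
            }
            // Each ID has its own sub-config naming the implementation class.
            String classKey = SERVICES_KEY + "." + id + ".class";
            String className = conf.get(classKey);
            if (className == null) {
                // This sketch fails fast when an ID has no class configured.
                throw new IllegalArgumentException("Missing " + classKey + " for service ID '" + id + "'");
            }
            services.put(id, className);
        }
        return services;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(SERVICES_KEY, "foo,bar");
        conf.put(SERVICES_KEY + ".foo.class", "com.mypack.MyAuxServiceClassForFoo");
        conf.put(SERVICES_KEY + ".bar.class", "com.mypack.MyAuxServiceClassForBar");
        // Prints {foo=com.mypack.MyAuxServiceClassForFoo, bar=com.mypack.MyAuxServiceClassForBar}
        System.out.println(resolve(conf));
    }
}
```

The point of the indirection is that the ID ("foo") is a stable name applications can refer to, while the implementing class can change without touching anything that looks the service up.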

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com<ma...@cloudera.com>]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and the NM has to be restarted for the change to take effect).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>>
>> Can I tell yarn to run *my* per-node service?
>>
>> Is there some other way (within the recommended Hadoop framework) to
>> run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
> --
> Harsh J

--
Harsh J
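The configuration proposed in the quoted question repeats the property name yarn.nodemanager.aux-services in two separate <property> elements. A configuration is a key-value store, so only one of those values survives; as far as I know, in Hadoop's Configuration the definition loaded last wins. This hypothetical plain-Java sketch (a LinkedHashMap standing in for the real Configuration; the ID "myservice" is made up) shows the effect, and the single comma-separated property from Harsh's reply that avoids it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a key-value configuration keeps one value per
 * property name, so repeating a <property> name overwrites the earlier value.
 */
public class DuplicatePropertySketch {
    public static final String SERVICES_KEY = "yarn.nodemanager.aux-services";

    /** Applies (name, value) pairs in document order, as a simple config loader would. */
    public static Map<String, String> load(String[][] properties) {
        Map<String, String> conf = new LinkedHashMap<>();
        for (String[] property : properties) {
            conf.put(property[0], property[1]); // a repeated name replaces the earlier value
        }
        return conf;
    }

    public static void main(String[] args) {
        // Two <property> elements with the same name, as in the quoted question.
        Map<String, String> repeated = load(new String[][] {
            {SERVICES_KEY, "mapreduce.shuffle"},
            {SERVICES_KEY, "myauxserviceclassname"},
        });
        System.out.println(repeated.get(SERVICES_KEY)); // myauxserviceclassname (shuffle is gone)

        // One property listing both IDs; each ID also needs its own .class entry (omitted here).
        Map<String, String> listed = load(new String[][] {
            {SERVICES_KEY, "mapreduce.shuffle,myservice"},
        });
        System.out.println(listed.get(SERVICES_KEY)); // mapreduce.shuffle,myservice
    }
}
```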

--
+Vinod
Hortonworks Inc.
http://hortonworks.com/

CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.

Re: yarn-site.xml and aux-services

Posted by Vinod Kumar Vavilapalli <vi...@hortonworks.com>.

Auxiliary services are essentially administrator-configured services, so they
have to be set up at install time, before the NodeManager (NM) is started.

+Vinod


On Thu, Aug 22, 2013 at 1:38 PM, John Lilley <jo...@redpoint.net> wrote:

> Following up on this, how exactly does one *install* the jar(s) for an
> auxiliary service?  Can it be shipped out with the LocalResources of an AM?
> MapReduce's aux-service is presumably installed with Hadoop and is just
> sitting there in the right place, but if one wanted to make a whole new
> aux-service that belonged with an AM, how would one do it?
>
> John
>
> -----Original Message-----
> From: John Lilley [mailto:john.lilley@redpoint.net]
> Sent: Wednesday, June 05, 2013 11:41 AM
> To: user@hadoop.apache.org
> Subject: RE: yarn-site.xml and aux-services
>
> Wow, thanks.  Is this documented anywhere other than the code?  I hate to
> waste y'alls time on things that can be RTFMed.
> John
>
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Wednesday, June 05, 2013 9:35 AM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> John,
>
> The format is ID and sub-config based:
>
> First, you define an ID as a service, like the string "foo". This is the
> ID the applications may look up in their container responses map, which we
> discussed in another thread (around the shuffle handler).
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo</value>
> </property>
>
> Then you define an actual implementation class for that ID "foo", like so:
>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
>
> If you have multiple services foo and bar, then it would look like this
> (comma-separated IDs, plus an individual class config for each):
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>foo,bar</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.foo.class</name>
>     <value>com.mypack.MyAuxServiceClassForFoo</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services.bar.class</name>
>     <value>com.mypack.MyAuxServiceClassForBar</value>
> </property>
>
> On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net>
> wrote:
> > Good, I was hoping that would be the case.  But what are the mechanics
> of it?  Do I just add another entry?  And what exactly is
> "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map
> elsewhere?
> >
> > e.g. like:
> >
> > <property>
> >     <name>yarn.nodemanager.aux-services</name>
> >     <value>mapreduce.shuffle</value>
> > </property>
> > <property>
> >     <name>yarn.nodemanager.aux-services</name>
> >     <value>myauxserviceclassname</value>
> > </property>
> >
> > Concerning auxiliary services -- do they communicate with NodeManager
> via RPC?  Is there an interface to implement?  How are they opened and
> closed with NodeManager?
> >
> > Thanks
> > John
> >
> > -----Original Message-----
> > From: Harsh J [mailto:harsh@cloudera.com]
> > Sent: Tuesday, June 04, 2013 11:58 PM
> > To: <us...@hadoop.apache.org>
> > Subject: Re: yarn-site.xml and aux-services
> >
> > Yes, that's what this is for. You can implement, pass in and use your own
> AuxService. It needs to be on the NodeManager CLASSPATH to run (and the NM has
> to be restarted for the change to take effect).
> >
> > On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net>
> wrote:
> >> I notice the yarn-site.xml
> >>
> >>   <property>
> >>     <name>yarn.nodemanager.aux-services</name>
> >>     <value>mapreduce.shuffle</value>
> >>     <description>shuffle service that needs to be set for Map Reduce
> >> to run </description>
> >>   </property>
> >>
> >> Is this a general-purpose hook?
> >>
> >> Can I tell yarn to run *my* per-node service?
> >>
> >> Is there some other way (within the recommended Hadoop framework) to
> >> run a per-node service that exists during the lifetime of the NodeManager?
> >>
> >> John Lilley
> >> Chief Architect, RedPoint Global Inc.
> >> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
> >> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> >> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
> >
> >
> >
> > --
> > Harsh J
>
>
>
> --
> Harsh J
>



-- 
+Vinod
Hortonworks Inc.
http://hortonworks.com/


RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Following up on this, how exactly does one *install* the jar(s) for an auxiliary service?  Can it be shipped out with the LocalResources of an AM?
MapReduce's aux-service is presumably installed with Hadoop and is just sitting there in the right place, but if one wanted to make a whole new aux-service that belonged with an AM, how would one do it?

John

-----Original Message-----
From: John Lilley [mailto:john.lilley@redpoint.net] 
Sent: Wednesday, June 05, 2013 11:41 AM
To: user@hadoop.apache.org
Subject: RE: yarn-site.xml and aux-services

Wow, thanks.  Is this documented anywhere other than the code?  I hate to waste y'alls time on things that can be RTFMed.
John


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com]
Sent: Wednesday, June 05, 2013 9:35 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is the ID the applications may look up in their container responses map, which we discussed in another thread (around the shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would look like this (comma-separated IDs, plus an individual class config for each):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>
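Harsh notes that the service ID is what applications look up in the map they get back from the NodeManager when containers are started. The sketch below is a hypothetical illustration of that lookup pattern, not Hadoop code. A HashMap<String, ByteBuffer> stands in for that map; in later Hadoop 2.x client APIs I believe this is what StartContainersResponse.getAllServicesMetaData() returns, populated from each service's getMetaData(). Here the service publishes a port number as a 4-byte int; the class name and the port value 12345 are made up.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: an aux service publishes a small metadata blob
 * (here a port number) under its service ID, and the application looks
 * it up by that same ID. A HashMap stands in for the NodeManager's map.
 */
public class AuxServiceMetadataSketch {
    /** Service side: encode a port as a 4-byte big-endian int. */
    public static ByteBuffer encodePort(int port) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
        buf.putInt(port);
        buf.flip(); // make the written bytes readable from position 0
        return buf;
    }

    /** Application side: find the blob for serviceId and decode the port. */
    public static int lookupPort(Map<String, ByteBuffer> serviceMetadata, String serviceId) {
        ByteBuffer buf = serviceMetadata.get(serviceId);
        if (buf == null) {
            throw new IllegalStateException("No aux service metadata for ID '" + serviceId + "'");
        }
        // duplicate() so reading does not move the shared buffer's position
        return buf.duplicate().getInt();
    }

    public static void main(String[] args) {
        Map<String, ByteBuffer> metadata = new HashMap<>();
        metadata.put("foo", encodePort(12345));
        System.out.println(lookupPort(metadata, "foo")); // 12345
    }
}
```

Keying the metadata by the configured ID, rather than by class name, is what lets an application find "its" service on any node without knowing how that node implements it.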

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and the NM has to be restarted to apply the change).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>> Can I tell yarn to run *my* per-node service?
>> Is there some other way (within the recommended Hadoop framework) to run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
>
>
> --
> Harsh J



--
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Wow, thanks.  Is this documented anywhere other than the code?  I hate to waste y'all's time on things that can be RTFMed.
John


-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Wednesday, June 05, 2013 9:35 AM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is the ID that applications can look up in the container responses map we discussed in another thread (around the shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would look like the following (comma-separated IDs and individual configs):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and the NM has to be restarted to apply the change).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>> Can I tell yarn to run *my* per-node service?
>> Is there some other way (within the recommended Hadoop framework) to run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
>
>
> --
> Harsh J



--
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is
the ID that applications can look up in the container responses map we
discussed in another thread (around the shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would look like
the following (comma-separated IDs and individual configs):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and the NM has to be restarted to apply the change).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>> Can I tell yarn to run *my* per-node service?
>> Is there some other way (within the recommended Hadoop framework) to run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
>
>
> --
> Harsh J



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

John,

The format is ID and sub-config based:

First, you define an ID as a service, like the string "foo". This is
the ID the applications may lookup in their container responses map we
discussed over another thread (around shuffle handler).

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo</value>
</property>

Then you define an actual implementation class for that ID "foo", like so:

<property>
<name>yarn.nodemanager.aux-services.foo.class</name>
<value>com.mypack.MyAuxServiceClassForFoo</value>
</property>

If you have multiple services foo and bar, then it would appear like
the below (comma separated IDs and individual configs):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>foo,bar</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.foo.class</name>
    <value>com.mypack.MyAuxServiceClassForFoo</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services.bar.class</name>
    <value>com.mypack.MyAuxServiceClassForBar</value>
</property>

On Wed, Jun 5, 2013 at 8:42 PM, John Lilley <jo...@redpoint.net> wrote:
> Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "madreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?
>
> e.g. like:
>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
> </property>
> <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>myauxserviceclassname</value>
> </property>
>
> Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?
>
> Thanks
> John
>
> -----Original Message-----
> From: Harsh J [mailto:harsh@cloudera.com]
> Sent: Tuesday, June 04, 2013 11:58 PM
> To: <us...@hadoop.apache.org>
> Subject: Re: yarn-site.xml and aux-services
>
> Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has to be restarted to apply).
>
> On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
>> I notice the yarn-site.xml
>>
>>   <property>
>>     <name>yarn.nodemanager.aux-services</name>
>>     <value>mapreduce.shuffle</value>
>>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>>   </property>
>>
>> Is this a general-purpose hook?
>>
>> Can I tell yarn to run *my* per-node service?
>>
>> Is there some other way (within the recommended Hadoop framework) to
>> run a per-node service that exists during the lifetime of the NodeManager?
>>
>> John Lilley
>> Chief Architect, RedPoint Global Inc.
>> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
>> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
>> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
>
> --
> Harsh J



-- 
Harsh J

RE: yarn-site.xml and aux-services

Posted by John Lilley <jo...@redpoint.net>.

Good, I was hoping that would be the case.  But what are the mechanics of it?  Do I just add another entry?  And what exactly is "mapreduce.shuffle"?  A scoped class name?  Or a key string into some map elsewhere?

e.g. like:

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>myauxserviceclassname</value>
</property>

Concerning auxiliary services -- do they communicate with NodeManager via RPC?  Is there an interface to implement?  How are they opened and closed with NodeManager?

Thanks
John

-----Original Message-----
From: Harsh J [mailto:harsh@cloudera.com] 
Sent: Tuesday, June 04, 2013 11:58 PM
To: <us...@hadoop.apache.org>
Subject: Re: yarn-site.xml and aux-services

Yes, that's what this is for. You can implement, pass in and use your own AuxService. It needs to be on the NodeManager CLASSPATH to run (and NM has to be restarted to apply).

On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
> I notice the yarn-site.xml
>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>   </property>
>
> Is this a general-purpose hook?
>
> Can I tell yarn to run *my* per-node service?
>
> Is there some other way (within the recommended Hadoop framework) to
> run a per-node service that exists during the lifetime of the NodeManager?
>
> John Lilley
> Chief Architect, RedPoint Global Inc.
> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net

--
Harsh J

Re: yarn-site.xml and aux-services

Posted by Harsh J <ha...@cloudera.com>.

Yes, that's what this is for. You can implement, pass in and use your
own AuxService. It needs to be on the NodeManager CLASSPATH to run
(and NM has to be restarted to apply).
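Both requirements follow from the NodeManager instantiating each configured class by name when it starts. The dependency-free Java model below illustrates that, and also the lifecycle John asks about later: as in the real NodeManager, the services run inside the NodeManager's own JVM, so they are driven by plain method calls rather than RPC. PerNodeService, EchoService and startAll are hypothetical names; the callbacks only loosely mirror Hadoop's real base class, org.apache.hadoop.yarn.server.api.AuxiliaryService (Hadoop 2.2 and later), whose exact methods differ between releases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of how a NodeManager hosts auxiliary services (not Hadoop's API). */
public class AuxServiceLifecycleSketch {

    /** Hypothetical stand-in for Hadoop's AuxiliaryService base class. */
    public interface PerNodeService {
        void start();                              // once, when the NodeManager starts
        void initializeApplication(String appId);  // when an application's containers arrive on this node
        void stopApplication(String appId);        // when that application finishes
        void stop();                               // once, when the NodeManager shuts down
    }

    /** Example service that just records the calls it receives. */
    public static class EchoService implements PerNodeService {
        final List<String> events = new ArrayList<>();
        public void start() { events.add("start"); }
        public void initializeApplication(String appId) { events.add("init " + appId); }
        public void stopApplication(String appId) { events.add("stop " + appId); }
        public void stop() { events.add("stop"); }
    }

    /** Instantiates every configured class by name, which is why each must be on the classpath. */
    static Map<String, PerNodeService> startAll(Map<String, String> idToClass)
            throws ReflectiveOperationException {
        Map<String, PerNodeService> services = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : idToClass.entrySet()) {
            // A ClassNotFoundException here is this model's version of "not on the NM CLASSPATH".
            Class<? extends PerNodeService> cls =
                Class.forName(e.getValue()).asSubclass(PerNodeService.class);
            PerNodeService svc = cls.getDeclaredConstructor().newInstance();
            svc.start();
            services.put(e.getKey(), svc);
        }
        return services;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("foo", EchoService.class.getName()); // binary name of the nested class
        Map<String, PerNodeService> running = startAll(conf);
        PerNodeService foo = running.get("foo");
        foo.initializeApplication("app_1");
        foo.stopApplication("app_1");
        for (PerNodeService svc : running.values()) {
            svc.stop(); // model of NodeManager shutdown
        }
        // prints [start, init app_1, stop app_1, stop]
        System.out.println(((EchoService) foo).events);
    }
}
```

Because the class list is read only once, in startAll, editing the configuration has no effect until the model (like the real NodeManager) is restarted.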

On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:
> I notice the yarn-site.xml
>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>   </property>
>
> Is this a general-purpose hook?
>
> Can I tell yarn to run *my* per-node service?
>
> Is there some other way (within the recommended Hadoop framework) to run a
> per-node service that exists during the lifetime of the NodeManager?
>
> John Lilley
> Chief Architect, RedPoint Global Inc.
> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net



-- 
Harsh J

Re: yarn-site.xml and aux-services

Posted by Rahul Bhattacharjee <ra...@gmail.com>.

Going by what I have read, I think it's a general-purpose hook in the YARN
architecture for running any service inside the NodeManagers. Hadoop uses it
for the shuffle service, and other YARN-based applications might use it as well.

Thanks,
Rahul


On Wed, Jun 5, 2013 at 4:00 AM, John Lilley <jo...@redpoint.net> wrote:

> I notice the yarn-site.xml
>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce.shuffle</value>
>     <description>shuffle service that needs to be set for Map Reduce to run </description>
>   </property>
>
> Is this a general-purpose hook?
>
> Can I tell yarn to run *my* per-node service?
>
> Is there some other way (within the recommended Hadoop framework) to run a
> per-node service that exists during the lifetime of the NodeManager?
>
> John Lilley
> Chief Architect, RedPoint Global Inc.
> 1515 Walnut Street | Suite 200 | Boulder, CO 80302
> T: +1 303 541 1516  | M: +1 720 938 5761 | F: +1 781-705-2077
> Skype: jlilley.redpoint | john.lilley@redpoint.net | www.redpoint.net
