Posted to mapreduce-user@hadoop.apache.org by "Brian C. Huffman" <bh...@etinternational.com> on 2014/01/29 21:59:53 UTC

Passing data from Client to AM

I'm looking at Distributed Shell as an example for writing a YARN 
application.

My question is why are the script path and associated metadata saved as 
environment variables?  Are there any other ways besides environment 
variables or command line arguments for passing data from the Client to 
the ApplicationMaster?

Thanks,
Brian




Re: Passing data from Client to AM

Posted by Hitesh Shah <hi...@apache.org>.
Adding values to a Configuration object does not really work unless you serialize the config into a file and send it over to the AM and containers as a local resource. The application code would then need to load in this file using Configuration::addResource(). MapReduce does this by taking in all user configured values and serializing them in the form of job.xml.
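A minimal sketch of that round trip follows. To keep it runnable without Hadoop on the classpath, java.util.Properties stands in for Hadoop's Configuration, and a local temporary file stands in for the file that would be uploaded to HDFS and registered as a local resource. The class name, method names and the property key are hypothetical. With Hadoop available, the analogous calls would be Configuration.writeXml(OutputStream) on the client and Configuration.addResource(Path) in the AM; this is an illustration of the idea, not what MapReduce literally does for job.xml.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

public class ConfigFileSketch {

    /** Client side: serialize the settings to an XML file that can be shipped with the application. */
    public static void writeConfig(Properties conf, Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            conf.storeToXML(out, "client-side settings");
        }
    }

    /** AM or container side: load the shipped file back into a fresh object. */
    public static Properties loadConfig(Path file) throws IOException {
        Properties conf = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            conf.loadFromXML(in);
        }
        return conf;
    }

    public static void main(String[] args) throws IOException {
        // Values set on the client exist only in the client's JVM.
        Properties clientConf = new Properties();
        clientConf.setProperty("example.app.num.workers", "4"); // hypothetical key

        // Stand-in for "write to HDFS and localize as a local resource".
        Path shipped = Files.createTempFile("app-conf", ".xml");
        writeConfig(clientConf, shipped);

        // The AM starts in a different JVM and sees the values only after loading the file.
        Properties amConf = loadConfig(shipped);
        System.out.println(amConf.getProperty("example.app.num.workers"));

        Files.delete(shipped);
    }
}
```

The point of the sketch is that nothing set on the client-side object reaches the AM by itself; only the file does, so the AM has to know where to find it and load it explicitly.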

-- Hitesh

On Jan 29, 2014, at 2:42 PM, Jay Vyas wrote:

> While you're at it, what about adding values to the Configuration() object? Does that still work as a hack for passing information?
> 
> 
> On Wed, Jan 29, 2014 at 5:25 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Command line arguments & env variables are the most direct options.
> 
> A more onerous option is to write some data to a file in HDFS, use LocalResource to ship it to the container on each node and get application code to read that file locally. (In MRv1 parlance that is "Distributed Cache").
> 
> hth,
> Arun
> 
> On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <bh...@etinternational.com> wrote:
> 
>> I'm looking at Distributed Shell as an example for writing a YARN application.
>> 
>> My question is why are the script path and associated metadata saved as environment variables?  Are there any other ways besides environment variables or command line arguments for passing data from the Client to the ApplicationMaster?
>> 
>> Thanks,
>> Brian
>> 
>> 
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 
> 
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
> 
> 
> 
> -- 
> Jay Vyas
> http://jayunit100.blogspot.com


Re: Passing data from Client to AM

Posted by Jay Vyas <ja...@gmail.com>.
While you're at it, what about adding values to the Configuration() object?
Does that still work as a hack for passing information?


On Wed, Jan 29, 2014 at 5:25 PM, Arun C Murthy <ac...@hortonworks.com> wrote:

> Command line arguments & env variables are the most direct options.
>
> A more onerous option is to write some data to a file in HDFS, use
> LocalResource to ship it to the container on each node and get application
> code to read that file locally. (In MRv1 parlance that is "Distributed
> Cache").
>
> hth,
> Arun
>
> On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <
> bhuffman@etinternational.com> wrote:
>
> I'm looking at Distributed Shell as an example for writing a YARN
> application.
>
> My question is why are the script path and associated metadata saved as
> environment variables?  Are there any other ways besides environment
> variables or command line arguments for passing data from the Client to the
> ApplicationMaster?
>
> Thanks,
> Brian
>
>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.




-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: Passing data from Client to AM

Posted by Arun C Murthy <ac...@hortonworks.com>.
Command line arguments & env variables are the most direct options.

A more onerous option is to write some data to a file in HDFS, use LocalResource to ship it to the container on each node and get application code to read that file locally. (In MRv1 parlance that is "Distributed Cache").
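The environment-variable route can be sketched in plain Java as below. The class name, method names and variable names (APP_SCRIPT_LOCATION and so on) are hypothetical and are not the names Distributed Shell uses. In a real client the map built here would be handed to ContainerLaunchContext.setEnvironment(...), and the AM would read it back through System.getenv(); the sketch passes the map directly so that it runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;

public class EnvPassingSketch {
    // Hypothetical variable names, chosen for this example only.
    public static final String SCRIPT_LOCATION = "APP_SCRIPT_LOCATION";
    public static final String SCRIPT_LENGTH = "APP_SCRIPT_LENGTH";
    public static final String SCRIPT_TIMESTAMP = "APP_SCRIPT_TIMESTAMP";

    /** Client side: pack the script path and its metadata into the AM's environment map. */
    public static Map<String, String> buildAmEnvironment(String location, long length, long timestamp) {
        Map<String, String> env = new HashMap<>();
        env.put(SCRIPT_LOCATION, location);
        env.put(SCRIPT_LENGTH, Long.toString(length));
        env.put(SCRIPT_TIMESTAMP, Long.toString(timestamp));
        return env;
    }

    /** AM side: read a numeric value back, failing loudly if the client did not set it. */
    public static long readRequiredLong(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Missing environment variable: " + key);
        }
        return Long.parseLong(value);
    }

    public static void main(String[] args) {
        // The path and numbers are made-up sample values.
        Map<String, String> env =
                buildAmEnvironment("hdfs:///tmp/example/script.sh", 1024L, 1391032793000L);

        // Inside a real AM this map would come from System.getenv().
        System.out.println(env.get(SCRIPT_LOCATION));
        System.out.println(readRequiredLong(env, SCRIPT_LENGTH));
        System.out.println(readRequiredLong(env, SCRIPT_TIMESTAMP));
    }
}
```

Environment variables carry only strings, so this route suits a handful of small values such as a path, a length and a timestamp. Anything larger is a better fit for the file-based option.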

hth,
Arun

On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <bh...@etinternational.com> wrote:

> I'm looking at Distributed Shell as an example for writing a YARN application.
> 
> My question is why are the script path and associated metadata saved as environment variables?  Are there any other ways besides environment variables or command line arguments for passing data from the Client to the ApplicationMaster?
> 
> Thanks,
> Brian
> 
> 
> 

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/



-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.
