Posted to mapreduce-user@hadoop.apache.org by "Brian C. Huffman" <bh...@etinternational.com> on 2014/01/29 21:59:53 UTC
Passing data from Client to AM
I'm looking at Distributed Shell as an example for writing a YARN
application.
My question is why are the script path and associated metadata saved as
environment variables? Are there any other ways besides environment
variables or command line arguments for passing data from the Client to
the ApplicationMaster?
Thanks,
Brian
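Here is a minimal sketch of the two direct channels. It assumes a hypothetical client that records an HDFS script's location, length and modification time. That is the kind of metadata Distributed Shell keeps in the AM environment, presumably because the AM needs those values to describe the script as a LocalResource for the containers it launches.

The sketch uses standard-library Python so it runs without a cluster. In Java, the map would be passed to the AM's ContainerLaunchContext via setEnvironment(), and the arguments would be appended to the command passed to setCommands(). The variable names, the --num-containers flag and the sample path are illustrative only, not YARN or Distributed Shell identifiers.

```python
import argparse
from dataclasses import dataclass
from typing import Dict, List, Mapping

# Hypothetical variable names; YARN itself defines none of these.
ENV_SCRIPT_LOCATION = "APP_SCRIPT_LOCATION"
ENV_SCRIPT_LEN = "APP_SCRIPT_LEN"
ENV_SCRIPT_TIMESTAMP = "APP_SCRIPT_TIMESTAMP"
REQUIRED_VARS = (ENV_SCRIPT_LOCATION, ENV_SCRIPT_LEN, ENV_SCRIPT_TIMESTAMP)


@dataclass(frozen=True)
class ScriptInfo:
    location: str   # e.g. an HDFS URI for the script
    length: int     # size in bytes
    timestamp: int  # modification time in milliseconds


def build_am_environment(info: ScriptInfo) -> Dict[str, str]:
    """Client side: environment values must be plain strings."""
    return {
        ENV_SCRIPT_LOCATION: info.location,
        ENV_SCRIPT_LEN: str(info.length),
        ENV_SCRIPT_TIMESTAMP: str(info.timestamp),
    }


def build_am_args(num_containers: int) -> List[str]:
    """Client side: small scalar settings fit naturally on the AM command line."""
    return ["--num-containers", str(num_containers)]


def read_script_info(env: Mapping[str, str]) -> ScriptInfo:
    """AM side: rebuild the metadata, failing fast if the client omitted any of it."""
    missing = [name for name in REQUIRED_VARS if name not in env]
    if missing:
        raise KeyError(f"missing environment variables: {missing}")
    return ScriptInfo(
        location=env[ENV_SCRIPT_LOCATION],
        length=int(env[ENV_SCRIPT_LEN]),
        timestamp=int(env[ENV_SCRIPT_TIMESTAMP]),
    )


def parse_am_args(argv: List[str]) -> argparse.Namespace:
    """AM side: parse the arguments the client appended to the launch command."""
    parser = argparse.ArgumentParser(prog="app-master")
    parser.add_argument("--num-containers", type=int, default=1)
    return parser.parse_args(argv)


if __name__ == "__main__":
    info = ScriptInfo("hdfs:///user/example/script.sh", 1024, 1391029193000)
    # A real AM would call read_script_info(os.environ) and parse_am_args(sys.argv[1:]).
    print(read_script_info(build_am_environment(info)))
    print(parse_am_args(build_am_args(4)).num_containers)
```

Both channels carry only strings, and both are bounded by operating-system limits on environment and command-line size. They suit small values like these; larger payloads are usually better shipped as a file.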
Re: Passing data from Client to AM
Posted by Hitesh Shah <hi...@apache.org>.
Adding values to a Configuration object does not really work unless you serialize the config into a file and send it over to the AM and containers as a local resource. The application code would then need to load in this file using Configuration::addResource(). MapReduce does this by taking in all user-configured values and serializing them in the form of job.xml.
-- Hitesh
On Jan 29, 2014, at 2:42 PM, Jay Vyas wrote:
> while you're at it, what about adding values to the Configuration() object, does that still work as a hack for information passing?
>
>
> On Wed, Jan 29, 2014 at 5:25 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Command line arguments & env variables are the most direct options.
>
> A more onerous option is to write some data to a file in HDFS, use LocalResource to ship it to the container on each node and get application code to read that file locally. (In MRv1 parlance that is "Distributed Cache").
>
> hth,
> Arun
>
> On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <bh...@etinternational.com> wrote:
>
>> I'm looking at Distributed Shell as an example for writing a YARN application.
>>
>> My question is why are the script path and associated metadata saved as environment variables? Are there any other ways besides environment variables or command line arguments for passing data from the Client to the ApplicationMaster?
>>
>> Thanks,
>> Brian
>>
>>
>>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
>
>
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
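Here is a sketch of the serialize-and-reload approach Hitesh describes. A Configuration built in the client JVM is not transmitted to the AM, which runs in a separate process, so the values have to travel as a file.

In Java, the client side would typically use Configuration.writeXml(OutputStream). The AM side would use Configuration.addResource(Path) on the localized copy. The standard-library Python below only imitates the <configuration>/<property> XML layout those files use, so it runs anywhere. The file name and property keys are hypothetical, and Hadoop's <final> flag is ignored.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, Mapping, Optional


def write_config_xml(props: Mapping[str, str], path: str) -> None:
    """Client side: write key/value pairs in the <configuration> layout used by
    Hadoop *-site.xml and job.xml files."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def load_config_xml(path: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """AM/container side: overlay the shipped file onto existing defaults.
    Values from the file win, loosely like a resource added later with
    Configuration.addResource(); <final> is not modelled."""
    merged = dict(base or {})
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            # findtext returns "" for an empty <value/> and the default if it is absent.
            merged[name] = prop.findtext("value", default="")
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name and keys; in YARN the file would arrive as a LocalResource.
        path = os.path.join(workdir, "app-conf.xml")
        write_config_xml({"myapp.script.path": "hdfs:///tmp/run.sh"}, path)
        print(load_config_xml(path, {"myapp.retries": "3"}))
```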
Re: Passing data from Client to AM
Posted by Jay Vyas <ja...@gmail.com>.
while you're at it, what about adding values to the Configuration() object,
does that still work as a hack for information passing?
On Wed, Jan 29, 2014 at 5:25 PM, Arun C Murthy <ac...@hortonworks.com> wrote:
> Command line arguments & env variables are the most direct options.
>
> A more onerous option is to write some data to a file in HDFS, use
> LocalResource to ship it to the container on each node and get application
> code to read that file locally. (In MRv1 parlance that is "Distributed
> Cache").
>
> hth,
> Arun
>
> On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <
> bhuffman@etinternational.com> wrote:
>
> I'm looking at Distributed Shell as an example for writing a YARN
> application.
>
> My question is why are the script path and associated metadata saved as
> environment variables? Are there any other ways besides environment
> variables or command line arguments for passing data from the Client to the
> ApplicationMaster?
>
> Thanks,
> Brian
>
>
>
>
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
>
--
Jay Vyas
http://jayunit100.blogspot.com
Re: Passing data from Client to AM
Posted by Arun C Murthy <ac...@hortonworks.com>.
Command line arguments & env variables are the most direct options.
A more onerous option is to write some data to a file in HDFS, use LocalResource to ship it to the container on each node and get application code to read that file locally. (In MRv1 parlance that is "Distributed Cache").
hth,
Arun
On Jan 29, 2014, at 12:59 PM, Brian C. Huffman <bh...@etinternational.com> wrote:
> I'm looking at Distributed Shell as an example for writing a YARN application.
>
> My question is why are the script path and associated metadata saved as environment variables? Are there any other ways besides environment variables or command line arguments for passing data from the Client to the ApplicationMaster?
>
> Thanks,
> Brian
>
>
>
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
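Here is a sketch of the HDFS-plus-LocalResource route, simulated with local directories. A shared directory stands in for HDFS, and a per-container directory stands in for the working directory the NodeManager prepares.

The client records the file's size and modification time. A Java client does much the same when it fills in a LocalResource with setResource(), setSize(), setTimestamp(), setType() and setVisibility() before putting it in the ContainerLaunchContext's local-resource map.

On the node side, the timestamp check mirrors YARN's refusal to localize a resource that changed after it was described. The size check is an extra added by this sketch. YARN links the localized file into the working directory under the map key, whereas this sketch simply copies it. All names are hypothetical.

```python
import os
import shutil
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class ShippedFile:
    """Roughly what a LocalResource describes: where the file is, its size, its mtime."""
    source: str
    size: int
    timestamp_ms: int


def _mtime_ms(path: str) -> int:
    return os.stat(path).st_mtime_ns // 1_000_000


def upload(data: bytes, shared_dir: str, name: str) -> ShippedFile:
    """Client side: write the payload to shared storage (HDFS in the real setting)
    and record the metadata the node will verify."""
    os.makedirs(shared_dir, exist_ok=True)
    path = os.path.join(shared_dir, name)
    with open(path, "wb") as f:
        f.write(data)
    return ShippedFile(path, os.path.getsize(path), _mtime_ms(path))


def localize(res: ShippedFile, container_dir: str, link_name: str) -> str:
    """Node side: refuse a file that changed after the client described it,
    then place it in the container's working directory under link_name."""
    if _mtime_ms(res.source) != res.timestamp_ms:
        raise RuntimeError(f"{res.source} changed on the source filesystem")
    if os.path.getsize(res.source) != res.size:
        raise RuntimeError(f"{res.source} does not have the expected size")
    os.makedirs(container_dir, exist_ok=True)
    dest = os.path.join(container_dir, link_name)
    shutil.copyfile(res.source, dest)  # YARN links a localized copy instead
    return dest


def read_local(container_dir: str, link_name: str) -> bytes:
    """Application code: open the file by its local name, with no HDFS access."""
    with open(os.path.join(container_dir, link_name), "rb") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        res = upload(b'{"threshold": 7}', os.path.join(root, "shared"), "payload.json")
        container = os.path.join(root, "container_01")
        localize(res, container, "payload.json")
        print(read_local(container, "payload.json"))
```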