Posted to user@hadoop.apache.org by rab ra <ra...@gmail.com> on 2015/01/16 19:15:45 UTC

simple hadoop MR program to be executed using java

Hello,

I have a simple Java program that sets up an MR job. I could successfully
execute it on Hadoop infrastructure (Hadoop 2.x) using 'hadoop jar
<myjar>'. But I want to achieve the same thing using the java command, as below.

java <className>

1. How can I pass hadoop configuration to this className?
2. What extra arguments do I need to supply?
3. Any link/documentation would be highly appreciated.


regards
rab

Re: simple hadoop MR program to be executed using java

Posted by rab ra <ra...@gmail.com>.
On Sat, Jan 17, 2015 at 12:33 AM, Chris Nauroth <cn...@hortonworks.com>
wrote:

> Hello Rab,
>
> There is actually quite a lot of logic in the "hadoop jar" shell scripts
> to set up the classpath (including Hadoop configuration file locations) and
> set up extra arguments (like heap sizes and log file locations).  It is
> possible to replicate it with a straight java call, but it might not be
> worth the effort, and end users of your jar would lose functionality
> implemented in the shell scripts, such as configuration file location
> overrides.
>
> If you still want to pursue this, then you might want to make a small
> change to the "hadoop jar" script and add a line right before the java call
> to echo the command it's running.  That will give you a sense for the java
> command that ultimately gets run.  You could also take a look at the
> process table for a running "hadoop jar" process and inspect its command
> line and environment variables.
>
> Another potentially helpful tool  is the "hadoop classpath" command:
>
>
> http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/CommandsManual.html#classpath
>
> This uses the full logic of the shell scripts for classpath construction,
> but then just echoes it instead of using it to run a jar.
>
> Chris Nauroth
> Hortonworks
> http://hortonworks.com/
>
>
Hello,

Thanks for your response. I had assumed that when a web application needs
to process a client request and subsequently spawn MR jobs, it would not
spawn a command-line process using the 'hadoop' command; instead, there would
be a way to instantiate a Hadoop driver class that contains the Mapper and
Reducer. In this setup, I expected there would be a place where all the
Hadoop-related configuration and jars would be placed so that they are
available to the Hadoop job. Hence this question. I thought it would be
straightforward, that many people would have attempted it, and that finding
help in the form of documentation and blogs would not be a problem. I have
spent two days on this but still could not find a way to do it.

regards
rab



> On Fri, Jan 16, 2015 at 10:15 AM, rab ra <ra...@gmail.com> wrote:
>
>> Hello,
>>
>> I have a simple java program that sets up a MR job. I could successfully
>> execute this in Hadoop infrastructure (hadoop 2x) using 'hadoop jar
>> <myjar>'. But I want to achieve the same thing using java command as below.
>>
>> java <className>
>>
>> 1. How can I pass hadoop configuration to this className?
>> 2. What extra arguments do I need to supply?
>> 3. Any link/documentation would be highly appreciated.
>>
>>
>> regards
>> rab
>>
>>
>>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.


Re: simple hadoop MR program to be executed using java

Posted by Chris Nauroth <cn...@hortonworks.com>.
Hello Rab,

There is actually quite a lot of logic in the "hadoop jar" shell scripts to
set up the classpath (including Hadoop configuration file locations) and
set up extra arguments (like heap sizes and log file locations).  It is
possible to replicate it with a straight java call, but it might not be
worth the effort, and end users of your jar would lose functionality
implemented in the shell scripts, such as configuration file location
overrides.

If you still want to pursue this, then you might want to make a small
change to the "hadoop jar" script and add a line right before the java call
to echo the command it's running.  That will give you a sense for the java
command that ultimately gets run.  You could also take a look at the
process table for a running "hadoop jar" process and inspect its command
line and environment variables.

Another potentially helpful tool is the "hadoop classpath" command:

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/CommandsManual.html#classpath

This uses the full logic of the shell scripts for classpath construction,
but then just echoes it instead of using it to run a jar.
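
As a minimal sketch of the idea above (not the exact command that the
"hadoop jar" script runs), the output of "hadoop classpath" could be placed
in front of the job jar on a plain java command line. The function name
build_java_command, the jar name my-job.jar and the class name
com.example.MyDriver are hypothetical placeholders, and heap sizes or log
settings that the shell scripts normally add are not reproduced here.

```shell
#!/bin/sh
# Sketch only: assemble a plain java command line from the output of
# `hadoop classpath`. my-job.jar and com.example.MyDriver are hypothetical.

# Print the java command for a given Hadoop classpath, job jar and main class.
build_java_command() {
    hadoop_cp="$1"
    job_jar="$2"
    main_class="$3"
    # The job jar goes last, so the Hadoop configuration directory and
    # libraries reported by `hadoop classpath` come first.
    # The classpath is quoted so the shell does not expand wildcard entries.
    printf 'java -cp "%s:%s" %s\n' "$hadoop_cp" "$job_jar" "$main_class"
}

# Call `hadoop classpath` only when the hadoop launcher is on the PATH.
if command -v hadoop >/dev/null 2>&1; then
    build_java_command "$(hadoop classpath)" "my-job.jar" "com.example.MyDriver"
fi
```

The printed line can be compared with the command echoed from the modified
"hadoop jar" script to see which extra arguments are still missing.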

Chris Nauroth
Hortonworks
http://hortonworks.com/


On Fri, Jan 16, 2015 at 10:15 AM, rab ra <ra...@gmail.com> wrote:

> Hello,
>
> I have a simple java program that sets up a MR job. I could successfully
> execute this in Hadoop infrastructure (hadoop 2x) using 'hadoop jar
> <myjar>'. But I want to achieve the same thing using java command as below.
>
> java <className>
>
> 1. How can I pass hadoop configuration to this className?
> 2. What extra arguments do I need to supply?
> 3. Any link/documentation would be highly appreciated.
>
>
> regards
> rab
>
>
>

