Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/08/22 20:01:23 UTC

Making sure I understand HADOOP_CLASSPATH

What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?

This isn't clear to me from documentation and books, so I did some
experimenting. Here's the conclusion I came to: the paths in
HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
not added to the class path of the Task Trackers. Therefore if you put a JAR
called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
available to the Task Trackers as well, calls to MyJar.jar code from the
run() method of your job work, but calls from your Mapper or Reducer will
fail at runtime. Is this correct?

If it is, what is the proper way to make MyJar.jar available to both the Job
Client and the Task Trackers?
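[For context, HADOOP_CLASSPATH is an environment variable set in hadoop-env.sh; a minimal sketch, with an illustrative path, looks like this:]

```shell
# In $HADOOP/conf/hadoop-env.sh -- the jar path here is hypothetical.
# Entries are added to the classpath of JVMs launched via bin/hadoop on
# this machine only (e.g. the job client); they are not shipped to the
# task JVMs running on the cluster.
export HADOOP_CLASSPATH="/opt/libs/MyJar.jar:${HADOOP_CLASSPATH}"
```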

Re: Making sure I understand HADOOP_CLASSPATH

Posted by "W.P. McNeill" <bi...@gmail.com>.
I meant tasks running on the Task Trackers.

Harsh J.'s answer is what I needed. This makes sense now.

On Mon, Aug 22, 2011 at 11:06 AM, John Armstrong <jo...@ccri.com> wrote:

> On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" <bi...@gmail.com>
> wrote:
> > If it is, what is the proper way to make MyJar.jar available to both the
> > Job
> > Client and the Task Trackers?
>
> Do you mean the task trackers, or the tasks themselves?  What process do
> you want to be able to run the code in MyJar.jar?
>

Re: Making sure I understand HADOOP_CLASSPATH

Posted by John Armstrong <jo...@ccri.com>.
On Mon, 22 Aug 2011 11:01:23 -0700, "W.P. McNeill" <bi...@gmail.com>
wrote:
> If it is, what is the proper way to make MyJar.jar available to both the
> Job
> Client and the Task Trackers?

Do you mean the task trackers, or the tasks themselves?  What process do
you want to be able to run the code in MyJar.jar?

Re: Making sure I understand HADOOP_CLASSPATH

Posted by Harsh J <ha...@cloudera.com>.
On Mon, Aug 22, 2011 at 11:31 PM, W.P. McNeill <bi...@gmail.com> wrote:
> What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?
>
> This isn't clear to me from documentation and books, so I did some
> experimenting. Here's the conclusion I came to: the paths in
> HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
> not added to the class path of the Task Trackers. Therefore if you put a JAR
> called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
> available to the Task Trackers as well, calls to MyJar.jar code from the
> run() method of your job work, but calls from your Mapper or Reducer will
> fail at runtime. Is this correct?

Yes, this is right.

> If it is, what is the proper way to make MyJar.jar available to both the Job
> Client and the Task Trackers?

You'll need to use the Distributed Cache. Or you'd need to start the
TaskTrackers with the library on their classpath (which copies over to
launched task JVMs). The latter way is rigid/inflexible when it comes
to jar versioning.
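[Both options above can be sketched as follows; the jar names, class name,
and paths are hypothetical, and -libjars assumes the driver runs through
ToolRunner/GenericOptionsParser:]

```shell
# Option 1: ship the jar per-job via the distributed cache.
# -libjars is parsed by GenericOptionsParser, so the driver must
# implement Tool and be launched through ToolRunner.run().
hadoop jar MyJob.jar com.example.MyDriver \
    -libjars /local/path/MyJar.jar \
    input/ output/

# Option 2: put the jar on the TaskTrackers' classpath at daemon start
# (in hadoop-env.sh on every worker node, then restart the daemons).
# As noted above, this is rigid when jar versions change, since every
# node must be updated in lockstep.
export HADOOP_CLASSPATH="/opt/libs/MyJar.jar:${HADOOP_CLASSPATH}"
```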

-- 
Harsh J

RE: Making sure I understand HADOOP_CLASSPATH

Posted by "GOEKE, MATTHEW (AG/1000)" <ma...@monsanto.com>.
If you are asking how to make those classes available at run time, you can either use the -libjars option (which ships the jars via the distributed cache) or shade those classes into your jar using Maven. I have had enough issues in the past with the classpath being flaky that I prefer the shading method, but obviously that is not the preferred route.

Matt

-----Original Message-----
From: W.P. McNeill [mailto:billmcn@gmail.com] 
Sent: Monday, August 22, 2011 1:01 PM
To: common-user@hadoop.apache.org
Subject: Making sure I understand HADOOP_CLASSPATH

What does HADOOP_CLASSPATH set in $HADOOP/conf/hadoop-env.sh do?

This isn't clear to me from documentation and books, so I did some
experimenting. Here's the conclusion I came to: the paths in
HADOOP_CLASSPATH are added to the class path of the Job Client, but they are
not added to the class path of the Task Trackers. Therefore if you put a JAR
called MyJar.jar on the HADOOP_CLASSPATH and don't do anything to make it
available to the Task Trackers as well, calls to MyJar.jar code from the
run() method of your job work, but calls from your Mapper or Reducer will
fail at runtime. Is this correct?

If it is, what is the proper way to make MyJar.jar available to both the Job
Client and the Task Trackers?