Posted to mapreduce-user@hadoop.apache.org by Henning Blohm <he...@zfabrik.de> on 2010/09/17 13:56:09 UTC

Too large class path for map reduce jobs

When running map reduce tasks in Hadoop I run into classpath issues.
Contrary to previous posts, my problem is not that classes are missing
from the Task's class path (we have a perfect solution for that) but
rather that I find too many (e.g. ECJ or Jetty classes).

The libs in HADOOP_HOME/lib seem to contain everything needed to run
anything in Hadoop, which is, I assume, much more than is needed to run
a map reduce task.

Is there a practical way of identifying just the libraries needed for
map reduce jobs and pointing the class path for m/r tasks at only
those?

That is: what would that set of libraries comprise, and where would the
class path for m/r tasks be specified?

Thanks,
  Henning


Re: Too large class path for map reduce jobs

Posted by Henning Blohm <he...@zfabrik.de>.
Not really. "Anything in Hadoop" was really meant to say just that. 

The way we want to run tasks is with integrated provisioning of
everything needed (using www.z2-environment.eu ). Effectively, a Hadoop
task loads the provisioning capability in-process and then runs the
actual task implementation as provisioned (from another repository, in
fact), so that we do not need a special build process for Hadoop jobs.

However, everything on the class path of the Hadoop task is visible to
the code of the z2 system and the task implementation, and may lead to
conflicts with other code. Specifically, the Java compiler
implementation that is on the Hadoop class path (due to the use of
Jasper) conflicts with the one we use. That is why we would like to run
Hadoop tasks without unnecessary libraries (e.g. Jasper) on the class
path.

Thanks,
  Henning

On Friday, 17.09.2010 at 16:01 +0000, Allen Wittenauer wrote:

> On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:
> 
> > When running map reduce tasks in Hadoop I run into classpath issues. Contrary to previous posts, my problem is not that classes are missing from the Task's class path (we have a perfect solution for that) but rather that I find too many (e.g. ECJ or Jetty classes).
> 
> The fact that you mention:
> 
> > The libs in HADOOP_HOME/lib seem to contain everything needed to run anything in Hadoop, which is, I assume, much more than is needed to run a map reduce task.
> 
> hints that your perfect solution is to throw all your custom stuff into lib.  If so, that's a huge mistake.  Use the distributed cache instead.

Re: Too large class path for map reduce jobs

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Sep 17, 2010, at 4:56 AM, Henning Blohm wrote:

> When running map reduce tasks in Hadoop I run into classpath issues. Contrary to previous posts, my problem is not that classes are missing from the Task's class path (we have a perfect solution for that) but rather that I find too many (e.g. ECJ or Jetty classes).

The fact that you mention:

> The libs in HADOOP_HOME/lib seem to contain everything needed to run anything in Hadoop, which is, I assume, much more than is needed to run a map reduce task.

hints that your perfect solution is to throw all your custom stuff into lib.  If so, that's a huge mistake.  Use the distributed cache instead.
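
A minimal sketch of what that could look like, assuming the job driver
goes through ToolRunner/GenericOptionsParser so that generic options
are honored (the jar names, paths, and driver class below are
hypothetical):

```shell
# Ship job-specific jars through the distributed cache with the
# -libjars generic option, rather than copying them into
# HADOOP_HOME/lib on every node. All jar names, paths, and the
# driver class here are placeholders.
hadoop jar my-job.jar com.example.MyJobDriver \
  -libjars deps/first.jar,deps/second.jar \
  /input /output
```

The framework copies the listed jars to the cluster and adds them to
each task's class path, so nothing job-specific needs to live in
HADOOP_HOME/lib.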