Posted to mapreduce-user@hadoop.apache.org by John Armstrong <jo...@ccri.com> on 2011/05/26 16:45:28 UTC

Problems adding JARs to distributed classpath in Hadoop 0.20.2

Hi, everybody.

I'm running into some difficulties getting needed libraries to map/reduce
tasks using the distributed cache.

I'm using Hadoop 0.20.2, which from what I can tell is a hard requirement
by the client, so more current versions are not really viable options.

The code I've inherited is Java, which sets up and runs the MR job.
There's currently some nontrivial pre- and post-processing, so it would
take a large refactoring before I could just run bare MR jobs rather than
starting them through Java.

Further complicating matters: in practice the Java jobs are launched by
Oozie, which of course does so by wrapping each one in a MR shell.  The
upshot is that I don't have any control over which "local" filesystem the
Java job is run from, though if local files are absolutely needed I can
make my Java wrappers copy stuff back from HDFS to the Java job's local
filesystem.

So here's the problem:

My mappers and/or reducers need the class Needed, which is contained in
needed-1.0.jar, which lives in HDFS:
    hdfs://.../libdir/distributed/needed-1.0.jar

Java program executes:
    DistributedCache.addFileToClassPath(
        new Path("hdfs://.../libdir/distributed/needed-1.0.jar"),
        job.getConfiguration());

Inspecting the Job object I find the file has been added to the cache
files as expected:
    job.conf.overlay[...] = mapred.cache.files ->
hdfs://.../libdir/distributed/needed-1.0.jar
    job.conf.properties[...] = mapred.cache.files ->
hdfs://.../libdir/distributed/needed-1.0.jar

And the class seems to show up in the internal ClassLoader:
    job.conf.classLoader.classes[...] = "class my.class.package.Needed"

though this may just be inherited from the ClassLoader of the Java process
itself (which also uses Needed).
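
(A quick way to check is to ask the configuration directly; a hedged
one-liner using Configuration.getClassByName(), which throws
ClassNotFoundException if the class isn't visible:)

    // If this resolves, it only proves the *submitting* JVM can see
    // Needed, not that the task JVMs will.
    Class<?> c = job.getConfiguration()
                    .getClassByName("my.class.package.Needed");
    System.out.println("Loaded by: " + c.getClassLoader());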

And yet as soon as I get into the mapreduce job itself I start getting:

2011-05-25 17:22:56,080  INFO JobClient - Task Id :
attempt_201105251330_0037_r_000043_0, Status : FAILED
java.lang.RuntimeException: java.lang.ClassNotFoundException:
my.class.package.Needed

Up until this point we've run things by having a directory on each node
containing all the libraries we'd need and including that directory in the
Hadoop classpath.  But we have no such control in the deployment scenario,
so we have to make our program hand the needed libraries to the map and
reduce nodes via the distributed cache classpath.

Thanks in advance for any insight or assistance you can offer.

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Wed, 1 Jun 2011 12:48:51 -0700, Alejandro Abdelnur <tu...@cloudera.com>
wrote:

> Do you have all JARs used by your classes in Needed.jar in the DC
> classpath as well?

needed.jar contains the class Needed, which my mappers need.  If the class
Needed calls for another class AlsoNeeded in another jar, wouldn't I get a
ClassNotFoundException for AlsoNeeded?

> Are you propagating the delegation token?

Now we're getting somewhere: I don't have any idea what you mean by this. 
If this is something I need to be doing to get this technique to work, I'd
love to see a reference teaching me how to do it.

Thanks again.

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
John,

Do you have all JARs used by your classes in Needed.jar in the DC classpath
as well?

Are you propagating the delegation token?
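
(On security-enabled 0.20.x builds, "propagating the delegation token"
usually means pointing the child job at the token file the launcher task
received.  A hedged sketch; the env var and property name should be
verified against your particular build:)

    // The TaskTracker exports the token file location to task
    // processes on secure clusters.
    String tokenFile = System.getenv("HADOOP_TOKEN_FILE_LOCATION");
    if (tokenFile != null) {
        // Tell the child job to pick up the same credentials.
        job.getConfiguration().set(
            "mapreduce.job.credentials.binary", tokenFile);
    }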

Thxs.

Alejandro

On Wed, Jun 1, 2011 at 12:38 PM, John Armstrong <jo...@ccri.com> wrote:

> In the hopes that more information can help, I've dug into the local
> filesystems on each of my four nodes and retrieved the job.xml and the
> locations of the files to show that everything shows up where it should.
> [...]

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Tue, 31 May 2011 15:09:28 -0400, John Armstrong
<jo...@ccri.com> wrote:
> On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur
> <tu...@cloudera.com> wrote:
>> What exactly is it that does not work?

In the hopes that more information can help, I've dug into the local
filesystems on each of my four nodes and retrieved the job.xml and the
locations of the files to show that everything shows up where it should.

In this example I have one regular file
(hdfs://node1:hdfsport/hdfs/path/to/file1.foo) added with
DistributedCache.addCacheFile().  I also have a JAR
(hdfs://node1:hdfsport/hdfs/path/to/needed.jar) added with
DistributedCache.addFileToClassPath().  The needed JAR is also part of the
classpath Oozie provides to my Java task.

As you can see, both files (with correct filesizes and timestamps) are
listed as cache files in job.xml, and the JAR is listed as a classpath
file.  Both files show up on each node; the JAR shows up twice on node 1
since that's where Oozie ran the Java task, and thus where Oozie placed the
JAR with its own use of the distributed cache.

And yet, when mapreduce actually tries to run the job my Java task
launches, it immediately hits a ClassNotFoundException, claiming it can't
find the class my.class.package.Needed which is contained in needed.jar.

JOB.XML
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.job.classpath.files</name>
        <value>hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files</name>
        <value>hdfs://node1:hdfsport/hdfs/path/to/file1.foo,hdfs://node1:hdfsport/hdfs/path/to/needed.jar</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files.filesizes</name>
        <value>61175,2257057</value>
    </property>
...
    <property>
        <!--Loaded from Unknown-->
        <name>mapred.cache.files.timestamps</name>
        <value>1306949104866,1306949371660</value>
    </property>
...

NODE 1 LOCAL FILESYSTEM
/data/4/mapred/local/taskTracker/distcache/5181540010607464671_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/1/mapred/local/taskTracker/distcache/6423795395825083633_-1942178119_1279314284/node1/hdfs/path/to/needed.jar
/data/3/mapred/local/taskTracker/distcache/2424191142954514770_1281905983_1269665052/node1/hdfs/path/to/needed.jar

NODE 2 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-1458632814086969626_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/4434671176913378591_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 3 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-6763452370915390695_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/6838381597046551111_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

NODE 4 LOCAL FILESYSTEM
/data/1/mapred/local/taskTracker/distcache/-1759547009148985681_-132008737_1279047490/node1/hdfs/path/to/file1.foo
/data/2/mapred/local/taskTracker/distcache/1998811135309473771_-1942178119_1279314284/node1/hdfs/path/to/needed.jar

SAMPLE MAPPER ATTEMPT LOG

2011-06-01 14:21:41,442 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2011-06-01 14:21:41,557 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating
symlink:
/data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/job.jar
<-
/data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./job.jar
2011-06-01 14:21:41,560 INFO
org.apache.hadoop.filecache.TrackerDistributedCacheManager: Creating
symlink:
/data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/jars/.job.jar.crc
<-
/data/2/mapred/local/taskTracker/hdfs/jobcache/job_201106011430_0002/attempt_201106011430_0002_m_000009_0/work/./.job.jar.crc
2011-06-01 14:21:41,563 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
Initializing JVM Metrics with processName=MAP, sessionId=
2011-06-01 14:21:41,660 WARN org.apache.hadoop.mapred.Child: Error running
child
java.lang.RuntimeException: java.lang.ClassNotFoundException:
my.class.package.Needed
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:973)
	at
org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:236)
	at org.apache.hadoop.mapred.Task.initialize(Task.java:484)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:298)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:217)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1063)
	at org.apache.hadoop.mapred.Child.main(Child.java:211)
Caused by: java.lang.ClassNotFoundException: my.class.package.Needed
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:920)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:971)
	... 8 more


Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Tue, 31 May 2011 12:02:28 -0700, Alejandro Abdelnur <tu...@cloudera.com>
wrote:
> What exactly is it that does not work?

Oozie launches a wrapper MapReduce job to run a Java job J1.  Oozie's
/lib/ directory is provided to the classpath of J1 as expected.  This part
works.

The Java job J1 configures and launches a MapReduce job MR1.  As part of
the configuration, J1 tries to put some JARs on the distributed classpath
for MR1 to use in its mappers and reducers.  To do so, it calls
DistributedCache.addFileToClassPath(jarfilePath).  The file at jarfilePath
DOES get added to the distributed cache.  But the mapper for MR1 still
throws a ClassNotFoundException, since the file at jarfilePath is NOT on
the classpath for MR1.  This is what doesn't work.
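
(To see what the task JVM actually received, here's a hedged debugging
sketch.  DebugMapper is a hypothetical class, and it assumes the task's
context classloader is a URLClassLoader, as the stack traces elsewhere
in this thread suggest it is:)

    import java.net.URL;
    import java.net.URLClassLoader;
    import org.apache.hadoop.mapreduce.Mapper;

    // Dump the task's classpath from inside a mapper to confirm
    // whether needed.jar actually made it onto MR1's classpath.
    public class DebugMapper extends Mapper<Object, Object, Object, Object> {
        @Override
        protected void setup(Context context) {
            ClassLoader cl = Thread.currentThread().getContextClassLoader();
            if (cl instanceof URLClassLoader) {
                for (URL u : ((URLClassLoader) cl).getURLs()) {
                    System.err.println("classpath entry: " + u);
                }
            }
        }
    }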

I hope this explanation makes more sense.  Thanks again for putting some
thought to it.

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
What exactly is it that does not work?

Oozie uses DistributedCache as the only mechanism to set classpaths for
jobs, and it works fine.

Thanks.

Alejandro

On Mon, May 30, 2011 at 10:22 AM, John Armstrong <jo...@ccri.com> wrote:

> Yes, my Java action is doing the setup work.  In particular, it calls
> DistributedCache.addFileToClassPath(), which (according to the
> documentation) should be the same as passing the JAR at the command line
> with -libjars, right?  And yet it doesn't seem to work.
> [...]

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Mon, 30 May 2011 09:43:14 -0700, Alejandro Abdelnur <tu...@cloudera.com>
wrote:
> If you still want to start your MR job from your Java action, then your
> Java action should do all the setup the MapReduceMain class does before
> starting the MR job (this will ensure delegation tokens and the
> distributed cache are available to your MR job).

Yes, my Java action is doing the setup work.  In particular, it calls
DistributedCache.addFileToClassPath(), which (according to the
documentation) should be the same as passing the JAR at the command line
with -libjars, right?  And yet it doesn't seem to work.

Is this the same as the JIRA issue MAPREDUCE-752?  And if so, does this
mean that there is no solution (other than a workaround like passing a fat
JAR) that doesn't involve patching the Hadoop code itself (which I'd have
to get our client to agree to)?

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
John,

Now I get what you are trying to do.

My recommendation would be:

* Use a Java action to do all the stuff prior to starting your MR job
* Use a mapreduce action to start your MR job
* If you need to propagate properties from the Java action to the MR action,
you can use the <capture-output> flag (sketched below).
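
(A hedged sketch of that workflow shape; the action names, transition
targets, and EL lookup key are placeholders to check against your Oozie
version:)

    <action name="prepare">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <main-class>my.class.package.PrepareMain</main-class>
            <capture-output/>
        </java>
        <ok to="mr-step"/>
        <error to="fail"/>
    </action>
    <action name="mr-step">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <!-- value captured from the Java action -->
                    <name>some.property</name>
                    <value>${wf:actionData('prepare')['some.key']}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>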

If you still want to start your MR job from your Java action, then your
Java action should do all the setup the MapReduceMain class does before
starting the MR job (this will ensure delegation tokens and the
distributed cache are available to your MR job).

Thanks.

Alejandro

On Mon, May 30, 2011 at 6:34 AM, John Armstrong <jo...@ccri.com> wrote:

> That doesn't seem to work.  I have this JAR in the WF lib/ directory
> because the Java job that launches the MR job needs it.  And yes, it's in
> the distributed cache for the wrapper MR job that Oozie uses to remotely
> run the Java job.  The problem is it's not available for the MR job that
> the Java job launches.
> [...]

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Fri, 27 May 2011 15:47:23 -0700, Alejandro Abdelnur <tu...@cloudera.com>
wrote:
> John,
> 
> If you are using Oozie, dropping all the JARs your MR jobs need in the
> Oozie WF lib/ directory should suffice. Oozie will make sure all those
> JARs are in the distributed cache.

That doesn't seem to work.  I have this JAR in the WF /lib/ directory
because the Java job that launches the MR job needs it.  And yes, it's in
the distributed cache for the wrapper MR job that Oozie uses to remotely
run the Java job.  The problem is it's not available for the MR job that
the Java job launches.

Thanks, though, for the suggestion.

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Alejandro Abdelnur <tu...@cloudera.com>.
John,

If you are using Oozie, dropping all the JARs your MR jobs need in the
Oozie WF lib/ directory should suffice. Oozie will make sure all those
JARs are in the distributed cache.

Alejandro

On Thu, May 26, 2011 at 7:45 AM, John Armstrong <jo...@ccri.com> wrote:

> I'm running into some difficulties getting needed libraries to map/reduce
> tasks using the distributed cache.
> [...]

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by vishnu krishnan <vg...@gmail.com>.
Sorry, I forgot that; I'm moving to a new thread.

thanks

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by John Armstrong <jo...@ccri.com>.
On Thu, 26 May 2011 23:17:43 +0530, vishnu krishnan
<vg...@gmail.com>
wrote:
> Thanks.
>
> If I'm not using map/reduce here, and instead just send the data
> directly to the db, what will be the problems?

Look, I hate to be That Guy, especially on my first day on the list, but
would you mind moving to your own thread and not hijacking mine?  Thanks.

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by vishnu krishnan <vg...@gmail.com>.
Thanks.

If I'm not using map/reduce here, and instead just send the data directly
to the db, what will be the problems?

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Robert Evans <ev...@yahoo-inc.com>.
If it is just a GB then you probably don't need Hadoop, unless there is some serious processing involved that hasn't been explained, you already have the data on HDFS, or you happen to have a Hadoop cluster you have access to and the amount of data is going to grow in size.  In that case it could be worth writing an M/R job to load the data into a DB.
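(If you do write that M/R load job, the old API ships a DBOutputFormat for writing reduce output over JDBC; a hedged sketch where the JDBC driver, URL, table, and column names are all placeholders, and the value class must implement DBWritable:)

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.db.DBConfiguration;
    import org.apache.hadoop.mapred.lib.db.DBOutputFormat;

    JobConf conf = new JobConf();
    // JDBC driver and connection string are placeholders.
    DBConfiguration.configureDB(conf,
        "com.mysql.jdbc.Driver", "jdbc:mysql://dbhost/mydb");
    // Also sets DBOutputFormat as the job's output format.
    DBOutputFormat.setOutput(conf, "my_table", "col1", "col2");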

--Bobby

On 5/26/11 12:23 PM, "vishnu krishnan" <vg...@gmail.com> wrote:

Thank you.

So I just want to take a GB of data, give it to map/reduce, and then store it into the database?


Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by vishnu krishnan <vg...@gmail.com>.
Thank you.

So I just want to take a GB of data, give it to map/reduce, and then store
it into the database?


-- 
Vishnu R Krishnan
Software Engineer
Create @ Amrita
Amritapuri

Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by Robert Evans <ev...@yahoo-inc.com>.
Vishnu,

You have to have a file system that is accessible from all nodes involved to run Hadoop Map Reduce.  This could be NFS if it is a small number of nodes, or even the local file system if you are just running one node.  That said, Hadoop is designed to process big data (GB, TB, and even PB), so HDFS or some other distributed file system is best if that is what you are doing.  You can use it simply to distribute a computing job to several different machines, but Hadoop Map Reduce still needs a file system as part of the distribution mechanism.
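(For the one-node, local-file-system case, the relevant 0.20.x knobs look roughly like this; a hedged sketch:)

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    conf.set("fs.default.name", "file:///");  // local file system
    conf.set("mapred.job.tracker", "local");  // run MR in-process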

--Bobby Evans

On 5/26/11 10:46 AM, "vishnu krishnan" <vg...@gmail.com> wrote:

I am new to map reduce.  One thing I have to know: can I use a map reduce program without any file system?


Re: Problems adding JARs to distributed classpath in Hadoop 0.20.2

Posted by vishnu krishnan <vg...@gmail.com>.
I am new to map reduce.  One thing I have to know: can I use a map reduce
program without any file system?