Posted to common-user@hadoop.apache.org by Aaron Kimball <aa...@cloudera.com> on 2009/03/02 07:42:48 UTC

Example of deploying jars through DistributedCache?

Hi all,

I'm stumped as to how to use the distributed cache's classpath feature. I
have a library of Java classes I'd like to distribute to jobs and use in my
mapper; I figured the DCache's addFileToClassPath() method was the correct
means, given the example at
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/filecache/DistributedCache.html.


I've boiled it down to the following non-working example:

in TestDriver.java:


  private void runJob() throws IOException {
    JobConf conf = new JobConf(getConf(), TestDriver.class);

    // do standard job configuration.
    FileInputFormat.addInputPath(conf, new Path("input"));
    FileOutputFormat.setOutputPath(conf, new Path("output"));

    conf.setMapperClass(TestMapper.class);
    conf.setNumReduceTasks(0);

    // load aaronTest2.jar into the dcache; this contains the class ValueProvider
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
    DistributedCache.addFileToClassPath(new Path("tmp/aaronTest2.jar"), conf);

    // run the job.
    JobClient.runJob(conf);
  }


.... and then in TestMapper:

  public void map(LongWritable key, Text value, OutputCollector<LongWritable, Text> output,
      Reporter reporter) throws IOException {

    try {
      ValueProvider vp = (ValueProvider) Class.forName("ValueProvider").newInstance();
      Text val = vp.getValue();
      output.collect(new LongWritable(1), val);
    } catch (ClassNotFoundException e) {
      throw new IOException("not found: " + e.toString()); // newInstance() throws to here.
    } catch (Exception e) {
      throw new IOException("Exception: " + e.toString());
    }
  }


The class "ValueProvider" is to be loaded from aaronTest2.jar. I can verify that this code works if I put ValueProvider into the main jar I deploy, and I can verify that aaronTest2.jar makes it into the ${mapred.local.dir}/taskTracker/archive/ directory.

But when run with ValueProvider in aaronTest2.jar, the job fails with:

$ bin/hadoop jar aaronTest1.jar TestDriver
09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
09/03/01 22:36:03 INFO mapred.FileInputFormat: Total input paths to process : 10
09/03/01 22:36:04 INFO mapred.JobClient: Running job: job_200903012210_0005
09/03/01 22:36:05 INFO mapred.JobClient:  map 0% reduce 0%
09/03/01 22:36:14 INFO mapred.JobClient: Task Id : attempt_200903012210_0005_m_000000_0, Status : FAILED
java.io.IOException: not found: java.lang.ClassNotFoundException: ValueProvider
    at TestMapper.map(Unknown Source)
    at TestMapper.map(Unknown Source)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)


Do I need to do something else (maybe in Mapper.configure()?) to actually
classload the jar? The documentation makes me believe it should already be
in the classpath by doing only what I've done above. I'm on Hadoop 0.18.3.

Thanks,
- Aaron
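
[Editor's sketch, not code from the thread: one way to investigate a ClassNotFoundException like the one above is to log the classpath the task itself sees, from Mapper.configure(). The class name below is hypothetical, and it assumes the 0.18-era API where Configuration.getClassLoader() returns the task's classloader, typically a URLClassLoader.]

```java
import java.net.URL;
import java.net.URLClassLoader;

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

// Hypothetical debugging base class: logs every classpath entry the task's
// classloader knows about, so a missing dcached jar is easy to spot.
public class ClasspathLoggingMapperBase extends MapReduceBase {
  @Override
  public void configure(JobConf conf) {
    ClassLoader loader = conf.getClassLoader();
    if (loader instanceof URLClassLoader) {
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        // Appears in the task attempt's stderr log on the task node.
        System.err.println("task classpath entry: " + url);
      }
    }
  }
}
```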

Re: Example of deploying jars through DistributedCache?

Posted by Aaron Kimball <aa...@cloudera.com>.
Ooh. The other DCache-based operations assume that the files you're dcaching are already resident in HDFS. I had guessed that the filenames referred to the local filesystem.

- Aaron
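
[Editor's sketch of the realization above: make sure the path handed to addFileToClassPath() is the jar's HDFS location, uploading it first if necessary. The helper name and destination directory here are invented.]

```java
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Sketch: copy the library jar into HDFS, then register that HDFS path.
// addFileToClassPath() records the path in the job conf; the TaskTracker
// later localizes the file and appends it to the child task's classpath.
public class LibJarUploader {
  static void addLibJar(JobConf conf, String localJar, String hdfsDir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path dest = new Path(hdfsDir, new Path(localJar).getName());
    fs.copyFromLocalFile(new Path(localJar), dest);   // now resident in HDFS
    DistributedCache.addFileToClassPath(dest, conf);  // register the HDFS path
  }
}
```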

On Wed, Apr 8, 2009 at 8:32 AM, Brian MacKay <Br...@medecision.com> wrote:

>
> I use addArchiveToClassPath, and it works for me.
>
> DistributedCache.addArchiveToClassPath(new Path(path), conf);
>
> I was curious about this block of code.  Why are you copying to tmp?
>
> >    FileSystem fs = FileSystem.get(conf);
> >    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
>
> [remainder of the quoted thread trimmed]

RE: Example of deploying jars through DistributedCache?

Posted by Brian MacKay <Br...@MEDecision.com>.
I use addArchiveToClassPath, and it works for me.

DistributedCache.addArchiveToClassPath(new Path(path), conf);

I was curious about this block of code.  Why are you copying to tmp?

>    FileSystem fs = FileSystem.get(conf);
>    fs.copyFromLocalFile(new Path("aaronTest2.jar"), new Path("tmp/aaronTest2.jar"));
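
[Editor's sketch expanding Brian's suggestion; the helper and HDFS directory names are invented, and the jar is assumed to be uploaded to HDFS before it is registered.]

```java
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// Sketch: ship the jar as an archive. Archives are unpacked on the task
// node, and addArchiveToClassPath() puts the entry on the task classpath.
public class ArchiveShipper {
  static void shipJar(JobConf conf, String localJar, String hdfsDir) throws IOException {
    FileSystem fs = FileSystem.get(conf);
    Path dest = new Path(hdfsDir, new Path(localJar).getName());
    fs.copyFromLocalFile(new Path(localJar), dest);
    DistributedCache.addArchiveToClassPath(dest, conf);
  }
}
```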

-----Original Message-----
From: Tom White [mailto:tom@cloudera.com] 
Sent: Wednesday, April 08, 2009 9:36 AM
To: core-user@hadoop.apache.org
Subject: Re: Example of deploying jars through DistributedCache?

Does it work if you use addArchiveToClassPath()?

Also, it may be more convenient to use GenericOptionsParser's -libjars option.

Tom

On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <aa...@cloudera.com> wrote:
> [original message trimmed]

_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

The information transmitted is intended only for the person or entity to 
which it is addressed and may contain confidential and/or privileged 
material. Any review, retransmission, dissemination or other use of, or 
taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received 
this message in error, please contact the sender and delete the material 
from any computer.



Re: Example of deploying jars through DistributedCache?

Posted by Tom White <to...@cloudera.com>.
Does it work if you use addArchiveToClassPath()?

Also, it may be more convenient to use GenericOptionsParser's -libjars option.

Tom
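
[Editor's sketch: for -libjars to take effect, the driver has to run through ToolRunner so that GenericOptionsParser sees the option before the job is configured. The class name below is invented.]

```java
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Sketch: implementing Tool lets ToolRunner strip generic options such as
// -libjars from the command line and fold them into the job configuration.
public class LibJarsDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // standard job setup goes here, starting from getConf()
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g.: bin/hadoop jar aaronTest1.jar LibJarsDriver -libjars aaronTest2.jar
    System.exit(ToolRunner.run(new LibJarsDriver(), args));
  }
}
```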

On Mon, Mar 2, 2009 at 7:42 AM, Aaron Kimball <aa...@cloudera.com> wrote:
> [original message trimmed]