Posted to common-user@hadoop.apache.org by Andrey Pankov <ap...@iponweb.net> on 2008/03/11 16:30:04 UTC

Hadoop streaming question

Hi all,

I'm still new to Hadoop. I'd like to use Hadoop streaming to combine a
mapper written as a Java class with a reducer written as a C++ program.
I'm just starting on this task and am already having trouble with the
Java class. It looks something like this:


package org.company;
  ...
public class TestMapper extends MapReduceBase implements Mapper {
  ...
   public void map(WritableComparable key, Writable value,
     OutputCollector output, Reporter reporter) throws IOException {
  ...


I created a jar file with my class, and it is accessible via $CLASSPATH.
I'm running the streaming job using

$HSTREAMING -mapper org.company.TestMapper -reducer "wc -l" \
    -input /data -output /out1

Hadoop cannot find the TestMapper class. I'm using hadoop-0.16.0. The
error is

===========================
2008-03-07 18:58:07,734 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
2008-03-07 18:58:07,833 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 1
2008-03-07 18:58:07,910 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:639)
        at org.apache.hadoop.mapred.JobConf.getMapperClass(JobConf.java:728)
        at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:36)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:204)
        at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: org.company.TestMapper
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:607)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:631)
        ... 6 more
Caused by: java.lang.ClassNotFoundException: org.company.TestMapper
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:587)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:605)
        ... 7 more
===========================
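
The trace above wraps the ClassNotFoundException in two
RuntimeExceptions. As an illustration only (this is not Hadoop code, and
the RootCauseDemo class and rootCause helper are hypothetical names), the
innermost cause of such a chain could be extracted in plain Java roughly
like this:

```java
public class RootCauseDemo {
    /** Follows getCause() links until the innermost exception is reached. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        // Stop when there is no further cause (or a self-referencing one).
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Rebuild the same nesting that appears in the log above.
        Throwable nested = new RuntimeException(
            new RuntimeException(
                new ClassNotFoundException("org.company.TestMapper")));
        Throwable root = rootCause(nested);
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging only the root cause can make a wrapped failure like this one
easier to spot, although the full trace is still needed to see where the
lookup was attempted.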

What I find interesting: I added some debugging println() calls to
Hadoop streaming (StreamJob.java and StreamUtil.java). Streaming can see
TestMapper at the job configuration stage (the StreamJob.setJobConf()
routine) but not later, when the task runs. The following code creates a
new instance of TestMapper and calls the toString() defined in
TestMapper. It works.

     if (mapCmd_ != null) {
       c = StreamUtil.goodClassOrNull(mapCmd_, defaultPackage);
       if (c != null) {
         System.out.println("#######################");
         try {
             System.out.println(c.newInstance().toString());
         } catch (Exception e) { }
         System.out.println("#######################");
         jobConf_.setMapperClass(c);
       } else {
...
       }
     }
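
One possible reading of this behaviour, offered as an assumption rather
than a confirmed diagnosis: the job is configured in the client JVM,
whose classpath includes the jar, while the map task runs in a separate
child JVM (TaskTracker$Child in the trace) whose class loader may not
have it. The sketch below is not Hadoop code. It only uses the standard
library to show that the same class can be visible to one class loader
and invisible to another. ClassVisibilityDemo and isVisible are
hypothetical names, and the empty URLClassLoader merely stands in for a
JVM that was started without the jar.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassVisibilityDemo {
    /** Returns true if className can be loaded through the given loader. */
    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String name = ClassVisibilityDemo.class.getName();

        // Loader that loaded this class: plays the role of the client JVM.
        ClassLoader clientSide = ClassVisibilityDemo.class.getClassLoader();

        // Loader with no URLs and no application parent: plays the role of a
        // task JVM whose classpath does not contain the jar (an assumption).
        try (URLClassLoader taskSide = new URLClassLoader(new URL[0], null)) {
            System.out.println("client-side loader sees class: "
                + isVisible(name, clientSide));
            System.out.println("task-side loader sees class:   "
                + isVisible(name, taskSide));
        }
    }
}
```

If that reading is right, a successful newInstance() during job
configuration says nothing about whether the task JVM can load the class.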


I tried adding the jar file with TestMapper using the option
  "-file test_mapper.jar". The result is the same.

Could anybody advise me? Thanks in advance,

---
Andrey Pankov.


Re: Hadoop streaming question

Posted by Andrey Pankov <ap...@iponweb.net>.
Hi Amareshwari,

I applied that patch and my job ran successfully. I still had to specify
the jar file with the '-file' option, even though it is available via
$CLASSPATH:

$HSTREAMING -mapper org.company.TestMapper -reducer "cat" \
    -input /data -output /out4 -file /path/to/test_mapper.jar
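
Before shipping a jar with '-file', it may be worth confirming that the
jar really contains the mapper class under the expected package path. The
following is a minimal sketch using only java.util.jar from the standard
library; JarClassCheck, entryName, and jarContains are hypothetical names
introduced for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.util.jar.JarFile;

public class JarClassCheck {
    /** Converts a fully qualified class name to its entry path inside a jar. */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for className. */
    static boolean jarContains(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarClassCheck <jar> <class-name>");
            System.exit(2);
        }
        boolean found = jarContains(args[0], args[1]);
        System.out.println(args[1] + (found ? " found in " : " NOT found in ") + args[0]);
    }
}
```

It could be run, for example, as
"java JarClassCheck /path/to/test_mapper.jar org.company.TestMapper".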

Thanks a lot!


Amareshwari Sriramadasu wrote:
> Hi Andrey,
> 
> I think that is a classpath problem.
> Can you try the patch at
> https://issues.apache.org/jira/browse/HADOOP-2622 and see if you still
> have the problem?
> 
> Thanks
> Amareshwari.

---
Andrey Pankov.

Re: Hadoop streaming question

Posted by Amareshwari Sriramadasu <am...@yahoo-inc.com>.
Hi Andrey,

I think that is a classpath problem.
Can you try the patch at
https://issues.apache.org/jira/browse/HADOOP-2622 and see if you still
have the problem?
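
When a classpath problem is suspected, one generic way to check is to
print the classpath that a JVM actually sees. The sketch below is plain
Java, not Hadoop code; ClasspathDump and entries are hypothetical names,
and it reports only the JVM in which it runs, so it would have to be run
(or its logic called) inside the process under suspicion.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class ClasspathDump {
    /** Splits a classpath string into its non-empty entries. */
    static List<String> entries(String classpath) {
        List<String> result = new ArrayList<>();
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        for (String entry : classpath.split(Pattern.quote(File.pathSeparator))) {
            if (!entry.isEmpty()) {
                result.add(entry);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // java.class.path holds the classpath this JVM was started with.
        for (String entry : entries(System.getProperty("java.class.path"))) {
            System.out.println(entry);
        }
    }
}
```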

Thanks
Amareshwari.



Re: Hadoop streaming question

Posted by Peeyush Bishnoi <pe...@yahoo-inc.com>.
Hello Andrey,

Take a look at the -cacheFile and -cacheArchive options for streaming;
they may help you out:

http://hadoop.apache.org/core/docs/current/streaming.html#Large+files+and+archives+in+Hadoop+Streaming


Thank you,

---
Peeyush 
