Posted to common-user@hadoop.apache.org by Brice Arnould <br...@gmail.com> on 2008/04/07 17:42:24 UTC

Looking for "junior bugs"

Hi !
I'm a third-year student. I'm going to do an internship in a research lab with
another student. I don't know many details about it, except that our aim will
be to evaluate Hadoop and, if we can, to propose a few patches.

So my question is: do you know of any bugs that I might try to fix in order to
grasp the code? I looked at Jira and saw many bugs that seemed interesting,
but I can't tell whether it is realistic for me to try tackling them.
If you have a few bugs to recommend, I can't promise that I will fix them,
but I do promise that I will try.

Thanks,
Brice

PS: I forgot to mention the worst: I'm French! Please forgive my English.
If I've been impolite, be assured that it was not intentional.

Re: Java inputformat for pipes job

Posted by Rahul Sood <rs...@yahoo-inc.com>.
I'm invoking Hadoop with the pipes command:

hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr

I tried the -file and -cacheFile options, but when either of these is
passed to hadoop pipes, the command just exits with a usage message.

There must be a way to specify a jar for a job implemented in C++ with
the Hadoop Pipes API. The documentation states that record readers and
writers for Pipes jobs can be implemented in Java. I looked at the
source code of org.apache.hadoop.mapred.pipes.Submitter, and it does
the following:

/**
 * The main entry point and job submitter. It may either be used as
 * a command line-based or API-based method to launch Pipes jobs.
 */
public class Submitter {

  /**
   * Submit a pipes job based on the command line arguments.
   * @param args the command-line arguments
   */
  public static void main(String[] args) throws Exception {
    CommandLineParser cli = new CommandLineParser();
    // ... (elided: args are parsed into results and the JobConf conf is built)
      if (results.hasOption("-inputformat")) {
        setIsJavaRecordReader(conf, true);
        conf.setInputFormat(getClass(results, "-inputformat", conf,
                                     InputFormat.class));
      }
  }
}

It loads the input format class named by the -inputformat command-line
parameter. That means there should be some way to package the input
format class along with the program binary and other supporting files.
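
The ClassNotFoundException follows from how Java resolves a class name:
Class.forName searches only the class loader it is handed, and the stack
trace in the original post shows the lookup going through the JVM's
application class loader (sun.misc.Launcher$AppClassLoader), whose search
path evidently does not include mytest.jar. The JDK-only sketch below
illustrates that mechanism; JarClassLoading and its methods are hypothetical
names, not Hadoop APIs, and this is not a patch to Submitter.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

/**
 * Hypothetical JDK-only illustration (not Hadoop code): a class name is
 * resolved only through the class loader that is asked, so a class packed
 * in a jar outside that loader's search path raises ClassNotFoundException.
 */
public class JarClassLoading {

    /** Returns true if className can be resolved through loader. */
    static boolean isLoadable(String className, ClassLoader loader) {
        try {
            // initialize = false: check visibility without running static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /**
     * Returns a loader that also searches the given jar. URLClassLoader asks
     * its parent first, so classes the parent can already see still resolve.
     */
    static ClassLoader withJar(File jar, ClassLoader parent) {
        try {
            return new URLClassLoader(new URL[] {jar.toURI().toURL()}, parent);
        } catch (MalformedURLException e) {
            // Not expected for file: URIs produced by File.toURI().
            throw new IllegalArgumentException("cannot convert to URL: " + jar, e);
        }
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java JarClassLoading <jar> <class-name>");
            return;
        }
        ClassLoader app = JarClassLoading.class.getClassLoader();
        System.out.println("application loader: " + isLoadable(args[1], app));
        System.out.println("with jar added:     "
                + isLoadable(args[1], withJar(new File(args[0]), app)));
    }
}
```

Usage, assuming PriceInputFormat depends on Hadoop's InputFormat interface
and so the Hadoop core jar must also be on -cp:
java JarClassLoading mytest.jar mytest.PriceInputFormat. The first line
reports whether the application loader can see the class, the second
whether it becomes visible once a URLClassLoader over the jar is consulted.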

-Rahul Sood
rsood@yahoo-inc.com

> You should use the -pipes option in the command.
> For the input format, you can pack it into the hadoop core class jar file,
> or put it into the cache file.
> 
> 2008/4/8, Rahul Sood <rs...@yahoo-inc.com>:
> > [...]


Re: Java inputformat for pipes job

Posted by "11 Nov." <no...@gmail.com>.
You should use the -pipes option in the command.
For the input format, you can pack it into the hadoop core class jar file,
or put it into the cache file.
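
Packing the class into the Hadoop core jar presumably works because that
jar is already on the classpath of the JVM that runs Submitter.main, so the
application class loader finds the class there. A JDK-only sketch for
checking what that classpath actually contains and where a class was loaded
from; ClasspathInfo and its methods are hypothetical names:

```java
import java.io.File;
import java.net.URL;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JDK-only diagnostics for the classpath of the current JVM. */
public class ClasspathInfo {

    /** Non-empty entries of java.class.path, which the application class loader searches. */
    static List<String> classpathEntries() {
        List<String> entries = new ArrayList<>();
        String cp = System.getProperty("java.class.path", "");
        // File.pathSeparator is ":" on Unix and ";" on Windows; neither is a regex metacharacter.
        for (String entry : cp.split(File.pathSeparator)) {
            if (!entry.isEmpty()) {
                entries.add(entry);
            }
        }
        return entries;
    }

    /** Jar or directory a class was loaded from, or null when it has no code source (e.g. JDK classes). */
    static URL loadedFrom(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) {
        for (String entry : classpathEntries()) {
            System.out.println("classpath: " + entry);
        }
        System.out.println("ClasspathInfo loaded from: " + loadedFrom(ClasspathInfo.class));
    }
}
```

Run under the same classpath the hadoop launcher uses, the listing would
show whether the jar holding the input format class is among the entries.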

2008/4/8, Rahul Sood <rs...@yahoo-inc.com>:
> [...]

Java inputformat for pipes job

Posted by Rahul Sood <rs...@yahoo-inc.com>.
Hi,

I implemented a custom input format in Java for a MapReduce job.
The mapper and reducer classes are implemented in C++, using the Hadoop
Pipes API.

The package documentation for org.apache.hadoop.mapred.pipes states that
"The job may consist of any combination of Java and C++ RecordReaders,
Mappers, Partitioner, Combiner, Reducer, and RecordWriter"

I packaged the input format class in a jar file and ran the job
invocation command:

hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat -conf
conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr

It keeps failing with a ClassNotFoundException. Although I've specified
the jar file name with the -jar parameter, the input format class still
cannot be located. Is there any other way to specify the input format
class, or the job jar file, for a Pipes job?

Stack trace:

Exception in thread "main" java.lang.ClassNotFoundException:
mytest.PriceInputFormat
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
        at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:247)
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:524)
        at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:309)
        at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:357)
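
When a ClassNotFoundException names a class that is supposed to be packed
in a jar, one quick sanity check is that the jar stores it at the path the
name maps to: mytest.PriceInputFormat must appear as
mytest/PriceInputFormat.class, not at the jar root or under an extra
directory prefix. The sketch below performs the same check as scanning the
output of jar tf mytest.jar; JarEntryCheck is a hypothetical helper, and a
correctly packaged jar can still fail if the submitting JVM never searches it.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: checks whether a jar contains the .class entry that a
 * fully qualified class name maps to (package dots become '/' separators).
 */
public class JarEntryCheck {

    /** Maps e.g. "mytest.PriceInputFormat" to "mytest/PriceInputFormat.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar has an entry at exactly the expected path. */
    static boolean containsClass(File jar, String className) throws IOException {
        try (JarFile jf = new JarFile(jar)) {
            return jf.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarEntryCheck <jar> <class-name>");
            return;
        }
        String entry = entryName(args[1]);
        boolean found = containsClass(new File(args[0]), args[1]);
        System.out.println(entry + (found ? " found" : " NOT found") + " in " + args[0]);
    }
}
```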

Thanks,

Rahul Sood
rsood@yahoo-inc.com



Re: Looking for "junior bugs"

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
Brice,

  Welcome to Hadoop!

  Along with the bugs you browsed, here are some suggestions we
maintain for interested folks to get their feet wet:
  http://wiki.apache.org/hadoop/ProjectSuggestions

  Other than that, please feel free to pick up any unassigned bugs
(maybe start with 'Minor' ones) or pick up an assigned bug which
hasn't been touched in a while (please drop a comment on the jira to
let the current assignee know... *smile*). You can browse the jira by
components to pick specific areas which interest you (hdfs,
map-reduce, pipes, streaming, etc.).

  Good luck.

Arun

On Apr 7, 2008, at 8:42 AM, Brice Arnould wrote:

> [...]