Posted to common-user@hadoop.apache.org by Brice Arnould <br...@gmail.com> on 2008/04/07 17:42:24 UTC
Looking for "junior bugs"
Hi!
I'm a third-year student. I'm going to do an internship in a research lab with
another student. I don't know many details about it, except that our aim will
be to evaluate Hadoop and, if we can, to propose a few patches.
So my question is: do you know of bugs that I might try to fix in order to
get to grips with the code? I looked at Jira and saw many bugs that seemed
interesting, but I'm unable to tell whether it is realistic for me to try
tackling them.
If you have a few bugs to recommend to me, I can't promise that I will fix them,
but I do promise that I will try.
Thanks,
Brice
PS: I forgot to mention the worst: I'm French! Please forgive my
English. If I've been impolite, be assured that it was not intentional.
Re: Java inputformat for pipes job
Posted by Rahul Sood <rs...@yahoo-inc.com>.
I'm invoking hadoop with the pipes command:
    hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat \
        -conf conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr
I tried the -file and -cacheFile options but when either of these is
passed to hadoop pipes, the command just exits with a usage message.
There must be a way to specify a jar for a job implemented in C++ with
the hadoop Pipes API. The documentation states that record readers and
writers for Pipes jobs can be implemented in java. I looked at the
source code of org.apache.hadoop.mapred.pipes.Submitter and it's doing
the following:
    /**
     * The main entry point and job submitter. It may either be used as
     * a command line-based or API-based method to launch Pipes jobs.
     */
    public class Submitter {
      /**
       * Submit a pipes job based on the command line arguments.
       * @param args
       */
      public static void main(String[] args) throws Exception {
        CommandLineParser cli = new CommandLineParser();
        //...
        if (results.hasOption("-inputformat")) {
          setIsJavaRecordReader(conf, true);
          conf.setInputFormat(getClass(results, "-inputformat", conf,
                                       InputFormat.class));
        }
      }
    }
It is loading the input format class based on the value of the
-inputformat cmdline parameter. That means there should be some way to
package the input format class along with the program binary and other
supporting files.
-Rahul Sood
rsood@yahoo-inc.com
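
Resolving a class from its name, as the Submitter does for the -inputformat value, only succeeds if the class loader doing the lookup can see the jar that holds the class. The following is a minimal sketch of that mechanism using the plain JDK only. ClassLookupSketch, canLoad and loaderFor are hypothetical names chosen for illustration and are not part of Hadoop; whether the hadoop launcher exposes the -jar file to its own class path is not something this sketch establishes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: illustrates by-name class lookup.
class ClassLookupSketch {

    // Returns true if the given loader can resolve className.
    static boolean canLoad(String className, ClassLoader loader) {
        try {
            // Look the class up by name without running its static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    // Builds a loader that searches the given jar in addition to the system class path.
    static URLClassLoader loaderFor(Path jar) throws MalformedURLException {
        return new URLClassLoader(new URL[] { jar.toUri().toURL() },
                                  ClassLoader.getSystemClassLoader());
    }

    public static void main(String[] args) throws Exception {
        String name = "mytest.PriceInputFormat";
        System.out.println("system loader finds it: "
                + canLoad(name, ClassLoader.getSystemClassLoader()));
        if (args.length > 0) {
            // args[0] is assumed to be the path to the jar holding the class.
            try (URLClassLoader withJar = loaderFor(Paths.get(args[0]))) {
                System.out.println("loader with jar finds it: " + canLoad(name, withJar));
            }
        }
    }
}
```

Run it with the Hadoop jars on the class path, since loading an input format also needs the interfaces it implements; otherwise the second lookup may fail with a NoClassDefFoundError instead of returning a result.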
Re: Java inputformat for pipes job
Posted by "11 Nov." <no...@gmail.com>.
You should use the -pipes option in the command.
For the input format, you can pack it into the hadoop core class jar file,
or put it into the cache file.
Java inputformat for pipes job
Posted by Rahul Sood <rs...@yahoo-inc.com>.
Hi,
I implemented a customized input format in Java for a Map Reduce job.
The mapper and reducer classes are implemented in C++, using the Hadoop
Pipes API.
The package documentation for org.apache.hadoop.mapred.pipes states that
"The job may consist of any combination of Java and C++ RecordReaders,
Mappers, Paritioner, Combiner, Reducer, and RecordWriter"
I packaged the input format class in a jar file and ran the job
invocation command:
    hadoop pipes -jar mytest.jar -inputformat mytest.PriceInputFormat \
        -conf conf/mytest.xml -input mgr/in -output mgr/out -program mgr/bin/TestMgr
It keeps failing with a ClassNotFoundException.
Although I've specified the jar file name with the -jar parameter, the
input format class still cannot be located. Is there any other means to
specify the input format class, or the job jar file, for a Pipes job?
Stack trace:
Exception in thread "main" java.lang.ClassNotFoundException: mytest.PriceInputFormat
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:276)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:251)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:524)
    at org.apache.hadoop.mapred.pipes.Submitter.getClass(Submitter.java:309)
    at org.apache.hadoop.mapred.pipes.Submitter.main(Submitter.java:357)
Thanks,
Rahul Sood
rsood@yahoo-inc.com
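
One thing that can be ruled out early with a ClassNotFoundException is a jar whose internal layout does not match the package name: mytest.PriceInputFormat has to be stored as mytest/PriceInputFormat.class. A minimal sketch of that check, assuming only the JDK (JarEntryCheck, entryNameFor and jarContains are hypothetical names, not Hadoop APIs). It says nothing about whether the class path used by the hadoop command includes the jar.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

// Hypothetical diagnostic, not part of Hadoop.
class JarEntryCheck {

    // Maps a fully qualified class name to the entry path expected inside a jar.
    static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the jar has an entry for the given class.
    static boolean jarContains(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarEntryCheck mytest.jar mytest.PriceInputFormat
        Path jar = Paths.get(args[0]);
        String className = args[1];
        System.out.println(entryNameFor(className) + " present: "
                + jarContains(jar, className));
    }
}
```

If the entry is present, the jar itself is well formed and the remaining question is whether the JVM that runs the Submitter has that jar on its class path.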
Re: Looking for "junior bugs"
Posted by Arun C Murthy <ar...@yahoo-inc.com>.
Brice,
Welcome to Hadoop!
Along with the bugs you browsed, here are some suggestions we
maintain for interested folks to get their feet wet:
http://wiki.apache.org/hadoop/ProjectSuggestions
Other than that, please feel free to pick up any unassigned bugs
(maybe start with 'Minor' ones) or pick up an assigned bug which
hasn't been touched in a while (please drop a comment on the jira to
let the current assignee know... *smile*). You can browse the jira by
component to pick specific areas which interest you (hdfs, map-reduce,
pipes, streaming etc.).
Good luck.
Arun