Posted to common-user@hadoop.apache.org by Jamal x <jm...@gmail.com> on 2011/10/28 19:17:45 UTC

Externally submitted MapReduce Job Fails at Startup Help Please...

Hi,

I wrote a small test program that runs a MapReduce job on a remote cluster
to extract data from a simple database table.  However, it fails when I run
it from Eclipse, with the following exception:

12:36:08,993  WARN main mapred.JobClient:659 - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12:36:09,567  WARN main mapred.JobClient:776 - No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:575)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 11 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:271)
    ... 16 more
Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:169)
    at org.apache.hadoop.mapred.lib.db.DBConfiguration.getConnection(DBConfiguration.java:123)
    at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:266)
    ... 16 more


I do have the mysql-connector jar under the $HADOOP_HOME/lib folder on all
servers in the cluster, and I even tried using the
DistributedCache.addArchiveToClassPath method, with no success.  Can someone
please help me figure out what is going on here?

Here is my simple driver, which performs the remote submission of the job:
public int run(String[] arg0) throws Exception {

        System.out.println("Setting up job configuration....");
        Configuration conf = new Configuration();
        conf.set("mapred.job.tracker", "jobtracker.hostname:8021");
        conf.set("fs.default.name", "hdfs://namenode.hostname:9000");
        conf.set("keep.failed.task.files", "true");
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        FileSystem fs = FileSystem.get(conf);
        fs.delete(new Path("/myfolder/dump_output/"), true);
        fs.mkdirs(new Path("/myfolder/libs/"));

        // Push the job jar and the JDBC driver jar from the local Maven
        // repository up to HDFS.
        fs.copyFromLocalFile(
                new Path("C:/Users/me/.m2/repository/org/mylib/0.1-SNAPSHOT/myproject-0.1-SNAPSHOT-hadoop.jar"),
                new Path("/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"));
        fs.copyFromLocalFile(
                new Path("C:/Users/me/.m2/repository/mysql/mysql-connector-java/5.1.17/mysql-connector-java-5.1.17.jar"),
                new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"));

        // Add both jars to the task classpath via the distributed cache.
        DistributedCache.addArchiveToClassPath(
                new Path("/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"), conf, fs);
        DistributedCache.addArchiveToClassPath(
                new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"), conf, fs);

        JobConf job = new JobConf(conf);
        job.setJobName("Exporting Job");
        job.setJarByClass(MyMapper.class);
        job.setMapperClass(MyMapper.class);

        // Sanity check that the driver class is visible on the client side.
        // (Class.forName never actually returns null; it throws
        // ClassNotFoundException instead, so the null check is redundant.)
        Class<?> claz = Class.forName("com.mysql.jdbc.Driver");
        if (claz == null) {
            throw new RuntimeException("wow...");
        }

        Configuration.dumpConfiguration(conf, new PrintWriter(System.out));

        DBConfiguration.configureDB(job, "com.mysql.jdbc.Driver",
                "jdbc:mysql://mydbserver:3306/test?autoReconnect=true",
                "user", "password");

        String[] fields = { "employee_id", "name" };
        DBInputFormat.setInput(job, MyRecord.class, "employees", null,
                "employee_id", fields);

        FileOutputFormat.setOutputPath(job, new Path("/myfolder/dump_output/"));

        System.out.println("Submitting job....");
        JobClient.runJob(job);
        System.out.println("job info: " + job.getNumMapTasks());

        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SimpleDriver(), args);
        System.out.println("Completed.");
        System.exit(exitCode);
    }


I'm using the hadoop-core 0.20.205.0 Maven dependency to build and run my
program via Eclipse.  The myproject-0.1-SNAPSHOT-hadoop.jar jar contains my
classes, and its dependencies are included under its /lib folder.
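
One thing I notice in the log is the "No job jar file set" warning.  Since
I run from Eclipse, my classes are loose .class files rather than inside a
jar, so perhaps setJarByClass cannot locate a jar for MyMapper.  Would
setting the jar explicitly help?  A sketch of what I mean (the local path
is just where Maven put my jar, same as above):

        // Sketch: point the job at an actual jar file. setJarByClass only
        // works when the given class was itself loaded from a jar.
        job.setJar("C:/Users/me/.m2/repository/org/mylib/0.1-SNAPSHOT/myproject-0.1-SNAPSHOT-hadoop.jar");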

Any help would be greatly appreciated.

Thanks

Re: Externally submitted MapReduce Job Fails at Startup Help Please...

Posted by Jamal B <jm...@gmail.com>.
So I finally figured out what was going on.  To make a long story short, my
jar's lib folder contained transitive dependencies from dependencies I had
left in my pom.xml (Spring, slf4j, etc.), a typical copy-and-paste problem
on my part.

I found this by giving up on the remote submission and, as previously
suggested, first trying the command line to at least see whether my simple
job would run.  It turns out I had a conflicting slf4j jar causing my
submission to fail with a NoSuchMethodError.  A couple of searches later, I
came across this email:

http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201101.mbox/%3C4D3E3C87.7090108@gmail.com%3E

I replaced the version of slf4j in Hadoop, restarted my test cluster, and
things worked like a charm (both from the command line and via remote
submission).

Learned a lot :), and thanks for all the help.
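
For anyone who hits the same thing: on my side the cleanup amounted to
excluding the unwanted transitive jars in the pom.  A rough sketch, with
placeholder coordinates rather than my real ones:

    <dependency>
      <groupId>org.example</groupId>          <!-- placeholder -->
      <artifactId>some-library</artifactId>   <!-- placeholder -->
      <version>1.0</version>
      <exclusions>
        <!-- keep the conflicting slf4j binding out of the jar's lib folder -->
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-log4j12</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

Running "mvn dependency:tree" is a quick way to see exactly where the
extra slf4j (or Spring, etc.) is coming from.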

On Sat, Oct 29, 2011 at 4:00 PM, Steve Lewis <lo...@gmail.com> wrote:

> [...]

Re: Externally submitted MapReduce Job Fails at Startup Help Please...

Posted by Steve Lewis <lo...@gmail.com>.
Did you build a jar file for your job, and did you put mysql-connector.jar in
its lib directory?
I have had this work for me.
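
For illustration, the kind of layout I mean, using the jar and class names
from your mail (so this is a sketch, not your exact contents):

    myproject-0.1-SNAPSHOT-hadoop.jar
        com/yourpackage/MyMapper.class
        com/yourpackage/MyRecord.class
        lib/mysql-connector-java-5.1.17.jar

Hadoop unpacks the job jar on each task node and adds the jars under its
lib/ directory to the task classpath, so the JDBC driver travels with the
job.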

On Fri, Oct 28, 2011 at 12:56 PM, Jamal x <jm...@gmail.com> wrote:

> [...]



-- 
Steven M. Lewis PhD
4221 105th Ave NE
Kirkland, WA 98033
206-384-1340 (cell)
Skype lordjoe_com

Re: Externally submitted MapReduce Job Fails at Startup Help Please...

Posted by Jamal x <jm...@gmail.com>.
Thanks for the response.

I need to submit this job programmatically instead of using the command
line.  Shouldn't the DistributedCache class handle the classpath setup for
the job?  If not, is there some other setup missing from my driver class?

I also looked into Sqoop, but wanted to get this working for a particular
case which I think isn't a good fit for it, though I may be wrong.  Plus, I
wanted to use this use case to get more experience with creating and
running jobs remotely.
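
One thing I'm unsure about: I add the driver jar with addArchiveToClassPath,
but since it's a plain jar rather than an archive to be unpacked, maybe
addFileToClassPath is the right call.  A sketch of the alternative I'm
considering (same HDFS path as in my first mail):

        // Sketch: treat the JDBC driver as a classpath *file*, not an archive.
        DistributedCache.addFileToClassPath(
                new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"), conf);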

Thanks
On Oct 28, 2011 1:38 PM, "Brock Noland" <br...@cloudera.com> wrote:

> [...]

Re: Externally submitted MapReduce Job Fails at Startup Help Please...

Posted by Brock Noland <br...@cloudera.com>.
Hi,

I always find that using the -libjars command line option is the
easiest way to push jars to the cluster.
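
For example, something like this from the machine you submit on (a sketch;
the driver class name is a placeholder, and it assumes your driver goes
through ToolRunner so that GenericOptionsParser picks up -libjars):

    hadoop jar myproject-0.1-SNAPSHOT-hadoop.jar com.example.SimpleDriver \
        -libjars /path/to/mysql-connector-java-5.1.17.jar

The jar named by -libjars is shipped to the cluster and added to the
classpath of every task, which is exactly what your JDBC driver needs.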

Also, you may want to check out Apache Sqoop:
http://incubator.apache.org/projects/sqoop.html

Brock

On Fri, Oct 28, 2011 at 12:17 PM, Jamal x <jm...@gmail.com> wrote:
> [...]