Posted to common-user@hadoop.apache.org by Sameer Tilak <ss...@live.com> on 2013/12/20 20:44:27 UTC

libjar and Mahout



Hi All,
I am running Hadoop 1.0.3 -- probably will upgrade mid-next year. We are using Apache Pig to build our data pipeline and are planning to use Apache Mahout for data analysis. 

javac -d /apps/analytics/ -classpath .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar SimpleKMeansClustering.java

jar -cf myanalytics.jar myanalytics/


hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar

I call the following method in my SimpleKMeansClustering class:

    KMeansDriver.run(conf, new Path("/scratch/dummyvector.seq"),
                     new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
                     new Path("/scratch/dummyvectoroutput"), new EuclideanDistanceMeasure(),
                     0.001, 10, true, 1.0, false);


Unfortunately I get the following error. I think the jars are somehow not made available in the distributed cache. I use Vectors to represent my data and write them to a sequence file, and I then use the driver to analyze that file in MapReduce mode. Locally all the required jar files are available, but somehow in MapReduce mode they are not. Any help with this would be great!

13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: /scratch/dummyvector.seq Clusters In: /scratch/dummyvector-initclusters/part-randomSeed Out: /scratch/dummyvectoroutput Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max Iterations: 10
13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to process : 1
13/12/19 16:59:03 INFO mapred.JobClient: Running job: job_201311111627_0310
13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
13/12/19 16:59:19 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:264)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
    at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
    at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)

To resolve this, I came across this article:
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

The article says: "Include the JAR in the “-libjars” command line option of the `hadoop jar …` command. The jar will be placed in distributed cache and will be made available to all of the job’s task attempts."
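
If I understand the article correctly, -libjars takes a single comma-separated list of jars (not a colon-separated classpath), and it has to come before any application-specific arguments, so presumably the corrected command should look something like this:

hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar,/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar,/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar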

For the Hadoop command-line options (and the article's method 1) to work, the main class should implement Tool and call ToolRunner.run(). Therefore I changed the class as follows:


With that change, I now get the compile errors shown after the code.

public class SimpleKMeansClustering extends Configured implements Tool {
    // Code....

    public int run(String[] args) throws Exception {
        // Configuration conf = new Configuration();
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        Job job = new Job(conf, "SimpleKMeansClustering");

        // to accept the HDFS input and output dir at run time
        FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
        FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));

        SimpleKMeansClustering smkc = new SimpleKMeansClustering();
        System.out.println("SimpleKMeansClustering::main -- Will call SequenceFile.Writer \n");

        populateData();
        writePointsToFile("/scratch/dummyvector.seq", fs, conf);
        readPointsFromFile(fs, conf);
        runKmeansDriver(conf);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String args[]) throws Exception {
        int res = ToolRunner.run(new SimpleKMeansClustering(), args);
        System.exit(res);
    }
}
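
For context, writePointsToFile is along these lines (a simplified sketch; the points list stands in for whatever populateData builds, and I am assuming the usual Mahout input format of Writable keys and VectorWritable values):

    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;
    import org.apache.mahout.math.Vector;
    import org.apache.mahout.math.VectorWritable;

    void writePointsToFile(String fileName, FileSystem fs, Configuration conf) throws Exception {
        // k-means reads a SequenceFile of (key, VectorWritable) pairs
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                new Path(fileName), Text.class, VectorWritable.class);
        try {
            VectorWritable vec = new VectorWritable();
            for (int i = 0; i < points.size(); i++) {   // points: List<Vector> field
                vec.set(points.get(i));
                writer.append(new Text(String.valueOf(i)), vec);
            }
        } finally {
            writer.close();
        }
    }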

I am having some issues mixing the new and old APIs. Can someone please point me in the right direction?

SimpleKMeansClustering.java:148: error: method addInputPath in class FileInputFormat<K,V> cannot be applied to given types;
    FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
                   ^
  required: JobConf,Path
  found: Job,Path
  reason: actual argument Job cannot be converted to JobConf by method invocation conversion
  where K,V are type-variables:
    K extends Object declared in class FileInputFormat
    V extends Object declared in class FileInputFormat
SimpleKMeansClustering.java:149: error: method setOutputPath in class FileOutputFormat<K,V> cannot be applied to given types;
    FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));
                    ^
  required: JobConf,Path
  found: Job,Path
  reason: actual argument Job cannot be converted to JobConf by method invocation conversion
  where K,V are type-variables:
    K extends Object declared in class FileOutputFormat
    V extends Object declared in class FileOutputFormat
2 errors
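
Looking at the "required: JobConf,Path / found: Job,Path" messages, my guess is that the class imports the old-API input/output formats while using the new-API Job. If that is right, the fix should just be the imports (sketching my assumption):

    // old API, takes org.apache.hadoop.mapred.JobConf:
    // import org.apache.hadoop.mapred.FileInputFormat;
    // import org.apache.hadoop.mapred.FileOutputFormat;

    // new API, takes org.apache.hadoop.mapreduce.Job:
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;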


RE: libjar and Mahout

Posted by Sameer Tilak <ss...@live.com>.
Let me try that today.

Date: Fri, 20 Dec 2013 21:55:44 -0500
From: chris.mawata@gmail.com
To: user@hadoop.apache.org
Subject: Re: libjar and Mahout

In your hadoop command I see a space in the part

...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk

just after .jar
Should it not be

...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk

Chris

On 12/20/2013 2:44 PM, Sameer Tilak wrote:
> hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
> [...]


Re: libjar and Mahout

Posted by Chris Mawata <ch...@gmail.com>.
In your hadoop command I see a space in the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk

just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris

On 12/20/2013 2:44 PM, Sameer Tilak wrote:
> hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
> [...]

Re: libjar and Mahout

Posted by Chris Mawata <ch...@gmail.com>.
In your hadoop command I see a space in the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk

just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris

On 12/20/2013 2:44 PM, Sameer Tilak wrote:
> Hi All,
> I am running Hadoop 1.0.3 -- probably will upgrade mid-next year. We 
> are using Apache Pig to build our data pipeline and are planning to 
> use Apache Mahout for data analysis.
>
> javac -d /apps/analytics/ -classpath 
> .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar 
> SimpleKMeansClustering.java
>
> jar -cf myanalytics.jar myanalytics/
>
>
> hadoop jar /apps/analytics/myanalytics.jar 
> myanalytics.SimpleKMeansClustering -libjars 
> /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar 
> /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
>
> I have call the following method in my SimpleKMeansClustering class:
>
>             KMeansDriver.run(conf, new 
> Path("/scratch/dummyvector.seq"), new 
> Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
>                              new Path("/scratch/dummyvectoroutput"), 
> new EuclideanDistanceMeasure(), 0.001, 10,
>                              true, 1.0, false);
>
>
> I unfortunately get the following error, In think somehow the jars are 
> not made available in the distributed cached. I use Vectors to 
> repreent my data and I write it to a sequence file. I then use that 
> Driver to analyze that in the mapreduce mode. I think locally all the 
> required jar files are available, however somehow in the mapreduce 
> mode they are not available. Any help with this would be great!
>
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: 
> /scratch/dummyvector.seq Clusters In: 
> /scratch/dummyvector-initclusters/part-randomSeed Out: 
> /scratch/dummyvectoroutput Distance: 
> org.apache.mahout.common.distance.EuclideanDistanceMeasure
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max 
> Iterations: 10
> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop 
> library
> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & 
> initialized native-zlib library
> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for 
> parsing the arguments. Applications should implement Tool for the same.
> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to 
> process : 1
> 13/12/19 16:59:03 INFO mapred.JobClient: Running job: 
> job_201311111627_0310
> 13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id : 
> attempt_201311111627_0310_m_000000_0, Status : FAILED
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:264)
>     at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>     at 
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>
> To resolve this, I came across this article:
> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> The information says that "Include the JAR in the “/-libjars/” command 
> line option of the `hadoop jar …` command. The jar will be placed in 
> distributed cache 
> <http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#DistributedCache> 
> and will be made available to all of the job’s task attempts."
>
> For the hadoop command line options and the method 1 to work the main 
> class should implement Tool and call ToolRunner.run(). Therefore I 
> changed the class as follows:
>
>
> I was getting an error that
>
> public class SimpleKMeansClustering extends Configured implements Tool {
> Code....
>
>  public int run(String[] args) throws Exception
>     {
>         //      Configuration conf = new Configuration();
>         Configuration conf = getConf();
>         FileSystem fs = FileSystem.get(conf);
>         Job job = new Job(conf, "SimpleKMeansClustering");
>
>         //to accept the hdfs input and outpur dir at run time
>
>     FileInputFormat.addInputPath(job, new 
> Path("/scratch/dummyvector.seq"));
>         FileOutputFormat.setOutputPath(job, new 
> Path("/scratch/dummyvectoroutput"));
>
>     SimpleKMeansClustering smkc = new SimpleKMeansClustering();
>         System.out.println ("SimpleKMeansClustering::main -- Will call 
> SequenceFile.Writer \n");
>
>     populateData();
>      writePointsToFile("/scratch/dummyvector.seq",fs,conf);
>     readPointsFromFile(fs, conf);
>     runKmeansDriver(conf);
>
>     return job.waitForCompletion(true) ? 0 : 1;
>
>     }
>     public static void main(String args[]) throws Exception {
>
>         int res = ToolRunner.run(new SimpleKMeansClustering(), args);
>         System.exit(res);
>     }
> }
>
> It looks like I am mixing the new and old MapReduce APIs. Can someone 
> please point me in the right direction?
>
> SimpleKMeansClustering.java:148: error: method addInputPath in class 
> FileInputFormat<K,V> cannot be applied to given types;
>     FileInputFormat.addInputPath(job, new 
> Path("/scratch/dummyvector.seq"));
>                    ^
>   required: JobConf,Path
>   found: Job,Path
>   reason: actual argument Job cannot be converted to JobConf by method 
> invocation conversion
>   where K,V are type-variables:
>     K extends Object declared in class FileInputFormat
>     V extends Object declared in class FileInputFormat
> SimpleKMeansClustering.java:149: error: method setOutputPath in class 
> FileOutputFormat<K,V> cannot be applied to given types;
>     FileOutputFormat.setOutputPath(job, new 
> Path("/scratch/dummyvectoroutput"));
>                     ^
>   required: JobConf,Path
>   found: Job,Path
>   reason: actual argument Job cannot be converted to JobConf by method 
> invocation conversion
>   where K,V are type-variables:
>     K extends Object declared in class FileOutputFormat
>     V extends Object declared in class FileOutputFormat
> 2 errors


Re: libjar and Mahout

Posted by Chris Mawata <ch...@gmail.com>.
In your hadoop command I see a space in the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk

just after .jar. Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
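
Also, note that -libjars takes a single comma-separated list (it is
parsed by GenericOptionsParser, not by the shell), so colons will not
work there either.

If juggling -libjars gets tedious, the other approach in the Cloudera
article you found should work too: put the dependency jars in a lib/
directory inside your application jar, and Hadoop adds them to the task
classpath. Roughly (using the paths from your mail):

mkdir lib
cp /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar lib/
cp /apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar lib/
jar -cf myanalytics.jar myanalytics/ lib/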
Chris

On 12/20/2013 2:44 PM, Sameer Tilak wrote:
> Hi All,
> I am running Hadoop 1.0.3 -- probably will upgrade mid-next year. We 
> are using Apache Pig to build our data pipeline and are planning to 
> use Apache Mahout for data analysis.
>
> javac -d /apps/analytics/ -classpath 
> .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar 
> SimpleKMeansClustering.java
>
> jar -cf myanalytics.jar myanalytics/
>
>
> hadoop jar /apps/analytics/myanalytics.jar 
> myanalytics.SimpleKMeansClustering -libjars 
> /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar 
> /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
>
> I call the following method in my SimpleKMeansClustering class:
>
>             KMeansDriver.run(conf, new 
> Path("/scratch/dummyvector.seq"), new 
> Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
>                              new Path("/scratch/dummyvectoroutput"), 
> new EuclideanDistanceMeasure(), 0.001, 10,
>                              true, 1.0, false);
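>
> (The arguments here, assuming the 0.9-SNAPSHOT signature, are: the 
> Configuration, the input vectors, the initial clusters, the output 
> path, the distance measure, convergenceDelta = 0.001, maxIterations = 
> 10, runClustering = true, clusterClassificationThreshold = 1.0, and 
> runSequential = false.)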
>
>
> Unfortunately I get the following error; I think somehow the jars are 
> not made available in the distributed cache. I use Vectors to 
> represent my data and write them to a sequence file, and I then use 
> the KMeansDriver to analyze them in MapReduce mode. I think all the 
> required jar files are available locally; however, in MapReduce mode 
> they are not available. Any help with this would be great!
>
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: 
> /scratch/dummyvector.seq Clusters In: 
> /scratch/dummyvector-initclusters/part-randomSeed Out: 
> /scratch/dummyvectoroutput Distance: 
> org.apache.mahout.common.distance.EuclideanDistanceMeasure
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max 
> Iterations: 10
> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop 
> library
> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & 
> initialized native-zlib library
> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for 
> parsing the arguments. Applications should implement Tool for the same.
> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to 
> process : 1
> 13/12/19 16:59:03 INFO mapred.JobClient: Running job: 
> job_201311111627_0310
> 13/12/19 16:59:04 INFO mapred.JobClient:  map 0% reduce 0%
> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id : 
> attempt_201311111627_0310_m_000000_0, Status : FAILED
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:264)
>     at 
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
>     at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
>     at 
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
>     at 
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>
> To resolve this, I came across this article:
> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> The information says that "Include the JAR in the “-libjars” command 
> line option of the `hadoop jar …` command. The jar will be placed in 
> distributed cache 
> <http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#DistributedCache> 
> and will be made available to all of the job’s task attempts."
>
> For the hadoop command-line options (method 1 in that article) to 
> work, the main class should implement Tool and call ToolRunner.run(). 
> Therefore I changed the class as follows, but now I get the compile 
> errors shown after the code:
>
> public class SimpleKMeansClustering extends Configured implements Tool {
> Code....
>
>  public int run(String[] args) throws Exception
>     {
>         //      Configuration conf = new Configuration();
>         Configuration conf = getConf();
>         FileSystem fs = FileSystem.get(conf);
>         Job job = new Job(conf, "SimpleKMeansClustering");
>
>         //to accept the hdfs input and output dir at run time
>
>     FileInputFormat.addInputPath(job, new 
> Path("/scratch/dummyvector.seq"));
>         FileOutputFormat.setOutputPath(job, new 
> Path("/scratch/dummyvectoroutput"));
>
>     SimpleKMeansClustering smkc = new SimpleKMeansClustering();
>         System.out.println ("SimpleKMeansClustering::main -- Will call 
> SequenceFile.Writer \n");
>
>     populateData();
>      writePointsToFile("/scratch/dummyvector.seq",fs,conf);
>     readPointsFromFile(fs, conf);
>     runKmeansDriver(conf);
>
>     return job.waitForCompletion(true) ? 0 : 1;
>
>     }
>     public static void main(String args[]) throws Exception {
>
>         int res = ToolRunner.run(new SimpleKMeansClustering(), args);
>         System.exit(res);
>     }
> }
>
> It looks like I am mixing the new and old MapReduce APIs. Can someone 
> please point me in the right direction?
>
> SimpleKMeansClustering.java:148: error: method addInputPath in class 
> FileInputFormat<K,V> cannot be applied to given types;
>     FileInputFormat.addInputPath(job, new 
> Path("/scratch/dummyvector.seq"));
>                    ^
>   required: JobConf,Path
>   found: Job,Path
>   reason: actual argument Job cannot be converted to JobConf by method 
> invocation conversion
>   where K,V are type-variables:
>     K extends Object declared in class FileInputFormat
>     V extends Object declared in class FileInputFormat
> SimpleKMeansClustering.java:149: error: method setOutputPath in class 
> FileOutputFormat<K,V> cannot be applied to given types;
>     FileOutputFormat.setOutputPath(job, new 
> Path("/scratch/dummyvectoroutput"));
>                     ^
>   required: JobConf,Path
>   found: Job,Path
>   reason: actual argument Job cannot be converted to JobConf by method 
> invocation conversion
>   where K,V are type-variables:
>     K extends Object declared in class FileOutputFormat
>     V extends Object declared in class FileOutputFormat
> 2 errors
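
P.S. Regarding the two compile errors: the FileInputFormat and
FileOutputFormat being resolved are the old-API classes from
org.apache.hadoop.mapred, which take a JobConf, while your Job comes
from the new API. A rough sketch of the new-API version (assuming
Hadoop 1.0.3 and the paths from your code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
// New-API input/output formats live under mapreduce.lib, not mapred:
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration conf = getConf();
Job job = new Job(conf, "SimpleKMeansClustering");
// These overloads take a Job, so the two errors should go away:
FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));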

