Posted to common-user@hadoop.apache.org by Sameer Tilak <ss...@live.com> on 2013/12/20 20:44:27 UTC
libjar and Mahout
Hi All,
I am running Hadoop 1.0.3 -- probably will upgrade mid-next year. We are using Apache Pig to build our data pipeline and are planning to use Apache Mahout for data analysis.
javac -d /apps/analytics/ -classpath .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar SimpleKMeansClustering.java
jar -cf myanalytics.jar myanalytics/
hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
I call the following method in my SimpleKMeansClustering class:
KMeansDriver.run(conf, new Path("/scratch/dummyvector.seq"),
    new Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
    new Path("/scratch/dummyvectoroutput"),
    new EuclideanDistanceMeasure(), 0.001, 10, true, 1.0, false);
Unfortunately I get the following error; I think the jars are somehow not made available in the distributed cache. I use Vectors to represent my data and write them to a sequence file, then use the driver to analyze that in MapReduce mode. Locally all the required jar files are available, but in MapReduce mode they apparently are not. Any help with this would be great!
13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input: /scratch/dummyvector.seq Clusters In: /scratch/dummyvector-initclusters/part-randomSeed Out: /scratch/dummyvectoroutput Distance: org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max Iterations: 10
13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop library
13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to process : 1
13/12/19 16:59:03 INFO mapred.JobClient: Running job: job_201311111627_0310
13/12/19 16:59:04 INFO mapred.JobClient: map 0% reduce 0%
13/12/19 16:59:19 INFO mapred.JobClient: Task Id : attempt_201311111627_0310_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
at org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
To resolve this, I came across this article:
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
The article says: "Include the JAR in the “-libjars” command line option of the `hadoop jar …` command. The jar will be placed in distributed cache and will be made available to all of the job’s task attempts."
For the hadoop command-line options (method 1 in that article) to work, the main class should implement Tool and call ToolRunner.run(), so I changed the class as follows:
public class SimpleKMeansClustering extends Configured implements Tool {
    // Code....
    public int run(String[] args) throws Exception {
        // Configuration conf = new Configuration();
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        Job job = new Job(conf, "SimpleKMeansClustering");
        // to accept the HDFS input and output dirs at run time
        FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
        FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));
        SimpleKMeansClustering smkc = new SimpleKMeansClustering();
        System.out.println("SimpleKMeansClustering::main -- Will call SequenceFile.Writer\n");
        populateData();
        writePointsToFile("/scratch/dummyvector.seq", fs, conf);
        readPointsFromFile(fs, conf);
        runKmeansDriver(conf);
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new SimpleKMeansClustering(), args);
        System.exit(res);
    }
}
I am having some issues with the new and old APIs. Can someone please point me in the right direction?
SimpleKMeansClustering.java:148: error: method addInputPath in class FileInputFormat<K,V> cannot be applied to given types;
FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileInputFormat
V extends Object declared in class FileInputFormat
SimpleKMeansClustering.java:149: error: method setOutputPath in class FileOutputFormat<K,V> cannot be applied to given types;
FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileOutputFormat
V extends Object declared in class FileOutputFormat
2 errors
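[Editor's note: the two errors above are the classic old-API/new-API mismatch. `Job` belongs to the new `org.apache.hadoop.mapreduce` API, while a `FileInputFormat` whose `addInputPath` requires a `JobConf` is the old `org.apache.hadoop.mapred` one. A hedged sketch follows; the class and method names below are illustrative, not from the posted code, and the imports are the standard Hadoop 1.x new-API ones:]

```java
// Hedged sketch: new-API imports that match Job (org.apache.hadoop.mapreduce).
// The "required: JobConf,Path / found: Job,Path" error appears when the file
// instead imports the old-API org.apache.hadoop.mapred.FileInputFormat and
// org.apache.hadoop.mapred.FileOutputFormat.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

class ApiFixSketch {
    // Hypothetical helper for illustration only.
    static void configure(Job job) throws Exception {
        // With the new-API classes, both calls accept a Job directly.
        FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
        FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));
    }
}
```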
RE: libjar and Mahout
Posted by Sameer Tilak <ss...@live.com>.
Let me try that today.
Date: Fri, 20 Dec 2013 21:55:44 -0500
From: chris.mawata@gmail.com
To: user@hadoop.apache.org
Subject: Re: libjar and Mahout
In your hadoop command I see a space just after .jar in the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris
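[Editor's note: beyond the stray space Chris flags, Hadoop's GenericOptionsParser expects `-libjars` to be a comma-separated list, not the colon-separated classpath syntax used above, and the option is only honored when the main class runs through ToolRunner. A hedged sketch of the corrected invocation, reusing the paths from this thread:]

```shell
# Sketch: -libjars takes a COMMA-separated list with no spaces, placed after
# the main class name and before any application arguments.
hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering \
  -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar,/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar,/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
```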
RE: libjar and Mahout
Posted by Sameer Tilak <ss...@live.com>.
Let me try that today.
Date: Fri, 20 Dec 2013 21:55:44 -0500
From: chris.mawata@gmail.com
To: user@hadoop.apache.org
Subject: Re: libjar and Mahout
In your hadoop command I see a space in
the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk
just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris
On 12/20/2013 2:44 PM, Sameer Tilak wrote:
Hi All,
I am running Hadoop 1.0.3 -- probably will upgrade mid-next
year. We are using Apache Pig to build our data pipeline and
are planning to use Apache Mahout for data analysis.
javac -d /apps/analytics/ -classpath
.:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar
SimpleKMeansClustering.java
jar -cf myanalytics.jar myanalytics/
hadoop jar /apps/analytics/myanalytics.jar
myanalytics.SimpleKMeansClustering -libjars
/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
/:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
I have call the following method in my SimpleKMeansClustering
class:
KMeansDriver.run(conf, new
Path("/scratch/dummyvector.seq"), new
Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
new
Path("/scratch/dummyvectoroutput"), new
EuclideanDistanceMeasure(), 0.001, 10,
true, 1.0, false);
I unfortunately get the following error, In think somehow the
jars are not made available in the distributed cached. I use
Vectors to repreent my data and I write it to a sequence file.
I then use that Driver to analyze that in the mapreduce mode.
I think locally all the required jar files are available,
however somehow in the mapreduce mode they are not available.
Any help with this would be great!
13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input:
/scratch/dummyvector.seq Clusters In:
/scratch/dummyvector-initclusters/part-randomSeed Out:
/scratch/dummyvectoroutput Distance:
org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001
max Iterations: 10
13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the
native-hadoop library
13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded
& initialized native-zlib library
13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new
decompressor
13/12/19 16:59:02 WARN mapred.JobClient: Use
GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
13/12/19 16:59:02 INFO input.FileInputFormat: Total input
paths to process : 1
13/12/19 16:59:03 INFO mapred.JobClient: Running job:
job_201311111627_0310
13/12/19 16:59:04 INFO mapred.JobClient: map 0% reduce 0%
13/12/19 16:59:19 INFO mapred.JobClient: Task Id :
attempt_201311111627_0310_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native
Method)
at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at
org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
at
org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at
org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
To resolve this, I came across this article:
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
The information says that "Include the JAR in the “-libjars”
command line option of the `hadoop jar …` command. The jar
will be placed in distributed
cache and will be made available to all of the job’s
task attempts."
For the hadoop command line options and the method 1 to work
the main class should implement Tool and call
ToolRunner.run(). Therefore I changed the class as follows:
I was getting an error that
public class SimpleKMeansClustering extends Configured
implements Tool {
Code....
public int run(String[] args) throws Exception
{
// Configuration conf = new Configuration();
Configuration conf = getConf();
FileSystem fs = FileSystem.get(conf);
Job job = new Job(conf, "SimpleKMeansClustering");
//to accept the hdfs input and outpur dir at run time
FileInputFormat.addInputPath(job, new
Path("/scratch/dummyvector.seq"));
FileOutputFormat.setOutputPath(job, new
Path("/scratch/dummyvectoroutput"));
SimpleKMeansClustering smkc = new
SimpleKMeansClustering();
System.out.println ("SimpleKMeansClustering::main --
Wiil call SequenceFile.Writer \n");
populateData();
writePointsToFile("/scratch/dummyvector.seq",fs,conf);
readPointsFromFile(fs, conf);
runKmeansDriver(conf);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new SimpleKMeansClustering(),
args);
System.exit(res);
}
}
I am having some issues with the new and old API. Can someone
please point me in the correct direction?
SimpleKMeansClustering.java:148: error: method addInputPath in
class FileInputFormat<K,V> cannot be applied to given
types;
FileInputFormat.addInputPath(job, new
Path("/scratch/dummyvector.seq"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf
by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileInputFormat
V extends Object declared in class FileInputFormat
SimpleKMeansClustering.java:149: error: method setOutputPath
in class FileOutputFormat<K,V> cannot be applied to
given types;
FileOutputFormat.setOutputPath(job, new
Path("/scratch/dummyvectoroutput"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf
by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileOutputFormat
V extends Object declared in class FileOutputFormat
2 errors
RE: libjar and Mahout
Posted by Sameer Tilak <ss...@live.com>.
Let me try that today.
Date: Fri, 20 Dec 2013 21:55:44 -0500
From: chris.mawata@gmail.com
To: user@hadoop.apache.org
Subject: Re: libjar and Mahout
In your hadoop command I see a space in
the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk
just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris
On 12/20/2013 2:44 PM, Sameer Tilak wrote:
Hi All,
I am running Hadoop 1.0.3 -- probably will upgrade mid-next
year. We are using Apache Pig to build our data pipeline and
are planning to use Apache Mahout for data analysis.
javac -d /apps/analytics/ -classpath
.:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar
SimpleKMeansClustering.java
jar -cf myanalytics.jar myanalytics/
hadoop jar /apps/analytics/myanalytics.jar
myanalytics.SimpleKMeansClustering -libjars
/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
/:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
I have call the following method in my SimpleKMeansClustering
class:
KMeansDriver.run(conf, new
Path("/scratch/dummyvector.seq"), new
Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
new
Path("/scratch/dummyvectoroutput"), new
EuclideanDistanceMeasure(), 0.001, 10,
true, 1.0, false);
I unfortunately get the following error, In think somehow the
jars are not made available in the distributed cached. I use
Vectors to repreent my data and I write it to a sequence file.
I then use that Driver to analyze that in the mapreduce mode.
I think locally all the required jar files are available,
however somehow in the mapreduce mode they are not available.
Any help with this would be great!
13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input:
/scratch/dummyvector.seq Clusters In:
/scratch/dummyvector-initclusters/part-randomSeed Out:
/scratch/dummyvectoroutput Distance:
org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001
max Iterations: 10
13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the
native-hadoop library
13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded
& initialized native-zlib library
13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new
decompressor
13/12/19 16:59:02 WARN mapred.JobClient: Use
GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
13/12/19 16:59:02 INFO input.FileInputFormat: Total input
paths to process : 1
13/12/19 16:59:03 INFO mapred.JobClient: Running job:
job_201311111627_0310
13/12/19 16:59:04 INFO mapred.JobClient: map 0% reduce 0%
13/12/19 16:59:19 INFO mapred.JobClient: Task Id :
attempt_201311111627_0310_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native
Method)
at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at
org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
at
org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at
org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
To resolve this, I came across this article:
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
The information says that "Include the JAR in the “-libjars”
command line option of the `hadoop jar …` command. The jar
will be placed in distributed
cache and will be made available to all of the job’s
task attempts."
For the hadoop command line options and the method 1 to work
the main class should implement Tool and call
ToolRunner.run(). Therefore I changed the class as follows:
I was getting an error that
public class SimpleKMeansClustering extends Configured
implements Tool {
Code....
public int run(String[] args) throws Exception
{
// Configuration conf = new Configuration();
Configuration conf = getConf();
FileSystem fs = FileSystem.get(conf);
Job job = new Job(conf, "SimpleKMeansClustering");
//to accept the hdfs input and outpur dir at run time
FileInputFormat.addInputPath(job, new
Path("/scratch/dummyvector.seq"));
FileOutputFormat.setOutputPath(job, new
Path("/scratch/dummyvectoroutput"));
SimpleKMeansClustering smkc = new
SimpleKMeansClustering();
System.out.println ("SimpleKMeansClustering::main --
Wiil call SequenceFile.Writer \n");
populateData();
writePointsToFile("/scratch/dummyvector.seq",fs,conf);
readPointsFromFile(fs, conf);
runKmeansDriver(conf);
return job.waitForCompletion(true) ? 0 : 1;
}
public static void main(String args[]) throws Exception {
int res = ToolRunner.run(new SimpleKMeansClustering(),
args);
System.exit(res);
}
}
I am having some issues with the new and old API. Can someone
please point me in the correct direction?
SimpleKMeansClustering.java:148: error: method addInputPath in
class FileInputFormat<K,V> cannot be applied to given
types;
FileInputFormat.addInputPath(job, new
Path("/scratch/dummyvector.seq"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf
by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileInputFormat
V extends Object declared in class FileInputFormat
SimpleKMeansClustering.java:149: error: method setOutputPath
in class FileOutputFormat<K,V> cannot be applied to
given types;
FileOutputFormat.setOutputPath(job, new
Path("/scratch/dummyvectoroutput"));
^
required: JobConf,Path
found: Job,Path
reason: actual argument Job cannot be converted to JobConf
by method invocation conversion
where K,V are type-variables:
K extends Object declared in class FileOutputFormat
V extends Object declared in class FileOutputFormat
2 errors
RE: libjar and Mahout
Posted by Sameer Tilak <ss...@live.com>.
Let me try that today.
Date: Fri, 20 Dec 2013 21:55:44 -0500
From: chris.mawata@gmail.com
To: user@hadoop.apache.org
Subject: Re: libjar and Mahout
In your hadoop command I see a space in
the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk
just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris
On 12/20/2013 2:44 PM, Sameer Tilak wrote:
Hi All,
I am running Hadoop 1.0.3 -- probably will upgrade mid-next
year. We are using Apache Pig to build our data pipeline and
are planning to use Apache Mahout for data analysis.
javac -d /apps/analytics/ -classpath
.:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar
SimpleKMeansClustering.java
jar -cf myanalytics.jar myanalytics/
hadoop jar /apps/analytics/myanalytics.jar
myanalytics.SimpleKMeansClustering -libjars
/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
/:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
I have call the following method in my SimpleKMeansClustering
class:
KMeansDriver.run(conf, new
Path("/scratch/dummyvector.seq"), new
Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
new
Path("/scratch/dummyvectoroutput"), new
EuclideanDistanceMeasure(), 0.001, 10,
true, 1.0, false);
I unfortunately get the following error, In think somehow the
jars are not made available in the distributed cached. I use
Vectors to repreent my data and I write it to a sequence file.
I then use that Driver to analyze that in the mapreduce mode.
I think locally all the required jar files are available,
however somehow in the mapreduce mode they are not available.
Any help with this would be great!
13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input:
/scratch/dummyvector.seq Clusters In:
/scratch/dummyvector-initclusters/part-randomSeed Out:
/scratch/dummyvectoroutput Distance:
org.apache.mahout.common.distance.EuclideanDistanceMeasure
13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001
max Iterations: 10
13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the
native-hadoop library
13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded
& initialized native-zlib library
13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new
decompressor
13/12/19 16:59:02 WARN mapred.JobClient: Use
GenericOptionsParser for parsing the arguments. Applications
should implement Tool for the same.
13/12/19 16:59:02 INFO input.FileInputFormat: Total input
paths to process : 1
13/12/19 16:59:03 INFO mapred.JobClient: Running job:
job_201311111627_0310
13/12/19 16:59:04 INFO mapred.JobClient: map 0% reduce 0%
13/12/19 16:59:19 INFO mapred.JobClient: Task Id :
attempt_201311111627_0310_m_000000_0, Status : FAILED
Error: java.lang.ClassNotFoundException:
org.apache.mahout.math.Vector
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native
Method)
at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at
org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
at
org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
at
org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
at
org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
at
org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
To resolve this, I came across this article:
http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
It says: "Include the JAR in the “-libjars” command line option of the `hadoop jar …` command. The jar will be placed in distributed cache and will be made available to all of the job’s task attempts."
For the hadoop command-line options (method 1 in the article) to work, the main class should implement Tool and call ToolRunner.run(). I therefore changed the class as follows:
public class SimpleKMeansClustering extends Configured implements Tool {

    // ... other code ...

    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        FileSystem fs = FileSystem.get(conf);
        Job job = new Job(conf, "SimpleKMeansClustering");

        // To accept the HDFS input and output dirs at run time
        FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
        FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));

        System.out.println("SimpleKMeansClustering::run -- will call SequenceFile.Writer\n");

        populateData();
        writePointsToFile("/scratch/dummyvector.seq", fs, conf);
        readPointsFromFile(fs, conf);
        runKmeansDriver(conf);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new SimpleKMeansClustering(), args);
        System.exit(res);
    }
}
I am having some issues with the new and old API. Can someone please point me in the correct direction?

SimpleKMeansClustering.java:148: error: method addInputPath in class FileInputFormat<K,V> cannot be applied to given types;
        FileInputFormat.addInputPath(job, new Path("/scratch/dummyvector.seq"));
                       ^
  required: JobConf,Path
  found: Job,Path
  reason: actual argument Job cannot be converted to JobConf by method invocation conversion
  where K,V are type-variables:
    K extends Object declared in class FileInputFormat
    V extends Object declared in class FileInputFormat
SimpleKMeansClustering.java:149: error: method setOutputPath in class FileOutputFormat<K,V> cannot be applied to given types;
        FileOutputFormat.setOutputPath(job, new Path("/scratch/dummyvectoroutput"));
                        ^
  required: JobConf,Path
  found: Job,Path
  reason: actual argument Job cannot be converted to JobConf by method invocation conversion
  where K,V are type-variables:
    K extends Object declared in class FileOutputFormat
    V extends Object declared in class FileOutputFormat
2 errors
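This compiler error usually means the old-API FileInputFormat/FileOutputFormat (from org.apache.hadoop.mapred, which expect a JobConf) are being resolved instead of the new-API ones (from org.apache.hadoop.mapreduce.lib, which expect a Job). I suspect my imports are mixing the two; if I understand correctly, with the Job-based code above the imports should look something like this (a sketch, assuming the new API throughout):

```java
// New (mapreduce) API: these overloads take a Job.
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// NOT the old (mapred) API, whose same-named classes take a JobConf:
// import org.apache.hadoop.mapred.FileInputFormat;   // addInputPath(JobConf, Path)
// import org.apache.hadoop.mapred.FileOutputFormat;  // setOutputPath(JobConf, Path)
```

Is that the right way to reconcile the two APIs, or is there more to it?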
Re: libjar and Mahout
Posted by Chris Mawata <ch...@gmail.com>.
In your hadoop command I see a space in the part
...-core-0.9-SNAPSHOT.jar /:/apps/mahout/trunk
just after .jar
Should it not be
...-core-0.9-SNAPSHOT.jar:/apps/mahout/trunk
Chris
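Also, if I remember correctly, -libjars takes a comma-separated list of paths (colons are for the Java classpath), placed before any of your own program arguments. So something along these lines, reusing the paths from your message, might work (untested sketch):

```shell
hadoop jar /apps/analytics/myanalytics.jar myanalytics.SimpleKMeansClustering \
  -libjars /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar,/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar,/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
```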
On 12/20/2013 2:44 PM, Sameer Tilak wrote:
> Hi All,
> I am running Hadoop 1.0.3 -- probably will upgrade mid-next year. We
> are using Apache Pig to build our data pipeline and are planning to
> use Apache Mahout for data analysis.
>
> javac -d /apps/analytics/ -classpath
> .:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-core-1.0.3.jar:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar:/users/p529444/software/hadoop-1.0.3/hadoop-tools-1.0.3.jar:/users/p529444/software/hadoop-1.0.3/lib/commons-logging-1.1.1.jar
> SimpleKMeansClustering.java
>
> jar -cf myanalytics.jar myanalytics/
>
>
> hadoop jar /apps/analytics/myanalytics.jar
> myanalytics.SimpleKMeansClustering -libjars
> /apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT.jar
> /:/apps/mahout/trunk/core/target/mahout-core-0.9-SNAPSHOT-job.jar:/apps/mahout/trunk/math/target/mahout-math-0.9-SNAPSHOT.jar
>
> I call the following method in my SimpleKMeansClustering class:
>
> KMeansDriver.run(conf, new
> Path("/scratch/dummyvector.seq"), new
> Path("/scratch/dummyvector-initclusters/part-randomSeed/"),
> new Path("/scratch/dummyvectoroutput"),
> new EuclideanDistanceMeasure(), 0.001, 10,
> true, 1.0, false);
>
>
> I unfortunately get the following error; I think somehow the jars are
> not made available in the distributed cache. I use Vectors to
> represent my data and I write them to a sequence file. I then use the
> driver to analyze them in MapReduce mode. I think all the required jar
> files are available locally, but somehow in MapReduce mode they are
> not. Any help with this would be great!
>
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: Input:
> /scratch/dummyvector.seq Clusters In:
> /scratch/dummyvector-initclusters/part-randomSeed Out:
> /scratch/dummyvectoroutput Distance:
> org.apache.mahout.common.distance.EuclideanDistanceMeasure
> 13/12/19 16:59:02 INFO kmeans.KMeansDriver: convergence: 0.001 max
> Iterations: 10
> 13/12/19 16:59:02 INFO util.NativeCodeLoader: Loaded the native-hadoop
> library
> 13/12/19 16:59:02 INFO zlib.ZlibFactory: Successfully loaded &
> initialized native-zlib library
> 13/12/19 16:59:02 INFO compress.CodecPool: Got brand-new decompressor
> 13/12/19 16:59:02 WARN mapred.JobClient: Use GenericOptionsParser for
> parsing the arguments. Applications should implement Tool for the same.
> 13/12/19 16:59:02 INFO input.FileInputFormat: Total input paths to
> process : 1
> 13/12/19 16:59:03 INFO mapred.JobClient: Running job:
> job_201311111627_0310
> 13/12/19 16:59:04 INFO mapred.JobClient: map 0% reduce 0%
> 13/12/19 16:59:19 INFO mapred.JobClient: Task Id :
> attempt_201311111627_0310_m_000000_0, Status : FAILED
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at
> org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:820)
> at org.apache.hadoop.io.WritableName.getClass(WritableName.java:71)
> at
> org.apache.hadoop.io.SequenceFile$Reader.getValueClass(SequenceFile.java:1671)
> at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1613)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1486)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
> at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
> at
> org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
>
> To resolve this, I came across this article:
> http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/
>
> The information says that "Include the JAR in the “-libjars” command
> line option of the `hadoop jar …` command. The jar will be placed in
> distributed cache
> <http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html#DistributedCache>
> and will be made available to all of the job’s task attempts."
>
> For the hadoop command-line options (method 1 in the article) to work,
> the main class should implement Tool and call ToolRunner.run().
> Therefore I changed the class as follows:
>
>
> I was getting an error that
>
> public class SimpleKMeansClustering extends Configured implements Tool {
> Code....
>
> public int run(String[] args) throws Exception
> {
> // Configuration conf = new Configuration();
> Configuration conf = getConf();
> FileSystem fs = FileSystem.get(conf);
> Job job = new Job(conf, "SimpleKMeansClustering");
>
> //to accept the hdfs input and output dir at run time
>
> FileInputFormat.addInputPath(job, new
> Path("/scratch/dummyvector.seq"));
> FileOutputFormat.setOutputPath(job, new
> Path("/scratch/dummyvectoroutput"));
>
> SimpleKMeansClustering smkc = new SimpleKMeansClustering();
> System.out.println ("SimpleKMeansClustering::main -- will call
> SequenceFile.Writer \n");
>
> populateData();
> writePointsToFile("/scratch/dummyvector.seq",fs,conf);
> readPointsFromFile(fs, conf);
> runKmeansDriver(conf);
>
> return job.waitForCompletion(true) ? 0 : 1;
>
> }
> public static void main(String args[]) throws Exception {
>
> int res = ToolRunner.run(new SimpleKMeansClustering(), args);
> System.exit(res);
> }
> }
>
> I am having some issues with the new and old API. Can someone please
> point me in the correct direction?
>
> SimpleKMeansClustering.java:148: error: method addInputPath in class
> FileInputFormat<K,V> cannot be applied to given types;
> FileInputFormat.addInputPath(job, new
> Path("/scratch/dummyvector.seq"));
> ^
> required: JobConf,Path
> found: Job,Path
> reason: actual argument Job cannot be converted to JobConf by method
> invocation conversion
> where K,V are type-variables:
> K extends Object declared in class FileInputFormat
> V extends Object declared in class FileInputFormat
> SimpleKMeansClustering.java:149: error: method setOutputPath in class
> FileOutputFormat<K,V> cannot be applied to given types;
> FileOutputFormat.setOutputPath(job, new
> Path("/scratch/dummyvectoroutput"));
> ^
> required: JobConf,Path
> found: Job,Path
> reason: actual argument Job cannot be converted to JobConf by method
> invocation conversion
> where K,V are type-variables:
> K extends Object declared in class FileOutputFormat
> V extends Object declared in class FileOutputFormat
> 2 errors