Posted to user@mahout.apache.org by mw <mw...@plista.com> on 2015/01/07 18:13:19 UTC

Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Hello,

The first error was due to a missing property in yarn.xml. However, now I
have a different problem.


I am working on a web application that should execute LDA on an external
YARN cluster.

I am uploading all the relevant sequence files onto the YARN cluster.
This is how I try to remotely execute LDA on the cluster.

         try {
             ugi.doAs(new PrivilegedExceptionAction<Void>() {
                 public Void run() throws Exception {
                     Configuration hdoopConf = new Configuration();
                     hdoopConf.set("fs.defaultFS", "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
                     hdoopConf.set("yarn.resourcemanager.hostname", "xxx.xxx.xxx.xxx");
                     hdoopConf.set("mapreduce.framework.name", "yarn");
                     hdoopConf.set("mapred.framework.name", "yarn");
                     hdoopConf.set("mapred.job.tracker", "xxx.xxx.xxx.xxx");
                     hdoopConf.set("dfs.permissions.enabled", "false");
                     hdoopConf.set("hadoop.job.ugi", "xx");
                     hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
                     CVB0Driver driver = new CVB0Driver();
                     try {
                         driver.run(hdoopConf, sparseVectorIn.suffix("/matrix"),
                                 topicsOut, k, numTerms,
                                 doc_topic_smoothening, term_topic_smoothening,
                                 maxIter, iteration_block_size, convergenceDelta,
                                 sparseVectorIn.suffix("/dictionary.file-0"),
                                 topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
                                 seed, testFraction, numTrainThreads,
                                 numUpdateThreads, maxItersPerDoc,
                                 numReduceTasks, backfillPerplexity);
                     } catch (ClassNotFoundException e) {
                         e.printStackTrace();
                     } catch (InterruptedException e) {
                         e.printStackTrace();
                     }
                     return null;
                 }
             });
         } catch (InterruptedException e) {
             e.printStackTrace();
         }

I am getting the following error message:

Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
	at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:344)
	at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
	at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
	at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
	at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

java.lang.InterruptedException: Failed to complete iteration 1 stage 1
	at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
	at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
     ...

So apparently the job is missing some Mahout classes. How can I provide the
required classes to YARN?

Best,

Max

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Strange. Legacy still depends on m-math and should include it in the job jar.
Or did it get that much out of hand after MR deprecation?
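
One way to check whether a given job jar really contains a class is to look for its entry by name. The following is a minimal sketch using only the JDK; the class name `JarClassCheck` and its methods are made up for illustration and are not part of Mahout or Hadoop. It only looks at top-level entries, so classes packed inside jars nested under a `lib/` directory of the job jar would not be found this way.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Mahout or Hadoop.
final class JarClassCheck {

    /** Converts a fully qualified class name to its entry path inside a jar. */
    public static String entryNameFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has a top-level entry for the class. */
    public static boolean containsClass(String jarPath, String className) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryNameFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        System.out.println(containsClass(args[0], args[1]) ? "present" : "MISSING");
    }
}
```

Run against the submitted job.jar with `org.apache.mahout.math.Vector` as the second argument, this would print MISSING if the class is not packaged at the top level.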

On Fri, Jan 9, 2015 at 8:51 AM, mw <mw...@plista.com> wrote:

> I found a solution!
> I had to upload the missing jars to HDFS on the YARN cluster and add the
> following to the Hadoop Configuration:
>
> hadoopConf.set("tmpjars","/lib/mahout-math-1.0-20150108.230237-316.jar,/lib/commons-cli-2.0-mahout.jar");
>
> Best,
> Max
>
> On 01/09/2015 02:13 PM, mw wrote:
>
>> I looked into the submitted job.jar and found that the missing
>> class (org.apache.mahout.math.Vector) is not contained in it.
>>
>>
>> On 01/09/2015 12:57 PM, mw wrote:
>>
>>> I wrote a message to the Hadoop list about it. I also found this
>>> ticket: https://issues.apache.org/jira/browse/MAHOUT-1498
>>> Could it be a related bug?
>>>
>>> Best,
>>> Max
>>> On 01/08/2015 06:18 PM, Pat Ferrel wrote:
>>>
>>>> That sounds like a Hadoop list question.
>>>>
>>>> All I can say is there is a job.jar in mrlegacy/target with all
>>>> dependencies packaged. This should have everything needed for lda.
>>>>
>>>> On Jan 8, 2015, at 5:50 AM, mw <mw...@plista.com> wrote:
>>>>
>>>> Hello again,
>>>>
>>>> Maybe my question was misleading.
>>>> I am asking whether the intended usage is to provide the job with the
>>>> required libraries and send them to YARN together with the job (if so,
>>>> how can this be done?), or to add the required classes to the classpath
>>>> of every node in the cluster.
>>>> What is the best practice?
>>>>
>>>> Best,
>>>> Max
>>>>

Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by mw <mw...@plista.com>.
I found a solution!
I had to upload the missing jars to HDFS on the YARN cluster and add the
following to the Hadoop Configuration:

hadoopConf.set("tmpjars","/lib/mahout-math-1.0-20150108.230237-316.jar,/lib/commons-cli-2.0-mahout.jar");

Best,
Max
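
A note on the "tmpjars" setting above: as far as I know, it holds a comma-separated list, and Configuration.set replaces whatever value was there before. Below is a minimal sketch, assuming one wants to add jars without losing entries that are already set. It builds only the string and uses nothing but the JDK; the class `TmpJars` and the method `merge` are hypothetical names, not Hadoop or Mahout API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Hadoop or Mahout.
final class TmpJars {

    /**
     * Merges extra jar paths into an existing comma-separated "tmpjars" value,
     * keeping first-seen order and dropping blanks and duplicates.
     */
    public static String merge(String existing, List<String> extraJars) {
        Set<String> jars = new LinkedHashSet<>();
        if (existing != null) {
            for (String part : existing.split(",")) {
                if (!part.trim().isEmpty()) {
                    jars.add(part.trim());
                }
            }
        }
        for (String jar : extraJars) {
            if (jar != null && !jar.trim().isEmpty()) {
                jars.add(jar.trim());
            }
        }
        return String.join(",", jars);
    }

    public static void main(String[] args) {
        // The two paths are the ones from the message above.
        String value = merge(null, Arrays.asList(
                "/lib/mahout-math-1.0-20150108.230237-316.jar",
                "/lib/commons-cli-2.0-mahout.jar"));
        // With Hadoop on the classpath this would be passed on as:
        //   hadoopConf.set("tmpjars", value);
        System.out.println(value);
    }
}
```

In the web application, the first argument would presumably be hadoopConf.get("tmpjars"), which is null when nothing has been set yet.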
>>>>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at javax.security.auth.Subject.doAs(Subject.java:422)
>>>>     at 
>>>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>>>>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>>>>
>>>> java.lang.InterruptedException: Failed to complete iteration 1 stage 1
>>>>     at 
>>>> org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
>>>>     at 
>>>> org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319) 
>>>>
>>>>     ...
>>>>
>>>> So apparently the job misses some mahout classes. How can i provide 
>>>> the required classes to yarn?
>>>>
>>>> Best,
>>>>
>>>> Max
>>>
>>
>


Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by mw <mw...@plista.com>.
I looked into the submitted job.jar and found that the missing class
(org.apache.mahout.math.Vector) is not in it.
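
A check like this can be scripted. The following is only a minimal sketch using the JDK's java.util.jar API; the class name JarClassCheck and the default path "job.jar" are made up for illustration and are not part of Mahout or Hadoop. It looks only at top-level entries, so classes packed inside nested jars (for example under lib/) would not be found.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Mahout or Hadoop.
public class JarClassCheck {

    /** Returns true if the jar has a top-level entry for the given fully qualified class name. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // org.apache.mahout.math.Vector -> org/apache/mahout/math/Vector.class
        String entryName = className.replace('.', '/') + ".class";
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getEntry(entryName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions for illustration; pass real values as arguments.
        String jarPath = args.length > 0 ? args[0] : "job.jar";
        String className = args.length > 1 ? args[1] : "org.apache.mahout.math.Vector";
        boolean found = containsClass(jarPath, className);
        System.out.println(className + (found ? " found in " : " missing from ") + jarPath);
    }
}
```

Running it against both the jar that was actually submitted and the job.jar from mrlegacy/target should show whether the two differ.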


On 01/09/2015 12:57 PM, mw wrote:
> I wrote to the Hadoop list about it. I also found this ticket:
> https://issues.apache.org/jira/browse/MAHOUT-1498
> Could it be a related bug?
>
> Best,
> Max


Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by mw <mw...@plista.com>.
I wrote to the Hadoop list about it. I also found this ticket:
https://issues.apache.org/jira/browse/MAHOUT-1498
Could it be a related bug?

Best,
Max
On 01/08/2015 06:18 PM, Pat Ferrel wrote:
> That sounds like a Hadoop list question.
>
> All I can say is there is a job.jar in mrlegacy/target with all dependencies packaged. This should have everything needed for lda.


Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by Pat Ferrel <pa...@occamsmachete.com>.
That sounds like a Hadoop list question. 

All I can say is there is a job.jar in mrlegacy/target with all dependencies packaged. This should have everything needed for lda.

On Jan 8, 2015, at 5:50 AM, mw <mw...@plista.com> wrote:

Hello again,

Maybe my question was misleading.
I am asking whether the intended usage is to ship the required libraries to YARN together with the job (if so, how can this be done?), or to add the required classes to the classpath of every node in the cluster.
What is the best practice?

Best,
Max


On 01/07/2015 06:13 PM, mw wrote:
> Hello,
> 
> The first error was due to a missing property in yarn.xml. However, I now have a different problem.
> 
> 
> I am working on a web application that should execute LDA on an external YARN cluster.
> 
> I am uploading all the relevant sequence files onto the YARN cluster.
> This is how I try to remotely execute LDA on the cluster.
> 
>        try {
>            ugi.doAs(new PrivilegedExceptionAction<Void>() {
>                public Void run() throws Exception {
>                    Configuration hdoopConf = new Configuration();
>                    hdoopConf.set("fs.defaultFS", "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
>                    hdoopConf.set("yarn.resourcemanager.hostname", "xxx.xxx.xxx.xxx");
>                    hdoopConf.set("mapreduce.framework.name", "yarn");
>                    hdoopConf.set("mapred.framework.name", "yarn");
>                    hdoopConf.set("mapred.job.tracker", "xxx.xxx.xxx.xxx");
>                    hdoopConf.set("dfs.permissions.enabled", "false");
>                    hdoopConf.set("hadoop.job.ugi", "xx");
>                    hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
>                    CVB0Driver driver = new CVB0Driver();
>                    try {
>                        driver.run(hdoopConf, sparseVectorIn.suffix("/matrix"),
>                                topicsOut, k, numTerms, doc_topic_smoothening, term_topic_smoothening,
>                                maxIter, iteration_block_size, convergenceDelta,
>                                sparseVectorIn.suffix("/dictionary.file-0"), topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
>                                seed, testFraction, numTrainThreads, numUpdateThreads, maxItersPerDoc,
>                                numReduceTasks, backfillPerplexity);
>                    } catch (ClassNotFoundException e) {
>                        e.printStackTrace();
>                    } catch (InterruptedException e) {
>                        e.printStackTrace();
>                    }
>                    return null;
>                }
>            });
>        } catch (InterruptedException e) {
>            e.printStackTrace();
>        }
> 
> I am getting the following error message:
> 
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>    at java.lang.Class.forName0(Native Method)
>    at java.lang.Class.forName(Class.java:344)
>    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>    at java.lang.Class.forName0(Native Method)
>    at java.lang.Class.forName(Class.java:344)
>    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>    at java.lang.Class.forName0(Native Method)
>    at java.lang.Class.forName(Class.java:344)
>    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>    at java.lang.Class.forName0(Native Method)
>    at java.lang.Class.forName(Class.java:344)
>    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at javax.security.auth.Subject.doAs(Subject.java:422)
>    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> 
> java.lang.InterruptedException: Failed to complete iteration 1 stage 1
>    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
>    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
>    ...
> 
> So apparently the job is missing some Mahout classes. How can I provide the required classes to YARN?
> 
> Best,
> 
> Max 



Re: Using Mahout 1.0-SNAPSHOT with yarn cluster continued

Posted by mw <mw...@plista.com>.
Hello again,

Maybe my question was misleading.
I am asking whether the intended usage is to provide the job with the 
required libraries and send them to YARN together with the job (if so, 
how can this be done?), or to add the required classes to the classpath 
of every node in the cluster.
What is the best practice?

Best,
Max
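
For the first option (shipping the libraries with the job), one route that may work is the "tmpjars" configuration property, which is the property Hadoop's -libjars option populates: jars listed there are uploaded at submit time and added to the task classpath. Since CVB0Driver.run() in the quoted code only takes a Configuration, the value would be set with hdoopConf.set("tmpjars", ...) before the driver.run(...) call. The sketch below only builds the comma-separated value and uses no Hadoop classes. The class name LibJars, the helper toTmpJars and the jar paths are hypothetical, and whether this resolves the ClassNotFoundException on this cluster is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds the comma-separated jar list for "tmpjars".
final class LibJars {

    // Entries that already carry a scheme (hdfs://, file://) are kept as-is.
    // Bare absolute local paths get an explicit file:// scheme, so they are
    // not resolved against fs.defaultFS (HDFS in the quoted configuration).
    static String toTmpJars(List<String> jars) {
        List<String> uris = new ArrayList<>();
        for (String jar : jars) {
            uris.add(jar.contains("://") ? jar : "file://" + jar);
        }
        return String.join(",", uris);
    }

    public static void main(String[] args) {
        // Hypothetical locations; use the jars that actually contain the
        // missing classes (org.apache.mahout.math.Vector and its dependencies).
        List<String> jars = Arrays.asList(
                "/opt/webapp/WEB-INF/lib/mahout-math.jar",
                "hdfs://namenode:9000/libs/mahout-mrlegacy.jar");
        System.out.println(toTmpJars(jars));
        // In the web application, before driver.run(...):
        //   hdoopConf.set("tmpjars", LibJars.toTmpJars(jars));
    }
}
```

The second option from the question, putting the Mahout jars on the classpath of every node, avoids the upload on each submission but ties the cluster to one Mahout version.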


On 01/07/2015 06:13 PM, mw wrote:
> Hello,
>
> The first error was due to a missing property in yarn.xml. However, now 
> I have a different problem.
>
>
> I am working on a web application that should execute LDA on an 
> external YARN cluster.
>
> I am uploading all the relevant sequence files onto the YARN cluster.
> This is how I try to execute LDA remotely on the cluster.
>
>         try {
>             ugi.doAs(new PrivilegedExceptionAction<Void>() {
>                 public Void run() throws Exception {
>                     Configuration hdoopConf = new Configuration();
>                     hdoopConf.set("fs.defaultFS", "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
>                     hdoopConf.set("yarn.resourcemanager.hostname", "xxx.xxx.xxx.xxx");
>                     hdoopConf.set("mapreduce.framework.name", "yarn");
>                     hdoopConf.set("mapred.framework.name", "yarn");
>                     hdoopConf.set("mapred.job.tracker", "xxx.xxx.xxx.xxx");
>                     hdoopConf.set("dfs.permissions.enabled", "false");
>                     hdoopConf.set("hadoop.job.ugi", "xx");
>                     hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
>                     CVB0Driver driver = new CVB0Driver();
>                     try {
>                         driver.run(hdoopConf, sparseVectorIn.suffix("/matrix"),
>                                 topicsOut, k, numTerms, doc_topic_smoothening, term_topic_smoothening,
>                                 maxIter, iteration_block_size, convergenceDelta,
>                                 sparseVectorIn.suffix("/dictionary.file-0"),
>                                 topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
>                                 seed, testFraction, numTrainThreads, numUpdateThreads, maxItersPerDoc,
>                                 numReduceTasks, backfillPerplexity);
>                     } catch (ClassNotFoundException e) {
>                         e.printStackTrace();
>                     } catch (InterruptedException e) {
>                         e.printStackTrace();
>                     }
>                     return null;
>                 }
>             });
>         } catch (InterruptedException e) {
>             e.printStackTrace();
>         }
>
> I am getting the following error message:
>
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:344)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:344)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:344)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:344)
>     at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
>     at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
>     at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
>     at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
>     at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
>     at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
>     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:422)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
>     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
>
> java.lang.InterruptedException: Failed to complete iteration 1 stage 1
>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
>     at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
>     ...
>
> So apparently the job is missing some Mahout classes. How can I provide 
> the required classes to YARN?
>
> Best,
>
> Max