Posted to user@mahout.apache.org by Sebastien Bratieres <sb...@cam.ac.uk> on 2009/05/15 00:56:47 UTC

running Dirichlet example on AEMR

Hi,

Thanks Grant, that did it. I'll figure out later what's going on.

Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
to run the Dirichlet example, which I launch with
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
class from the mahout-examples-0.2-SNAPSHOT.job.

This fails with
java.lang.NoClassDefFoundError:
org/apache/mahout/clustering/dirichlet/DirichletJob
    at
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
    at
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

DirichletJob is located in the .job file, inside
lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't find
it.

One difference between kMeans and Dirichlet is
org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
    JobConf conf = new JobConf(Job.class);
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
    JobConf conf = new JobConf(DirichletJob.class);
ie the Dirichlet version uses a job class which is in core, while the kMeans
version uses the currently executing Job class from examples. Is there an
issue with this ?
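
For context, here is a minimal sketch of why the class passed to JobConf might matter. As far as I understand it (an assumption, not something confirmed in this thread), new JobConf(SomeClass.class) records the jar that SomeClass was loaded from as the job jar. The stdlib-only lookup below mimics that idea; ContainingJar, find and isInJar are hypothetical names, not Hadoop API.

```java
import java.net.URL;

/** Hypothetical helper: reports where a class file was loaded from. */
final class ContainingJar {

    /** Returns the URL of the .class resource for the given class, or null if it cannot be located. */
    static URL find(Class<?> clazz) {
        // Binary name to resource path, e.g. java.util.Map$Entry -> java/util/Map$Entry.class
        String resource = clazz.getName().replace('.', '/') + ".class";
        ClassLoader loader = clazz.getClassLoader();
        // Bootstrap classes report a null loader; fall back to the system loader.
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resource);
    }

    /** True when the class file was found inside a jar (URL protocol "jar"). */
    static boolean isInJar(Class<?> clazz) {
        URL url = find(clazz);
        return url != null && "jar".equals(url.getProtocol());
    }

    public static void main(String[] args) {
        System.out.println(find(ContainingJar.class));
        System.out.println(isInJar(ContainingJar.class));
    }
}
```

If DirichletJob resolves to the nested lib/mahout-core-0.2-SNAPSHOT.jar while Job resolves to the outer .job file, the two constructors would point Hadoop at different jars, which may be worth checking.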

What should I do to work around this error? Does the MANIFEST.MF file of the
.job contain a pointer to the /lib directory so that the jars there are
visible to the jar classloader?

Thanks
Sebastien


2009/5/14 Grant Ingersoll <gs...@apache.org>

> Try running mvn install from the top level dir first.
>
>
> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>
>  Hi,
>>
>> I'd like to walk in the footsteps of Stephen Green running Mahout on EMR.
>>
>> He points out that the fix to issue 118 is needed to do that (I first
>> ran into the file system error too). I'm a first-time Maven user and I
>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>> - highlight mahout-examples project
>> - right-click Run As / Maven package (though I'm not sure at all that
>> Maven package is the right option to use!)
>>
>> but that gives me this error
>> ---
>> [INFO] Scanning for projects...
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Building Mahout examples
>> [INFO]
>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>> [INFO] task-segment: [package]
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] [resources:resources]
>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> [INFO] Copying 0 resource
>> [INFO] [resources:copy-resources]
>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> [INFO] Copying 3 resources
>> [INFO] [compiler:compile]
>> [INFO] Nothing to compile - all classes are up to date
>> [INFO] [resources:testResources]
>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> [INFO] Copying 3 resources
>> [ERROR]
>>
>> Transitive dependency resolution for scope: test has failed for your
>> project.
>>
>>
>>
>> Error message: Missing:
>> ----------
>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>
>>  Try downloading the file manually from the project website.
>>
>>  Then, install it using the command:
>>     mvn install:install-file -DgroupId=org.apache.mahout
>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>> -Dpackaging=test-jar -Dfile=/path/to/file
>>
>>  Alternatively, if you host your own repository you can deploy the file
>> there:
>>     mvn deploy:deploy-file -DgroupId=org.apache.mahout
>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>> -DrepositoryId=[id]
>>
>>  Path to dependency:
>>       1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>       2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>
>> ----------
>> 1 required artifact is missing.
>>
>> for artifact:
>>  org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>
>> from the specified remote repositories:
>>  Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>  maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>  central (http://repo1.maven.org/maven2)
>>
>> Group-Id: org.apache.mahout
>> Artifact-Id: mahout-examples
>> Version: 0.2-SNAPSHOT
>> From file: C:\workspace\mahout\examples\pom.xml
>>
>>
>>
>>
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] For more information, run with the -e flag
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD FAILED
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Total time: 6 seconds
>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>> [INFO] Final Memory: 3M/22M
>> [INFO]
>> ------------------------------------------------------------------------
>>
>> ---
>>
>> So again, my goal is to have a new mahout-examples-1.0.job file or
>> equivalent that contains the patch for 118 and will run on EMR. What
>> is the right way to do this ?
>>
>> Thanks
>> Sebastien
>>
>
> --------------------------
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>

Re: running Dirichlet example on AEMR

Posted by Grant Ingersoll <gs...@apache.org>.
We can add a target to the build to do this.  I opened https://issues.apache.org/jira/browse/MAHOUT-119 
  to address this.

Still, there is something about this that makes me wonder if we are  
doing things right, Hadoop-wise.


On May 19, 2009, at 6:49 AM, Sebastien Bratieres wrote:

> Hi all,
>
> I have posted my issue on the AEMR forum and there is some input  
> from there:
> http://developer.amazonwebservices.com/connect/thread.jspa?threadID=32028&tstart=0
> I can't work on this right now but just wanted to keep you updated.
> Specifically it looks like on the Mahout side, we don't need to  
> unpack all
> jars into the top-level .job jar since Hadoop is quite happy with jars
> inside jars (which puzzles me, since the Java classloader can't load  
> these
> -- if someone understands what is going on I'm keen to know).
>
> Sebastien

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: running Dirichlet example on AEMR

Posted by Sean Owen <sr...@gmail.com>.
I could be wrong on this -- but my point was that you cannot put jars
inside jars and then put them together into one classpath. The MANIFEST.MF
classpath mechanism is used to refer to jars outside the jar. If I understand
correctly that you are trying the former, then that explains why it
doesn't work anywhere. And again FWIW it seemed to work for me to
package it all together per above.
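
To illustrate the point about the manifest mechanism, here is a small sketch using only java.util.jar. The Class-Path attribute holds space-separated relative URLs that are resolved against the location of the jar that declares them, so an entry such as lib/mahout-core-0.2-SNAPSHOT.jar points at a file next to the outer jar on disk, not at an entry packed inside it. The class name ManifestClassPath and the /tmp/job location are made up for the example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

/** Hypothetical helper showing how Class-Path entries are resolved. */
final class ManifestClassPath {

    /** Resolves each Class-Path entry against the location of the jar that declares it. */
    static List<URI> resolve(URI jarLocation, Manifest manifest) {
        List<URI> resolved = new ArrayList<>();
        String classPath = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        if (classPath == null || classPath.trim().isEmpty()) {
            return resolved;
        }
        // Entries are separated by whitespace and are relative to the jar's own URL.
        for (String entry : classPath.trim().split("\\s+")) {
            resolved.add(jarLocation.resolve(entry));
        }
        return resolved;
    }

    public static void main(String[] args) {
        Manifest manifest = new Manifest();
        Attributes attributes = manifest.getMainAttributes();
        attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attributes.put(Attributes.Name.CLASS_PATH, "lib/mahout-core-0.2-SNAPSHOT.jar");
        // Assumed location of the job file, for illustration only.
        URI jar = URI.create("file:/tmp/job/mahout-examples-0.2-SNAPSHOT.job");
        System.out.println(resolve(jar, manifest));
    }
}
```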

On Tue, May 19, 2009 at 11:49 AM, Sebastien Bratieres <sb...@cam.ac.uk> wrote:
> Hi all,
>
> I have posted my issue on the AEMR forum and there is some input from there:
> http://developer.amazonwebservices.com/connect/thread.jspa?threadID=32028&tstart=0
> I can't work on this right now but just wanted to keep you updated.
> Specifically it looks like on the Mahout side, we don't need to unpack all
> jars into the top-level .job jar since Hadoop is quite happy with jars
> inside jars (which puzzles me, since the Java classloader can't load these
> -- if someone understands what is going on I'm keen to know).
>
> Sebastien
>

Re: running Dirichlet example on AEMR

Posted by Sebastien Bratieres <sb...@cam.ac.uk>.
Hi all,

I have posted my issue on the AEMR forum and there is some input from there:
http://developer.amazonwebservices.com/connect/thread.jspa?threadID=32028&tstart=0
I can't work on this right now but just wanted to keep you updated.
Specifically it looks like on the Mahout side, we don't need to unpack all
jars into the top-level .job jar since Hadoop is quite happy with jars
inside jars (which puzzles me, since the Java classloader can't load these
-- if someone understands what is going on I'm keen to know).
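
For what it is worth, one possible explanation, stated as an assumption rather than something confirmed in this thread: as far as I know, Hadoop's RunJar does not load nested jars directly. It first unpacks the job jar into a temporary directory and then builds a classpath from that directory, its classes/ subdirectory, and every file under lib/. A plain-Java sketch of that idea, with hypothetical names (UnpackedJobClassPath, urlsFor, loaderFor):

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: classpath for a job jar that has already been unpacked to a directory. */
final class UnpackedJobClassPath {

    /** Returns the unpacked root, classes/ if present, then every file under lib/ in name order. */
    static List<URL> urlsFor(File unpackedDir) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        urls.add(unpackedDir.toURI().toURL());
        File classes = new File(unpackedDir, "classes");
        if (classes.isDirectory()) {
            urls.add(classes.toURI().toURL());
        }
        File[] libs = new File(unpackedDir, "lib").listFiles();
        if (libs != null) {
            Arrays.sort(libs);
            for (File lib : libs) {
                // Each nested jar is now an ordinary file on disk, which URLClassLoader can read.
                urls.add(lib.toURI().toURL());
            }
        }
        return urls;
    }

    /** Wraps the URL list in a classloader that delegates to the given parent first. */
    static URLClassLoader loaderFor(File unpackedDir, ClassLoader parent) throws MalformedURLException {
        return new URLClassLoader(urlsFor(unpackedDir).toArray(new URL[0]), parent);
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (URL url : urlsFor(dir)) {
            System.out.println(url);
        }
    }
}
```

If that is indeed what happens, it would explain why jars inside the .job file work under Hadoop even though a standard classloader cannot read a jar nested in a jar.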

Sebastien

Re: running Dirichlet example on AEMR

Posted by Sean Owen <sr...@gmail.com>.
Looks like I deleted or misplaced my hacked-up test, but the core of
the Ant task was about this:

  <jar destfile="out.jar">
    <manifest>
        <attribute name="Main-Class" value="your.job.class.Job"/>
    </manifest>
    <zipfileset dir="classes"/> <!-- include individual .class files -->
    <zipfileset src="one.jar"/> <!-- include a .jar file contents -->
    ...
  </jar>

(This can be done without Ant, of course. And the Ant bit can be
linked into Maven.)
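
As a sketch of doing the same thing without Ant, the following plain-Java version copies the entries of several jars into one output jar and sets Main-Class in its manifest. It is an illustration under assumptions, not the script used here: JarMerger and its merge method are hypothetical names, duplicate entry names keep the first copy seen, and META-INF entries from the inputs are dropped. Loose .class files would first have to be packed into a jar of their own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Hypothetical helper that flattens several jars into one. */
final class JarMerger {

    /** Copies the entries of every input jar into one output jar; the first copy of a name wins. */
    static void merge(List<Path> inputs, Path output, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // Manifest-Version must be present or the other attributes are not written.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        Set<String> seen = new HashSet<>();
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(output), manifest)) {
            for (Path input : inputs) {
                try (JarInputStream in = new JarInputStream(Files.newInputStream(input))) {
                    JarEntry entry;
                    while ((entry = in.getNextJarEntry()) != null) {
                        String name = entry.getName();
                        // Skip per-jar metadata (manifests, signatures) and duplicate names.
                        if (name.startsWith("META-INF/") || !seen.add(name)) {
                            continue;
                        }
                        out.putNextEntry(new JarEntry(name));
                        copy(in, out);
                        out.closeEntry();
                    }
                }
            }
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    /** Usage: java JarMerger out.jar your.job.class.Job first.jar second.jar */
    public static void main(String[] args) throws IOException {
        List<Path> inputs = new ArrayList<>();
        for (int i = 2; i < args.length; i++) {
            inputs.add(Paths.get(args[i]));
        }
        merge(inputs, Paths.get(args[0]), args[1]);
    }
}
```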

On Tue, May 19, 2009 at 3:37 AM, Jeff Eastman
<jd...@windwardsolutions.com> wrote:
> I reached a similar conclusion. Do you think we should modify the examples
> build to do that? Can you share the ant script you used?

Re: running Dirichlet example on AEMR

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Sean Owen wrote:
> <snip>
> I just went straight to repacking everything into one .jar with an Ant script.
>
> There should not be multiple classloaders in use by Hadoop. Or if
> somehow it does, well your classes are in one .jar and so should all
> end up in the same classloader no matter what. So I suspect something
> along the above lines.
>
>   
I reached a similar conclusion. Do you think we should modify the 
examples build to do that? Can you share the ant script you used?

Jeff

Re: running Dirichlet example on AEMR

Posted by Sean Owen <sr...@gmail.com>.
I've been getting some of my own concoctions running on AEMR (well, at
least, hitting some problems further down the line). Here's how I
proceed.

The input to AEMR is one .jar file with Job, Mapper, Reducer, support
code, etc., yes? So everything has to go in there, for sure. Do I
understand correctly that there are multiple .jars in play here,
packaged up into one big .jar, trying to use the MANIFEST.MF classpath
mechanism? Can you do that? I did not think this mechanism could
refer to jars inside the jar itself.

I just went straight to repacking everything into one .jar with an Ant script.

There should not be multiple classloaders in use by Hadoop. Or if
somehow there are, well your classes are in one .jar and so should all
end up in the same classloader no matter what. So I suspect something
along the above lines.

Re: running Dirichlet example on AEMR

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Note that the following call sequence works (since the error occurs 
later during mapper setup):

syntheticcontrol.dirichlet.Job.main() ->
dirichlet.DirichletDriver.job() ->
dirichlet.DirichletDriver.writeInitialState() ->
dirichlet.DirichletDriver.createState() ->
classLoader.loadClass("NormalScModelDistribution")

but when the mapper tries to do the same call it bombs:

hadoop.util.ReflectionUtils.setConf() ->
dirichlet.DirichletMapper.configure() ->
dirichlet.DirichletMapper.getDirichletState() ->
dirichlet.DirichletDriver.createState() ->
classLoader.loadClass("NormalScModelDistribution")

It looks to me like Hadoop's instantiation of the mapper finds the 
mapper in mahout's lib jar but that instantiation is not happening from 
a context containing the example classloader (and why would it?). Thus, 
the mapper's classloader context does not have the model distribution 
class in it and we get the CNF.

If the mahout jar's classloader does not execute in the context of the 
example job's classloader, then this will frustrate our ability to make 
our libraries extensible. Does anybody see a workaround or a logic flaw 
here?
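
One thing that could be tried, offered as a sketch and not as what Mahout does today: resolve user-supplied class names through the thread context classloader first and fall back to the loader that defined the library class. Whether either loader on a Hadoop task actually sees the example classes is exactly the open question here, so this is something to test, not a confirmed fix. PluginLoader and load are hypothetical names.

```java
/** Hypothetical helper for loading a class named in the job configuration. */
final class PluginLoader {

    /**
     * Tries the thread context classloader first, then the loader that
     * defined fallbackOwner. Throws ClassNotFoundException if both fail.
     */
    static Class<?> load(String className, Class<?> fallbackOwner) throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible from the context loader; try the fallback below.
            }
        }
        // A null loader here means the bootstrap loader, which Class.forName accepts.
        return Class.forName(className, true, fallbackOwner.getClassLoader());
    }

    public static void main(String[] args) throws ClassNotFoundException {
        System.out.println(load("java.util.ArrayList", PluginLoader.class));
    }
}
```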

Jeff

Jeff Eastman wrote:
> Hi Sebastien,
>
> For some reason this was the first post I've seen on this topic. There 
> is something wrong with the Dirichlet jar layout that makes the 
> classloader throw a CNF exception. I noticed this when we were 
> proofing the release and we discussed it on this list without resolution:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: 
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution 
>
>    at 
> org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:97) 
>
>    at 
> org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:61) 
>
>    at 
> org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
>    at 
> org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83) 
>
>    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
>
> Is this the same exception you saw before moving the DirichletJob?
>
> I think the problem is that the classloader for the DirichletMapper 
> and other classes, located in the lib, cannot find the
> NormalScModelDistribution, located in the jar. We were seeing a 
> slightly different manifestation earlier, dunno why.
>
> I think trying to use a custom distance measure with kmeans would have 
> a similar result. Moving the Job only postponed the problem to the 
> Mapper.
>
> Jeff
>
> Sebastien Bratieres wrote:
>> Hi Grant,
>>
>> It doesn't look like the CLI has anything to do with my issue -- it's 
>> just a
>> command-line interface to drive the Amazon machines and jobs you run 
>> there
>> remotely. It sends HTTP requests to Amazon to switch machines on and 
>> off,
>> start jobs etc. My issue is linked to the AEMR setup or to something
>> peculiar with classloading and the Dirichlet sample (that's because the
>> kMeans example runs fine).
>> If the kind of issue I'm seeing doesn't ring a bell with you Mahout 
>> guys, I
>> think I'll try with AEMR staff.
>>
>> Thanks
>> Sebastien
>>
>> 2009/5/18 Grant Ingersoll <gs...@apache.org>
>>
>>  
>>> I don't know much about AEMR, so, tell me more about the Ruby CLI 
>>> stuff?
>>>  Does that factor in?
>>>
>>>
>>>
>>> On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:
>>>
>>>  Hi,
>>>    
>>>> I am still trying to make this work. I am running AEMR with the latest
>>>> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
>>>> ruby elastic-mapreduce -j j-26RJO9A4WJIJS --jar
>>>> s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job 
>>>> --main-class
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job --arg
>>>> s3n://myBucket/mahout-input/synthetic-control.data --arg
>>>> s3n://myBucket/mahout-output/dirichlet --arg
>>>>
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution 
>>>>
>>>> --arg 10 --arg 5 --arg 1.0 --arg 1
>>>>
>>>> This gave me the class not found error mentioned in my previous email.
>>>>
>>>> I have tried the following: I moved the DirichletJob class from the 
>>>> core
>>>> project into the examples project, putting it in
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The 
>>>> rationale for
>>>> doing that is that in this way, the classloader does not need to 
>>>> look into
>>>> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; 
>>>> instead it
>>>> finds it directly alongside Job.class.
>>>>
>>>> This got me one step further, but an error of the same type stops me
>>>> again:
>>>>
>>>> java.lang.ClassNotFoundException:
>>>>
>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution 
>>>>
>>>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>>>   at java.security.AccessController.doPrivileged(Native Method)
>>>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>>>   at
>>>>
>>>> org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125) 
>>>>
>>>>   at
>>>>
>>>> org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71) 
>>>>
>>>>   ... 8 more
>>>>
>>>> This happens on a .loadClass() from the current thread's classloader.
>>>>
>>>> I have tried running this example on my local single-node Hadoop
>>>> installation: this runs fine. The error above occurs only with Amazon
>>>> Elastic MapReduce, and definitely seems related to classloading 
>>>> issues.
>>>>
>>>> Any ideas ?
>>>>
>>>> Thanks
>>>> Sebastien
>>>>
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>>> using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>     
>>
>>   
>
>
>


Re: running Dirichlet example on AEMR

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Indeed, I can create the same problem in Kmeans by using my own custom 
distance measure:

java.lang.RuntimeException: java.lang.ClassNotFoundException: 
org.apache.mahout.clustering.syntheticcontrol.kmeans.CustomEuclideanDistanceMeasure
    at org.apache.mahout.clustering.canopy.Canopy.configure(Canopy.java:113)
    at 
org.apache.mahout.clustering.canopy.CanopyMapper.configure(CanopyMapper.java:49)
    at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:328)
    at org.apache.hadoop.mapred.Child.main(Child.java:155)
Caused by: java.lang.ClassNotFoundException: 
org.apache.mahout.clustering.syntheticcontrol.kmeans.CustomEuclideanDistanceMeasure
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at org.apache.mahout.clustering.canopy.Canopy.configure(Canopy.java:109)
    ... 8 more

This indicates the classloader for the mahout jar in lib does not have 
its parent as the examples job loader. I can run both examples fine in 
Eclipse.
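
A small diagnostic that might help confirm this, as a sketch with hypothetical names (LoaderChain, chain, describe): walk the parent chain of a classloader and print it. Calling it from a mapper's configure() with both the mapper's own loader and the thread context loader would show which loaders are actually in play on the task side.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical diagnostic: lists a classloader and all of its parents. */
final class LoaderChain {

    /** Returns the given loader followed by its parents; the bootstrap loader (null) is not included. */
    static List<ClassLoader> chain(ClassLoader start) {
        List<ClassLoader> result = new ArrayList<>();
        for (ClassLoader loader = start; loader != null; loader = loader.getParent()) {
            result.add(loader);
        }
        return result;
    }

    /** Formats the chain one loader per line, ending with a marker for the bootstrap loader. */
    static String describe(ClassLoader start) {
        StringBuilder text = new StringBuilder();
        for (ClassLoader loader : chain(start)) {
            text.append(loader).append('\n');
        }
        return text.append("<bootstrap>").toString();
    }

    public static void main(String[] args) {
        System.out.println("defining loader chain:");
        System.out.println(describe(LoaderChain.class.getClassLoader()));
        System.out.println("context loader chain:");
        System.out.println(describe(Thread.currentThread().getContextClassLoader()));
    }
}
```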

Jeff


>>>> 2009/5/15 Sebastien Bratieres <sb...@cam.ac.uk>
>>>>
>>>>  Hi,
>>>>      
>>>>> Thanks Grant, that did it. I'll figure out later what's going on.
>>>>>
>>>>> Now I'm able to run the kMeans example on Amazon EMR as Stephen 
>>>>> did. I
>>>>> want
>>>>> to run the Dirichlet example, which I launch with
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the 
>>>>> main
>>>>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>>>>
>>>>> This fails with
>>>>> java.lang.NoClassDefFoundError:
>>>>> org/apache/mahout/clustering/dirichlet/DirichletJob
>>>>>   at
>>>>>
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80) 
>>>>>
>>>>>   at
>>>>>
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50) 
>>>>>
>>>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>>   at
>>>>>
>>>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) 
>>>>>
>>>>>   at
>>>>>
>>>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) 
>>>>>
>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>>>
>>>>> DirichletJob is located in the .job file, inside
>>>>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader 
>>>>> can't
>>>>> find
>>>>> it.
>>>>>
>>>>> One difference between kMeans and Dirichlet is
>>>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>>>>   JobConf conf = new JobConf(Job.class);
>>>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>>>>   JobConf conf = new JobConf(DirichletJob.class);
>>>>> ie the Dirichlet version uses a job class which is in core, while the
>>>>> kMeans version uses the currently executing Job class from 
>>>>> examples. Is
>>>>> there an issue with this ?
>>>>>
>>>>> What should I do to work around this error ? Is the MANIFEST.MF 
>>>>> file of
>>>>> the
>>>>> .job contain a pointer to the /lib directory for the jars there to be
>>>>> visible by the jar classloader ?
>>>>>
>>>>> Thanks
>>>>> Sebastien
>>>>>
>>>>>
>>>>> 2009/5/14 Grant Ingersoll <gs...@apache.org>
>>>>>
>>>>>  Try running mvn install from the top level dir first.
>>>>>        
>>>>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>>          
>>>>>>> I'd like to walk in the footsteps of Stephen Green running 
>>>>>>> Mahout on
>>>>>>> EMR.
>>>>>>>
>>>>>>> He points out that the fix to issue 118 is needed to do that (I 
>>>>>>> first
>>>>>>> ran into the file system error too). I'm a first-time Maven user 
>>>>>>> and I
>>>>>>> don't know how to rebuild the mahout-examples-1.0.job file once 
>>>>>>> I have
>>>>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>>>>>> - highlight mahout-examples project
>>>>>>> - right-click Run As / Maven package (though I'm not sure at all 
>>>>>>> that
>>>>>>> Maven package is the right option to use!)
>>>>>>>
>>>>>>> but that gives me this error
>>>>>>> ---
>>>>>>> [INFO] Scanning for projects...
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> [INFO] Building Mahout examples
>>>>>>> [INFO]
>>>>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>> [INFO] task-segment: [package]
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> [INFO] [resources:resources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 0 resource
>>>>>>> [INFO] [resources:copy-resources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 3 resources
>>>>>>> [INFO] [compiler:compile]
>>>>>>> [INFO] Nothing to compile - all classes are up to date
>>>>>>> [INFO] [resources:testResources]
>>>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>>>> [INFO] Copying 3 resources
>>>>>>> [ERROR]
>>>>>>>
>>>>>>> Transitive dependency resolution for scope: test has failed for 
>>>>>>> your
>>>>>>> project.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Error message: Missing:
>>>>>>> ----------
>>>>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>>
>>>>>>> Try downloading the file manually from the project website.
>>>>>>>
>>>>>>> Then, install it using the command:
>>>>>>>   mvn install:install-file -DgroupId=org.apache.mahout
>>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>>>>>
>>>>>>> Alternatively, if you host your own repository you can deploy 
>>>>>>> the file
>>>>>>> there:
>>>>>>>   mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>>>>>> -DrepositoryId=[id]
>>>>>>>
>>>>>>> Path to dependency:
>>>>>>>     1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>>     2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>>>
>>>>>>> ----------
>>>>>>> 1 required artifact is missing.
>>>>>>>
>>>>>>> for artifact:
>>>>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>>>
>>>>>>> from the specified remote repositories:
>>>>>>> Apache snapshots 
>>>>>>> (http://people.apache.org/maven-snapshot-repository),
>>>>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>>>>> central (http://repo1.maven.org/maven2)
>>>>>>>
>>>>>>> Group-Id: org.apache.mahout
>>>>>>> Artifact-Id: mahout-examples
>>>>>>> Version: 0.2-SNAPSHOT
>>>>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> [INFO] For more information, run with the -e flag
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> [INFO] BUILD FAILED
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>> [INFO] Total time: 6 seconds
>>>>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>>>>> [INFO] Final Memory: 3M/22M
>>>>>>> [INFO]
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>
>>>>>>>
>>>>>>> ---
>>>>>>>
>>>>>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>>>>>> equivalent that contains the patch for 118 and will run on EMR. 
>>>>>>> What
>>>>>>> is the right way to do this ?
>>>>>>>
>>>>>>> Thanks
>>>>>>> Sebastien
>>>>>>>
>>>>>>>
>>>>>>>             
>>>>>> --------------------------
>>>>>> Grant Ingersoll
>>>>>> http://www.lucidimagination.com/
>>>>>>
>>>>>> Search the Lucene ecosystem 
>>>>>> (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>>>> Solr/Lucene:
>>>>>> http://www.lucidimagination.com/search
>>>>>>
>>>>>>
>>>>>>
>>>>>>           
>>> --------------------------
>>> Grant Ingersoll
>>> http://www.lucidimagination.com/
>>>
>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) 
>>> using
>>> Solr/Lucene:
>>> http://www.lucidimagination.com/search
>>>
>>>
>>>     
>>
>>   
>
>
>


Re: running Dirichlet example on AEMR

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Hi Sebastien,

For some reason this was the first post I've seen on this topic. There
is something wrong with the Dirichlet jar layout that makes the
classloader throw a ClassNotFoundException. I noticed this when we were
proofing the release and we discussed it on this list without resolution:

java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:97)
    at org.apache.mahout.clustering.dirichlet.DirichletMapper.configure(DirichletMapper.java:61)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:83)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)

Is this the same exception you saw before moving the DirichletJob?

I think the problem is that the classloader for the DirichletMapper and
other classes, located in the job's lib directory, cannot find
NormalScModelDistribution, located in the job jar itself. We were seeing
a slightly different manifestation earlier, dunno why.

I think trying to use a custom distance measure with kmeans would have a 
similar result. Moving the Job only postponed the problem to the Mapper.
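
To check which loader is actually in play, something like the following could be run at the point of failure. This is only a diagnostic sketch, not Mahout or Hadoop code: the class name LoaderChain and its two helpers are made up for illustration, and it only uses standard java.lang calls. It prints the parent chain of the thread context loader and of the loader that defined the calling class, then probes whether each can resolve a given class name.

```java
import java.util.ArrayList;
import java.util.List;

public class LoaderChain {

    /** Names the given loader and each of its parents, ending with the bootstrap loader. */
    static List<String> chain(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (ClassLoader l = loader; l != null; l = l.getParent()) {
            names.add(l.getClass().getName());
        }
        names.add("<bootstrap>"); // a null loader stands for the bootstrap loader
        return names;
    }

    /** True if the loader, or one of its parents, can resolve the named class. */
    static boolean canLoad(ClassLoader loader, String className) {
        try {
            Class.forName(className, false, loader); // false: do not run static initializers
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        ClassLoader defining = LoaderChain.class.getClassLoader();
        // Pass the class to probe as the first argument; the default is a JDK class.
        String probe = args.length > 0 ? args[0] : "java.util.ArrayList";

        System.out.println("context loader chain:  " + chain(context));
        System.out.println("defining loader chain: " + chain(defining));
        System.out.println(probe + " via context loader:  " + canLoad(context, probe));
        System.out.println(probe + " via defining loader: " + canLoad(defining, probe));
    }
}
```

If the two chains differ inside a map task, that would point at the kind of visibility problem described above; if they are the same, the cause is more likely in how the job file is unpacked.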

Jeff

Sebastien Bratieres wrote:
> Hi Grant,
>
> It doesn't look like the CLI has anything to do with my issue -- it's just a
> command-line interface to drive the Amazon machines and jobs you run there
> remotely. It sends HTTP requests to Amazon to switch machines on and off,
> start jobs etc. My issue is linked to the AEMR setup or to something
> peculiar with classloading and the Dirichlet sample (that's because the
> kMeans example runs fine).
> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys, I
> think I'll try with AEMR staff.
>
> Thanks
> Sebastien


Re: running Dirichlet example on AEMR

Posted by Stephen Green <St...@sun.com>.
On May 18, 2009, at 8:59 AM, Sebastien Bratieres wrote:
> Indeed the .job file does contain all the files and classes, and as  
> I wrote,
> the classes which must be loaded actually are there

Sorry, missed that bit. It definitely sounds like a class loading  
problem.

> -- in other words: I
> don't think I'm missing any class. The class not found exception  
> appears
> none the less... as though the classloader couldn't find it when it  
> is in a
> lib/...jar.

Can you post the output of jar tf on the .job file?  When I was
investigating getting the kmeans example running, it looked like the
Hadoop infrastructure builds a class path out of the lib directory of
the .job file (but, annoyingly, doesn't use the Class-Path attribute of
the manifest to figure that out), and I think it builds its own class
loader from that.
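
A plain listing of the .job file only shows lib/...jar as opaque entries, so it may help to look inside them as well. The sketch below is an illustration only (JobFileScanner, locate and demoJob are invented names, and the org/example entries are placeholders); it assumes Java 9 or later for readAllBytes and Map.of. It reports whether a given entry sits at the top level of the archive or inside one of the jars under lib/.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class JobFileScanner {

    /** Lists where entryName occurs: "<top level>" or the path of the nested jar holding it. */
    static List<String> locate(InputStream jobFile, String entryName) {
        List<String> hits = new ArrayList<>();
        try (ZipInputStream outer = new ZipInputStream(jobFile)) {
            for (ZipEntry e = outer.getNextEntry(); e != null; e = outer.getNextEntry()) {
                String name = e.getName();
                if (name.equals(entryName)) {
                    hits.add("<top level>");
                } else if (name.startsWith("lib/") && name.endsWith(".jar")) {
                    // Buffer the nested jar so it can be scanned as an archive of its own.
                    byte[] nested = outer.readAllBytes();
                    try (ZipInputStream inner = new ZipInputStream(new ByteArrayInputStream(nested))) {
                        for (ZipEntry n = inner.getNextEntry(); n != null; n = inner.getNextEntry()) {
                            if (n.getName().equals(entryName)) {
                                hits.add(name);
                            }
                        }
                    }
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return hits;
    }

    /** Writes the given entries to an in-memory archive. */
    static byte[] zip(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Hypothetical stand-in for a .job file: one top-level class, one class inside lib/core.jar. */
    static byte[] demoJob() {
        byte[] core = zip(Map.of("org/example/Model.class", new byte[0]));
        Map<String, byte[]> job = new LinkedHashMap<>();
        job.put("org/example/Job.class", new byte[0]);
        job.put("lib/core.jar", core);
        return zip(job);
    }

    public static void main(String[] args) {
        System.out.println(locate(new ByteArrayInputStream(demoJob()), "org/example/Model.class"));
    }
}
```

Against the real file one would pass a stream over mahout-examples-0.2-SNAPSHOT.job and an entry name such as org/apache/mahout/clustering/dirichlet/DirichletJob.class.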

> Also, I can't quite figure out why I could make kMeans work (as you  
> did) and
> not the Dirichlet sample.

Because I'm awesome? :-)  I assume that this is running correctly on a  
local Hadoop?

Steve
-- 
Stephen Green                      //   Stephen.Green@sun.com
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692




Re: running Dirichlet example on AEMR

Posted by Sebastien Bratieres <sb...@cam.ac.uk>.
Hi Steve,

Indeed the .job file does contain all the files and classes, and as I wrote,
the classes which must be loaded actually are there -- in other words: I
don't think I'm missing any class. The class not found exception appears
nonetheless... as though the classloader couldn't find it when it is in a
lib/...jar.
Also, I can't quite figure out why I could make kMeans work (as you did) and
not the Dirichlet sample.
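
For what it's worth, a stock java.net.URLClassLoader pointed at an archive does not look inside jars nested in that archive; whoever runs the job has to unpack them or add them to the class path separately. The sketch below demonstrates just that JDK behaviour with a text resource instead of a class. It says nothing certain about what AEMR does; NestedJarVisibility, examples.job, core.jar and model.txt are all made-up names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class NestedJarVisibility {

    /** Writes an archive at target holding a single entry with the given bytes. */
    static void writeJar(Path target, String entryName, byte[] content) throws IOException {
        try (OutputStream file = Files.newOutputStream(target);
             ZipOutputStream out = new ZipOutputStream(file)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.write(content);
            out.closeEntry();
        }
    }

    /** True if a URLClassLoader over exactly these archives can see the resource. */
    static boolean visible(String resource, Path... archives) throws IOException {
        URL[] urls = new URL[archives.length];
        for (int i = 0; i < archives.length; i++) {
            urls[i] = archives[i].toUri().toURL();
        }
        // Parent is null so that only the listed archives are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.findResource(resource) != null;
        }
    }

    /** Returns {seen with the job file alone, seen once the inner jar is listed too}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("jobdemo");
            Path core = dir.resolve("core.jar");
            Path job = dir.resolve("examples.job");
            writeJar(core, "model.txt", "demo".getBytes(StandardCharsets.UTF_8));
            // The job file carries core.jar as a nested entry under lib/.
            writeJar(job, "lib/core.jar", Files.readAllBytes(core));
            return new boolean[] { visible("model.txt", job), visible("model.txt", job, core) };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] seen = demo();
        System.out.println("job file only:        " + seen[0]);
        System.out.println("job file + inner jar: " + seen[1]);
    }
}
```

The resource is found only in the second case, where the inner jar is given to the loader as its own URL.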

Thanks
Sebastien


2009/5/18 Stephen Green <St...@sun.com>

>
> On May 18, 2009, at 8:23 AM, Grant Ingersoll wrote:
>
>
>> On May 18, 2009, at 8:12 AM, Sebastien Bratieres wrote:
>>
>>> Hi Grant,
>>>
>>> It doesn't look like the CLI has anything to do with my issue -- it's
>>> just a command-line interface to drive the Amazon machines and jobs you
>>> run there remotely. It sends HTTP requests to Amazon to switch machines
>>> on and off, start jobs etc. My issue is linked to the AEMR setup or to
>>> something peculiar with classloading and the Dirichlet sample (that's
>>> because the kMeans example runs fine).
>>> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys,
>>> I think I'll try with AEMR staff.
>>>
>>
>> Likely, true.  We're not all up on AEMR just yet (although I think I will
>> be trying it out next week)
>>
>> I'd recommend seeing if you can get a little closer to the bone and run
>> pure Java with as little in between as possible.  It may very well be that
>> we need to create some alternate Job jars for AEMR as well that package all
>> of Mahout into a single jar.
>>
>
> Doesn't the example job do this?  That's all I needed to get EMR up and
> running.  I think a good first cut would be to make sure that everything you
> need is packaged into a single jar file that you can specify in the EMR
> configuration.
>
> I did this a couple of times by building the jar files by hand...
>
> Steve
> --
> Stephen Green                      //   Stephen.Green@sun.com
> Principal Investigator             \\   http://blogs.sun.com/searchguy
> Aura Project                       //   Voice: +1 781-442-0926
> Sun Microsystems Labs              \\   Fax:   +1 781-442-1692
>
>
>
>

Re: running Dirichlet example on AEMR

Posted by Stephen Green <St...@sun.com>.
On May 18, 2009, at 8:23 AM, Grant Ingersoll wrote:

>
> On May 18, 2009, at 8:12 AM, Sebastien Bratieres wrote:
>
>> Hi Grant,
>>
>> It doesn't look like the CLI has anything to do with my issue -- it's
>> just a command-line interface to drive the Amazon machines and jobs you
>> run there remotely. It sends HTTP requests to Amazon to switch machines
>> on and off, start jobs etc. My issue is linked to the AEMR setup or to
>> something peculiar with classloading and the Dirichlet sample (that's
>> because the kMeans example runs fine).
>> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys,
>> I think I'll try with AEMR staff.
>
> Likely, true.  We're not all up on AEMR just yet (although I think I  
> will be trying it out next week)
>
> I'd recommend seeing if you can get a little closer to the bone and  
> run pure Java with as little in between as possible.  It may very  
> well be that we need to create some alternate Job jars for AEMR as  
> well that package all of Mahout into a single jar.

Doesn't the example job do this?  That's all I needed to get EMR up  
and running.  I think a good first cut would be to make sure that  
everything you need is packaged into a single jar file that you can  
specify in the EMR configuration.

I did this a couple of times by building the jar files by hand...

Steve
-- 
Stephen Green                      //   Stephen.Green@sun.com
Principal Investigator             \\   http://blogs.sun.com/searchguy
Aura Project                       //   Voice: +1 781-442-0926
Sun Microsystems Labs              \\   Fax:   +1 781-442-1692




Re: running Dirichlet example on AEMR

Posted by Grant Ingersoll <gs...@apache.org>.
On May 18, 2009, at 8:12 AM, Sebastien Bratieres wrote:

> Hi Grant,
>
> It doesn't look like the CLI has anything to do with my issue -- it's
> just a command-line interface to drive the Amazon machines and jobs you
> run there remotely. It sends HTTP requests to Amazon to switch machines
> on and off, start jobs etc. My issue is linked to the AEMR setup or to
> something peculiar with classloading and the Dirichlet sample (that's
> because the kMeans example runs fine).
> If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys,
> I think I'll try with AEMR staff.

Likely, true.  We're not all up on AEMR just yet (although I think I  
will be trying it out next week)

I'd recommend seeing if you can get a little closer to the bone and  
run pure Java with as little in between as possible.  It may very well  
be that we need to create some alternate Job jars for AEMR as well  
that package all of Mahout into a single jar.
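
As a rough idea of what such a single jar could look like, the sketch below inlines the contents of every lib/*.jar into one flat archive. It is an in-memory illustration under stated assumptions, not a proposal for the Mahout build: FlattenJob and its helpers are invented names, the "first entry wins" rule for name clashes (manifests, for instance) is an arbitrary choice, and it assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class FlattenJob {

    /** Reads every non-directory entry of an archive into memory, in archive order. */
    static Map<String, byte[]> read(byte[] archive) {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                if (!e.isDirectory()) {
                    entries.put(e.getName(), in.readAllBytes());
                }
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return entries;
    }

    /** Writes the given entries to a new in-memory archive. */
    static byte[] write(Map<String, byte[]> entries) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                out.putNextEntry(new ZipEntry(e.getKey()));
                out.write(e.getValue());
                out.closeEntry();
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return bytes.toByteArray();
    }

    /** Inlines the contents of every lib/*.jar; on a name clash the first entry seen wins. */
    static byte[] flatten(byte[] job) {
        Map<String, byte[]> flat = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : read(job).entrySet()) {
            boolean nested = e.getKey().startsWith("lib/") && e.getKey().endsWith(".jar");
            Map<String, byte[]> source =
                    nested ? read(e.getValue()) : Map.of(e.getKey(), e.getValue());
            source.forEach(flat::putIfAbsent);
        }
        return write(flat);
    }

    public static void main(String[] args) {
        // Hypothetical job layout: one top-level class plus one class inside lib/core.jar.
        byte[] core = write(Map.of("org/example/Model.class", new byte[] {1}));
        Map<String, byte[]> job = new LinkedHashMap<>();
        job.put("org/example/Job.class", new byte[] {2});
        job.put("lib/core.jar", core);
        System.out.println(read(flatten(write(job))).keySet());
    }
}
```

With everything at the top level, no loader has to understand nested jars at all, at the cost of a larger archive and possible clashes between files of the same name.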

You might also ask on general@hadoop.apache.org.

-Grant

Re: running Dirichlet example on AEMR

Posted by Sebastien Bratieres <sb...@cam.ac.uk>.
Hi Grant,

It doesn't look like the CLI has anything to do with my issue -- it's just a
command-line interface to drive the Amazon machines and jobs you run there
remotely. It sends HTTP requests to Amazon to switch machines on and off,
start jobs etc. My issue is linked to the AEMR setup or to something
peculiar with classloading and the Dirichlet sample (that's because the
kMeans example runs fine).
If the kind of issue I'm seeing doesn't ring a bell with you Mahout guys, I
think I'll try with AEMR staff.

Thanks
Sebastien

2009/5/18 Grant Ingersoll <gs...@apache.org>

> I don't know much about AEMR, so, tell me more about the Ruby CLI stuff?
>  Does that factor in?
>
>
>
> On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:
>
>> Hi,
>>
>> I am still trying to make this work. I am running AEMR with the latest
>> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
>>
>>   ruby elastic-mapreduce -j j-26RJO9A4WJIJS \
>>     --jar s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job \
>>     --main-class org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job \
>>     --arg s3n://myBucket/mahout-input/synthetic-control.data \
>>     --arg s3n://myBucket/mahout-output/dirichlet \
>>     --arg org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution \
>>     --arg 10 --arg 5 --arg 1.0 --arg 1
>>
>> This gave me the class not found error mentioned in my previous email.
>>
>> I have tried the following: I moved the DirichletJob class from the core
>> project into the examples project, putting it in
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale for
>> doing that is that in this way, the classloader does not need to look into
>> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
>> finds it directly alongside Job.class.
>>
>> This got me one step further, but an error of the same type stops me
>> again:
>>
>> java.lang.ClassNotFoundException: org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>>   at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>>   at java.security.AccessController.doPrivileged(Native Method)
>>   at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>   at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>>   at org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>>   at org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>>   ... 8 more
>>
>> This happens on a .loadClass() from the current thread's classloader.
>>
>> I have tried running this example on my local single-node Hadoop
>> installation: this runs fine. The error above occurs only with Amazon
>> Elastic MapReduce, and definitely seems related to classloading issues.
>>
>> Any ideas ?
>>
>> Thanks
>> Sebastien
>>
>> 2009/5/15 Sebastien Bratieres <sb...@cam.ac.uk>
>>
>>> Hi,
>>>
>>> Thanks Grant, that did it. I'll figure out later what's going on.
>>>
>>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I
>>> want to run the Dirichlet example, which I launch with
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>>
>>> This fails with
>>> java.lang.NoClassDefFoundError: org/apache/mahout/clustering/dirichlet/DirichletJob
>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>>   at org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>>   at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>>   at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>>
>>> DirichletJob is located in the .job file, inside
>>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't
>>> find it.
>>>
>>> One difference between kMeans and Dirichlet is
>>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>>   JobConf conf = new JobConf(Job.class);
>>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>>   JobConf conf = new JobConf(DirichletJob.class);
>>> i.e. the Dirichlet version uses a job class which is in core, while the
>>> kMeans version uses the currently executing Job class from examples. Is
>>> there an issue with this?
>>>
>>> What should I do to work around this error? Does the MANIFEST.MF file of
>>> the .job need to contain a pointer to the /lib directory for the jars
>>> there to be visible to the jar classloader?
>>>
>>> Thanks
>>> Sebastien
>>>
>>>
>>> 2009/5/14 Grant Ingersoll <gs...@apache.org>
>>>
>>>> Try running mvn install from the top level dir first.
>>>>
>>>> On May 14, 2009, at 11:22 AM, Sebastien Bratieres wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'd like to walk in the footsteps of Stephen Green running Mahout on
>>>>> EMR.
>>>>>
>>>>> He points out that the fix to issue 118 is needed to do that (I first
>>>>> ran into the file system error too). I'm a first-time Maven user and I
>>>>> don't know how to rebuild the mahout-examples-1.0.job file once I have
>>>>> retrieved revision 765769 from SVN (I use Eclipse). I have tried
>>>>> - highlight mahout-examples project
>>>>> - right-click Run As / Maven package (though I'm not sure at all that
>>>>> Maven package is the right option to use!)
>>>>>
>>>>> but that gives me this error
>>>>> ---
>>>>> [INFO] Scanning for projects...
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] Building Mahout examples
>>>>> [INFO]
>>>>> [INFO] Id: org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>> [INFO] task-segment: [package]
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] [resources:resources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 0 resource
>>>>> [INFO] [resources:copy-resources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 3 resources
>>>>> [INFO] [compiler:compile]
>>>>> [INFO] Nothing to compile - all classes are up to date
>>>>> [INFO] [resources:testResources]
>>>>> [INFO] Using 'UTF-8' encoding to copy filtered resources.
>>>>> [INFO] Copying 3 resources
>>>>> [ERROR]
>>>>>
>>>>> Transitive dependency resolution for scope: test has failed for your
>>>>> project.
>>>>>
>>>>>
>>>>>
>>>>> Error message: Missing:
>>>>> ----------
>>>>> 1) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>
>>>>> Try downloading the file manually from the project website.
>>>>>
>>>>> Then, install it using the command:
>>>>>   mvn install:install-file -DgroupId=org.apache.mahout
>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>> -Dpackaging=test-jar -Dfile=/path/to/file
>>>>>
>>>>> Alternatively, if you host your own repository you can deploy the file
>>>>> there:
>>>>>   mvn deploy:deploy-file -DgroupId=org.apache.mahout
>>>>> -DartifactId=mahout-core -Dversion=0.2-SNAPSHOT -Dclassifier=tests
>>>>> -Dpackaging=test-jar -Dfile=/path/to/file -Durl=[url]
>>>>> -DrepositoryId=[id]
>>>>>
>>>>> Path to dependency:
>>>>>     1) org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>     2) org.apache.mahout:mahout-core:test-jar:tests:0.2-SNAPSHOT
>>>>>
>>>>> ----------
>>>>> 1 required artifact is missing.
>>>>>
>>>>> for artifact:
>>>>> org.apache.mahout:mahout-examples:jar:0.2-SNAPSHOT
>>>>>
>>>>> from the specified remote repositories:
>>>>> Apache snapshots (http://people.apache.org/maven-snapshot-repository),
>>>>> maven2-repository.dev.java.net (http://download.java.net/maven/2),
>>>>> central (http://repo1.maven.org/maven2)
>>>>>
>>>>> Group-Id: org.apache.mahout
>>>>> Artifact-Id: mahout-examples
>>>>> Version: 0.2-SNAPSHOT
>>>>> From file: C:\workspace\mahout\examples\pom.xml
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] For more information, run with the -e flag
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] BUILD FAILED
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>> [INFO] Total time: 6 seconds
>>>>> [INFO] Finished at: Thu May 14 16:58:46 CEST 2009
>>>>> [INFO] Final Memory: 3M/22M
>>>>> [INFO]
>>>>>
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> ---
>>>>>
>>>>> So again, my goal is to have a new mahout-examples-1.0.job file or
>>>>> equivalent that contains the patch for 118 and will run on EMR. What
>>>>> is the right way to do this ?
>>>>>
>>>>> Thanks
>>>>> Sebastien
>>>>>
>>>>>
>>>> --------------------------
>>>> Grant Ingersoll
>>>> http://www.lucidimagination.com/
>>>>
>>>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>>>> Solr/Lucene:
>>>> http://www.lucidimagination.com/search
>>>>
>>>>
>>>>
>>>

Re: running Dirichlet example on AEMR

Posted by Grant Ingersoll <gs...@apache.org>.
I don't know much about AEMR, so can you tell me more about the Ruby CLI?
Does it factor into the problem?


On May 15, 2009, at 5:03 PM, Sebastien Bratieres wrote:

> Hi,
>
> I am still trying to make this work. I am running AEMR with the latest
> mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
> ruby elastic-mapreduce -j j-26RJO9A4WJIJS --jar
> s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job --main-class
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job --arg
> s3n://myBucket/mahout-input/synthetic-control.data --arg
> s3n://myBucket/mahout-output/dirichlet --arg
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
> --arg 10 --arg 5 --arg 1.0 --arg 1
>
> This gave me the class not found error mentioned in my previous email.
>
> I have tried the following: I moved the DirichletJob class from the core
> project into the examples project, putting it in
> org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale for
> doing that is that in this way, the classloader does not need to look into
> lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
> finds it directly alongside Job.class.
>
> This got me one step further, but an error of the same type stops me again:
>
> java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
>    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
>    at java.security.AccessController.doPrivileged(Native Method)
>    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
>    at
> org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
>    at
> org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
>    ... 8 more
>
> This happens on a .loadClass() from the current thread's classloader.
>
> I have tried running this example on my local single-node Hadoop
> installation: this runs fine. The error above occurs only with Amazon
> Elastic MapReduce, and definitely seems related to classloading issues.
>
> Any ideas ?
>
> Thanks
> Sebastien
>
> 2009/5/15 Sebastien Bratieres <sb...@cam.ac.uk>
>
>> Hi,
>>
>> Thanks Grant, that did it. I'll figure out later what's going on.
>>
>> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
>> to run the Dirichlet example, which I launch with
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
>> class from the mahout-examples-0.2-SNAPSHOT.job.
>>
>> This fails with
>> java.lang.NoClassDefFoundError:
>> org/apache/mahout/clustering/dirichlet/DirichletJob
>>    at
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>>    at
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>>    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>    at
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>    at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>    at java.lang.reflect.Method.invoke(Method.java:597)
>>    at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>>    at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>    at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>>
>> DirichletJob is located in the .job file, inside
>> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't find
>> it.
>>
>> One difference between kMeans and Dirichlet is
>> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>>    JobConf conf = new JobConf(Job.class);
>> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>>    JobConf conf = new JobConf(DirichletJob.class);
>> i.e. the Dirichlet version uses a job class which is in core, while the
>> kMeans version uses the currently executing Job class from examples. Is
>> there an issue with this?
>>
>> What should I do to work around this error? Does the MANIFEST.MF file of
>> the .job contain a pointer to the /lib directory so that the jars there
>> are visible to the jar classloader?
>>
>> Thanks
>> Sebastien
>>
>>

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: running Dirichlet example on AEMR

Posted by Sebastien Bratieres <sb...@cam.ac.uk>.
Hi,

I am still trying to make this work. I am running AEMR with the latest
mahout-examples-0.2-SNAPSHOT.job in this way (using the Ruby CLI):
ruby elastic-mapreduce -j j-26RJO9A4WJIJS --jar
s3n://myBucket/mahout-code/mahout-examples-0.2-SNAPSHOT.job --main-class
org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job --arg
s3n://myBucket/mahout-input/synthetic-control.data --arg
s3n://myBucket/mahout-output/dirichlet --arg
org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
--arg 10 --arg 5 --arg 1.0 --arg 1

This gave me the class not found error mentioned in my previous email.

I have tried the following: I moved the DirichletJob class from the core
project into the examples project, putting it in
org.apache.mahout.clustering.syntheticcontrol.dirichlet. The rationale for
doing that is that in this way, the classloader does not need to look into
lib/mahout-core-0.2-SNAPSHOT.jar to obtain DirichletJob.class; instead it
finds it directly alongside Job.class.
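
One way to check this kind of rationale is to ask the JVM where a class was actually loaded from. The sketch below is a generic diagnostic and is not part of Mahout or Hadoop; the names ClassOrigin and describe are hypothetical. It assumes it is acceptable to call getProtectionDomain() in the environment (a security manager could forbid that).

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper (not Mahout or Hadoop code): reports the
// jar or directory that a class was loaded from.
public class ClassOrigin {

    // Returns "<class name> <- <location>"; classes loaded by the bootstrap
    // loader usually have no code source, so they get a placeholder instead.
    public static String describe(Class<?> clazz) {
        CodeSource source = clazz.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return clazz.getName() + " <- (bootstrap or unknown)";
        }
        return clazz.getName() + " <- " + source.getLocation();
    }

    public static void main(String[] args) {
        // A class from the application classpath and one from the JDK itself.
        System.out.println(describe(ClassOrigin.class));
        System.out.println(describe(String.class));
    }
}
```

Calling something like ClassOrigin.describe(DirichletJob.class) from the job's main method would show whether the class came from the top level of the .job file or from a jar under lib/, assuming the class can be loaded at all.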

This got me one step further, but an error of the same type stops me again:

java.lang.ClassNotFoundException:
org.apache.mahout.clustering.syntheticcontrol.dirichlet.NormalScModelDistribution
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at
org.apache.mahout.clustering.dirichlet.DirichletDriver.createState(DirichletDriver.java:125)
    at
org.apache.mahout.clustering.dirichlet.DirichletMapper.getDirichletState(DirichletMapper.java:71)
    ... 8 more

This happens on a .loadClass() from the current thread's classloader.
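
To illustrate why the choice of classloader matters here, the following is a minimal sketch of loading a class by name with a fallback. It is not the code in DirichletDriver; ModelLoader and loadByName are hypothetical names, and the fallback to the loader of an "anchor" class is only one possible strategy, assumed for illustration.

```java
// Hypothetical sketch (not Mahout code): load a class by name, trying the
// thread context classloader first and then the loader of a known class.
public class ModelLoader {

    public static Class<?> loadByName(String className, Class<?> anchor)
            throws ClassNotFoundException {
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        if (context != null) {
            try {
                return Class.forName(className, true, context);
            } catch (ClassNotFoundException e) {
                // Not visible to the context loader; try the anchor's loader.
            }
        }
        // A null loader here means the bootstrap loader, which forName accepts.
        return Class.forName(className, true, anchor.getClassLoader());
    }

    public static void main(String[] args) throws Exception {
        Class<?> loaded = loadByName("java.util.ArrayList", ModelLoader.class);
        System.out.println(loaded.getName()); // prints java.util.ArrayList
    }
}
```

If the context classloader of the task thread cannot see the examples classes but the loader of a class shipped in the same jar can, a fallback of this kind would find the model class; whether that is the actual situation on EMR is not established in this thread.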

I have tried running this example on my local single-node Hadoop
installation: this runs fine. The error above occurs only with Amazon
Elastic MapReduce, and definitely seems related to classloading issues.

Any ideas ?

Thanks
Sebastien

2009/5/15 Sebastien Bratieres <sb...@cam.ac.uk>

> Hi,
>
> Thanks Grant, that did it. I'll figure out later what's going on.
>
> Now I'm able to run the kMeans example on Amazon EMR as Stephen did. I want
> to run the Dirichlet example, which I launch with
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job as the main
> class from the mahout-examples-0.2-SNAPSHOT.job.
>
> This fails with
> java.lang.NoClassDefFoundError:
> org/apache/mahout/clustering/dirichlet/DirichletJob
>     at
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.runJob(Job.java:80)
>     at
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job.main(Job.java:50)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
>     at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)
>
> DirichletJob is located in the .job file, inside
> lib/mahout-core-0.2-SNAPSHOT.jar. But apparently the classloader can't find
> it.
>
> One difference between kMeans and Dirichlet is
> org.apache.mahout.clustering.syntheticcontrol.kmeans.Job line 74
>     JobConf conf = new JobConf(Job.class);
> org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job line 80
>     JobConf conf = new JobConf(DirichletJob.class);
> i.e. the Dirichlet version uses a job class which is in core, while the
> kMeans version uses the currently executing Job class from examples. Is
> there an issue with this?
>
> What should I do to work around this error? Does the MANIFEST.MF file of the
> .job contain a pointer to the /lib directory so that the jars there are
> visible to the jar classloader?
>
> Thanks
> Sebastien
>
>