Posted to user@mahout.apache.org by Kevin Moulart <ke...@gmail.com> on 2014/03/04 14:53:31 UTC

PCA with ssvd leads to StackOverFlowError

Hi,

I'm trying to apply PCA to reduce the dimensionality of a matrix of 1,603
columns and 100,000 to 30,000,000 rows, using ssvd with the pca option, and
I always get a StackOverflowError:

Here is my command line :
mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
-pca "true" -U "false" -V "false" -t 3 -ow

I also tried to add "-us true" as mentioned in
https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
but the option is not available anymore.

The output of the previous command is :
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
{--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
--computeU=[false], --computeV=[false], --endPhase=[2147483647],
--input=[/user/myUser/Echant100k], --minSplitSize=[-1],
--outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
--oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
--rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
--uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread "main" java.lang.StackOverflowError
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
...

I searched online and didn't find a solution to my problem.

Can you help me?

Thanks in advance,

-- 
Kévin Moulart

Re: PCA with ssvd leads to StackOverFlowError

Posted by Suneel Marthi <su...@yahoo.com>.
I have not seen the StackOverflowError, but this code has been fixed since 0.8.

Sent from my iPhone

> On Mar 4, 2014, at 12:40 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> [...]

Re: PCA with ssvd leads to StackOverFlowError

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
as for the stack trace, it looks like it doesn't agree with current trunk.
Again, i need to know which version you are running.

But from looking at current trunk, i don't really see how that may be
happening at the moment.


On Tue, Mar 4, 2014 at 9:40 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> [...]

Re: PCA with ssvd leads to StackOverFlowError

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
It doesn't look like -us has been removed. At least i see it on the head of
the trunk, SSVDCli.java, line 62:

    addOption("uSigma", "us", "Compute U * Sigma", String.valueOf(false));

i.e. short version (single dash) -us true, or long version (double dash)
--uSigma true. Can you check again with 0.9? Thanks.
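
For example (a sketch only, not verified against 0.9), your original command
with the U * Sigma output enabled would look like:

mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 \
  -k 100 -pca true -us true -U false -V false -t 3 -ow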


On Tue, Mar 4, 2014 at 9:37 AM, Dmitriy Lyubimov <dl...@gmail.com> wrote:

> [...]

Re: PCA with ssvd leads to StackOverFlowError

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Kevin, thanks for reporting this.

Stack overflow error has not been known to happen to date. But i will take
a look. It looks like a bug in the mean computation code, given your stack
trace, although it may have been induced by some circumstances specific to
your deployment.
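
Since every frame in your trace points at the same line
(MatrixColumnMeansJob.java:55), it reads like unconditional self-recursion.
A hypothetical, simplified sketch of that failure mode (not the actual
Mahout source):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.Vector;

public class ColumnMeansSketch {
  // An overload meant to delegate to a fuller variant, but which instead
  // calls itself with the same arguments, recursing until StackOverflowError.
  public static Vector run(Configuration conf, Path input, Path output) {
    return run(conf, input, output); // every stack frame points at this line
  }
}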

What version is it? 0.9?

As for -us, I'm not aware of it having been removed. If it was, it
happened without my knowledge. I will take a look at the trunk.

-d


On Tue, Mar 4, 2014 at 5:53 AM, Kevin Moulart <ke...@gmail.com> wrote:

> [...]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Perfect! It works like a charm now! I'll keep testing after lunch and let
you know if any new problems turn up, but it looks promising!

Thank you very much!

Kévin Moulart


2014-03-06 19:31 GMT+01:00 Ted Dunning <te...@gmail.com>:

> [...]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Ted Dunning <te...@gmail.com>.
On Thu, Mar 6, 2014 at 7:46 AM, Kevin Moulart <ke...@gmail.com> wrote:

> [ERROR]
>
> /home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31]
> cannot find symbol
>

Replace that line with:

        stack = new ArrayDeque<GroupTree>();
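
For the record: the compile error reported elsewhere in this thread ("cannot
find symbol: method newArrayDeque()" in "class com.google.common.collect.Queues")
indicates the line being replaced calls the Guava 12+ helper. A before/after
sketch:

        // before: needs Guava 12+, breaks against Hadoop's Guava 11.0.2
        stack = Queues.newArrayDeque();
        // after: plain JDK, no Guava method involved
        stack = new ArrayDeque<GroupTree>();

(Adjust the imports accordingly: java.util.ArrayDeque instead of
com.google.common.collect.Queues.)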

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Indeed, it causes compile errors:
[ERROR] Failed to execute goal
org.apache.maven.plugins:maven-compiler-plugin:3.1:compile
(default-compile) on project mahout-math: Compilation failure
[ERROR]
/home/myCompny/Downloads/mahout9/math/src/main/java/org/apache/mahout/math/stats/GroupTree.java:[171,31]
cannot find symbol
[ERROR] symbol:   method newArrayDeque()
[ERROR] location: class com.google.common.collect.Queues
So I'll dig into the code, find the line to replace, and work out an equivalent.

Thanks for your help!

Kévin Moulart


2014-03-06 16:36 GMT+01:00 Sean Owen <sr...@gmail.com>:

> [...]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
If I'm right, then it will cause compile errors, but then, you just
fix those by replacing some Guava constructs with equivalent Java or
older Guava code. IIRC it is fairly trivial.

And in fact probably should not use Guava 12+ methods for this reason
even if compiling against 12+. And in fact I thought someone cleaned
that up...

On Thu, Mar 6, 2014 at 3:34 PM, Kevin Moulart <ke...@gmail.com> wrote:
> [...]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
OK, so should I try to recompile with the Guava version changed to 11.0.2
in the pom?

Kévin Moulart


2014-03-06 16:26 GMT+01:00 Sean Owen <sr...@gmail.com>:

> [...]

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
That's gonna be a Guava version problem. I have seen variants of this
for a while. Hadoop still uses 11.0.2 even in HEAD and you can often
get away with using a later version in a project like this, even
though code that executes on Hadoop will use an older Guava than you
compiled against. This is an example of that gotcha. I think it may be
necessary to force Mahout to use 11.0.2 and change this code.

I am having deja vu like this has come up before too.
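
A sketch of what pinning Guava might look like in Mahout's pom.xml (the
placement is an assumption, not the project's actual build file):

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>11.0.2</version>
    </dependency>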





On Thu, Mar 6, 2014 at 3:23 PM, Kevin Moulart <ke...@gmail.com> wrote:
> Hi, thanks very much, it seems to have worked!
> Compiling with "mvn clean package -Dhadoop2.version=2.0.0-cdh4.6.0" works
> and I no longer have the error, but when running tests that used to
> work with the previous install, like trainAdaptiveLogistic and then
> validateAdaptiveLogistic, the first works but the second yields an error:
>
> bin/mahout validateAdaptiveLogistic --input
> /mnt/hdfs/user/myCompany/Echant/echant300k_wh.csv --model
> /mnt/hdfs/user/myCompany/Echant/Models/echnat.model --auc --scores
> --confusion.
> 14/03/06 15:53:42 WARN driver.MahoutDriver: No
> validateAdaptiveLogistic.props found on classpath, will use command-line
> arguments only
> Exception in thread "main" java.lang.NoSuchMethodError:
> com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
> at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
>  at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
> at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
>  at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
> at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
>  at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
> at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
>  at
> org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
> at
> org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.mainToOutput(ValidateAdaptiveLogistic.java:107)
>  at
> org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.main(ValidateAdaptiveLogistic.java:63)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>  at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> I'll try some other tests to see what's working and what's not.
>
>
>
> 2014-03-06 15:58 GMT+01:00 Gokhan Capan <gk...@gmail.com>:
>
>> Kevin,
>>
>>
>> From trunk, can you build mahout for hadoop2 using this command:
>>
>> mvn clean package -DskipTests=true -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>>
>>
>> Then can you verify that you have the right hadoop jars with the following
>> command:
>>
>> find . -name hadoop*.jar
>>
>>
>>
>> Gokhan
>>
>>
>> > On Thu, Mar 6, 2014 at 3:26 PM, Kevin Moulart <kevinmoulart@gmail.com> wrote:
>>
>> > Hi again, and thanks for the enthusiasm!
>> >
>> > I did compile the trunk with the hadoop2 profile and, although it didn't
>> > work at first because of some Canopy tests not passing, when I skipped the
>> > tests it compiled, and when I tested it afterward it passed.
>> > I used the version I have installed, so I just added the line:
>> >     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
>> > to the pom.xml and typed:
>> > mvn -DskipTests clean install -Phadoop2
>> > Then:
>> > mvn test
>> >
>> > Then I tried it with these settings :
>> > export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
>> > export
>> >
>> >
>> HADOOP_CLASSPATH=/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>> > export MAHOUT_HOME=/home/myCompany/Downloads/mahout9
>> >
>> > And the command gives me this :
>> > [root@node01 mahout9]# bin/mahout
>> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>> > Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
>> > and HADOOP_CONF_DIR=/etc/hadoop/conf
>> > MAHOUT-JOB:
>> >
>> >
>> /home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
>> > Exception in thread "main" java.lang.NoSuchMethodError:
>> > org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
>> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
>> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> > at
>> >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> >  at
>> >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> > at java.lang.reflect.Method.invoke(Method.java:606)
>> >  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>> >
>> > I even tried with :
>> > export HADOOP_HOME=/.../hadoop,
>> > export HADOOP_HOME=/.../hadoop-0.20-mapreduce
>> > export HADOOP_HOME=/.../hadoop-mapreduce
>> >
>> > And it still gives me the same result.
>> >
>> > And recompiling with  <hadoop2.version>2.0.0</hadoop2.version> or
>> >  <hadoop2.version>2.0.0-mr1-cdh4.6.0</hadoop2.version> didn't work.
>> >
>> > Any idea?
>> >
>> >
>> >
>> > 2014-03-05 22:42 GMT+01:00 Andrew Musselman <andrew.musselman@gmail.com>:
>> >
>> > > I mean "balance the risk aversion against the value of new features"
>> duh.
>> > >
>> > >
>> > > On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <
>> > > andrew.musselman@gmail.com
>> > > > wrote:
>> > >
>> > > > Yeah, for sure; balancing clients' risk aversion to technical
>> features
>> > is
>> > > > why we often recommend vendor solutions.
>> > > >
>> > > > Having a little button to choose a newer version of a component in
>> the
>> > > > Manager UI (even with a confirmation dialog that said "Are you sure?
>> > Are
>> > > > you crazy?") would be more palatable to some teams than installing
>> > > > tarballs, is what I'm getting at.
>> > > >
>> > > >
>> > > > On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:
>> > > >
>> > > >> You can always install whatever version of anything on your cluster
>> > > >> that you want. It may or may not work, but often happens to, at
>> least
>> > > >> for whatever you need it to do.
>> > > >>
>> > > >> It's just the same as it is without a packaged distribution -- dump
>> > > >> new tarballs and cross your fingers. Nothing is weird or different
>> > > >> about the setup or layout. That is the "here be dragons" solution,
>> > > >> already
>> > > >>
>> > > >> You go with support from a packaged distribution when you want a
>> "here
>> > > >> be no dragons" solution. Everything else is by definition already
>> > > >> something you can and should do yourself outside of a packaged
>> > > >> distribution. And really -- you freely can, and it's not hard, if
>> you
>> > > >> know what you are doing.
>> > > >>
>> > > >> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
>> > > >> <an...@gmail.com> wrote:
>> > > >> > Feels like just yesterday :)
>> > > >> >
>> > > >> > Consider this a feature request to have more flexible component
>> > > >> versioning,
>> > > >> > even with a caveat/"here be dragons" warning.  I know that
>> > complicates
>> > > >> > things but people do use your releases a long time.  I personally
>> > > >> wished I
>> > > >> > could upgrade Pig on CDH 4 for new features but there was no
>> simple
>> > > way
>> > > >> on
>> > > >> > a managed cluster.
>> > > >> >
>> > > >> >
>> > > >> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com>
>> > wrote:
>> > > >> >
>> > > >> >> I don't understand this -- CDH always bundles the latest release.
>> > > >> >>
>> > > >> >> You know that CDH4 was released in July 2012, right? So it
>> included
>> > > >> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
>> > > >> >> month after it began beta 2.
>> > > >> >>
>> > > >> >> CDH follows semantic versioning and won't introduce changes that
>> > are
>> > > >> >> not backwards-compatible in a minor version update. 0.x releases
>> of
>> > > >> >> Mahout act like major version changes -- not backwards
>> compatible.
>> > So
>> > > >> >> 4.x will always be 0.7 and 5.x will always be 0.8.
>> > > >> >>
>> > > >> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <
>> > dlieu.7@gmail.com>
>> > > >> >> wrote:
>> > > >> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com>
>> > > wrote:
>> > > >> >> >
>> > > >> >> >> I don't follow what here makes you say they are "cut down"
>> > > releases?
>> > > >> >> >>
>> > > >> >> >
>> > > >> >> > meaning it seems to be pretty much 2 releases behind the
>> > official.
>> > > >> But i
>> > > >> >> > definitely don't follow CDH developments in this department,
>> you
>> > > >> seem in
>> > > >> >> a
>> > > >> >> > better position to explain the existing patchlevel there so I
>> > defer
>> > > >> to
>> > > >> >> you
>> > > >> >> > to explain why this patchlevel is not there.
>> > > >> >> >
>> > > >> >> > -d
>> > > >> >>
>> > > >>
>> > > >
>> > > >
>> > >
>> >
>> >
>> >
>> > --
>> > Kévin Moulart
>> > GSM France : +33 7 81 06 10 10
>> > GSM Belgique : +32 473 85 23 85
>> > Téléphone fixe : +32 2 771 88 45
>> >
>>
>
>
>
> --
> Kévin Moulart
> GSM France : +33 7 81 06 10 10
> GSM Belgique : +32 473 85 23 85
> Téléphone fixe : +32 2 771 88 45

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Hi, thanks very much, it seems to have worked!
Compiling with "mvn clean package -Dhadoop2.version=2.0.0-cdh4.6.0" works
and I no longer have the error. But when running tests that used to work
with the previous install, like trainAdaptiveLogistic followed by
validateAdaptiveLogistic, the first works but the second yields an error:

bin/mahout validateAdaptiveLogistic --input
/mnt/hdfs/user/myCompany/Echant/echant300k_wh.csv --model
/mnt/hdfs/user/myCompany/Echant/Models/echnat.model --auc --scores
--confusion
14/03/06 15:53:42 WARN driver.MahoutDriver: No
validateAdaptiveLogistic.props found on classpath, will use command-line
arguments only
Exception in thread "main" java.lang.NoSuchMethodError:
com.google.common.collect.Queues.newArrayDeque()Ljava/util/ArrayDeque;
at org.apache.mahout.math.stats.GroupTree$1.<init>(GroupTree.java:171)
 at org.apache.mahout.math.stats.GroupTree.iterator(GroupTree.java:169)
at org.apache.mahout.math.stats.GroupTree.access$300(GroupTree.java:14)
 at org.apache.mahout.math.stats.GroupTree$2.iterator(GroupTree.java:317)
at org.apache.mahout.math.stats.TDigest.add(TDigest.java:105)
 at org.apache.mahout.math.stats.TDigest.add(TDigest.java:88)
at org.apache.mahout.math.stats.TDigest.add(TDigest.java:76)
 at
org.apache.mahout.math.stats.OnlineSummarizer.add(OnlineSummarizer.java:57)
at
org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.mainToOutput(ValidateAdaptiveLogistic.java:107)
 at
org.apache.mahout.classifier.sgd.ValidateAdaptiveLogistic.main(ValidateAdaptiveLogistic.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
 at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I'll try some other tests to see what's working and what's not.
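
For the record, the missing com.google.common.collect.Queues.newArrayDeque()
points at a Guava clash rather than a Hadoop one: that method only exists from
Guava 12 onward, while Hadoop of this vintage depends on Guava 11.0.2, whose
jar tends to come first on the job classpath. A quick check (a sketch, assuming
the CDH parcel layout used above; adjust the path to your install):

find /opt/cloudera/parcels/CDH/lib/hadoop* -name 'guava-*.jar'
# any guava-11.x.jar ahead of Mahout's jars on the classpath hides the
# newer methods, such as Queues.newArrayDeque()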



2014-03-06 15:58 GMT+01:00 Gokhan Capan <gk...@gmail.com>:

> Kevin,
>
>
> From trunk, can you build mahout for hadoop2 using this command:
>
> mvn clean package -DskipTests=true -Dhadoop2.version=<YOUR_HADOOP2_VERSION>
>
>
> Then can you verify that you have the right hadoop jars with the following
> command:
>
> find . -name hadoop*.jar
>
>
>
> Gokhan
>
>
> On Thu, Mar 6, 2014 at 3:26 PM, Kevin Moulart <kevinmoulart@gmail.com
> >wrote:
>
> > Hi again, and thanks for the enthousiasm !
> >
> > I did compile the trunk with the hadoop2 profile and, althoug it didn't
> > work at first because of some Canopy tests not passing, when I skipped
> the
> > tests it compiled and when I tested it afterward it passed.
> > I used the version I have isntalled, so I just added the line :
> >     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
> > To the pom.xml and type :
> > mvn -DskipTests clean install -Phadoop2
> > Then :
> > mvn test
> >
> > Then I tried it with these settings :
> > export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
> > export
> >
> >
> HADOOP_CLASSPATH=/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > export MAHOUT_HOME=/home/myCompany/Downloads/mahout9
> >
> > And the command gives me this :
> > [root@node01 mahout9]# bin/mahout
> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > and HADOOP_CONF_DIR=/etc/hadoop/conf
> > MAHOUT-JOB:
> >
> >
> /home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > Exception in thread "main" java.lang.NoSuchMethodError:
> > org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> >  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> >  at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> >  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >
> > I even tried with :
> > export HADOOP_HOME=/.../hadoop,
> > export HADOOP_HOME=/.../hadoop-0.20-mapreduce
> > export HADOOP_HOME=/.../hadoop-mapreduce
> >
> > And it still gives me the same result.
> >
> > And recompiling with  <hadoop2.version>2.0.0</hadoop2.version> or
> >  <hadoop2.version>2.0.0-mr1-cdh4.6.0</hadoop2.version> didn't work.
> >
> > Any idea ?
> >
> >
> >
> > 2014-03-05 22:42 GMT+01:00 Andrew Musselman <andrew.musselman@gmail.com
> >:
> >
> > > I mean "balance the risk aversion against the value of new features"
> duh.
> > >
> > >
> > > On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <
> > > andrew.musselman@gmail.com
> > > > wrote:
> > >
> > > > Yeah, for sure; balancing clients' risk aversion to technical
> features
> > is
> > > > why we often recommend vendor solutions.
> > > >
> > > > Having a little button to choose a newer version of a component in
> the
> > > > Manager UI (even with a confirmation dialog that said "Are you sure?
> > Are
> > > > you crazy?") would be more palatable to some teams than installing
> > > > tarballs, is what I'm getting at.
> > > >
> > > >
> > > > On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:
> > > >
> > > >> You can always install whatever version of anything on your cluster
> > > >> that you want. It may or may not work, but often happens to, at
> least
> > > >> for whatever you need it to do.
> > > >>
> > > >> It's just the same as it is without a packaged distribution -- dump
> > > >> new tarballs and cross your fingers. Nothing is weird or different
> > > >> about the setup or layout. That is the "here be dragons" solution,
> > > >> already
> > > >>
> > > >> You go with support from a packaged distribution when you want a
> "here
> > > >> be no dragons" solution. Everything else is by definition already
> > > >> something you can and should do yourself outside of a packaged
> > > >> distribution. And really -- you freely can, and it's not hard, if
> you
> > > >> know what you are doing.
> > > >>
> > > >> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
> > > >> <an...@gmail.com> wrote:
> > > >> > Feels like just yesterday :)
> > > >> >
> > > >> > Consider this a feature request to have more flexible component
> > > >> versioning,
> > > >> > even with a caveat/"here be dragons" warning.  I know that
> > complicates
> > > >> > things but people do use your releases a long time.  I personally
> > > >> wished I
> > > >> > could upgrade Pig on CDH 4 for new features but there was no
> simple
> > > way
> > > >> on
> > > >> > a managed cluster.
> > > >> >
> > > >> >
> > > >> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com>
> > wrote:
> > > >> >
> > > >> >> I don't understand this -- CDH always bundles the latest release.
> > > >> >>
> > > >> >> You know that CDH4 was released in July 2012, right? So it
> included
> > > >> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> > > >> >> month after it began beta 2.
> > > >> >>
> > > >> >> CDH follows semantic versioning and won't introduce changes that
> > are
> > > >> >> not backwards-compatible in a minor version update. 0.x releases
> of
> > > >> >> Mahout act like major version changes -- not backwards
> compatible.
> > So
> > > >> >> 4.x will always be 0.7 and 5.x will always be 0.8.
> > > >> >>
> > > >> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <
> > dlieu.7@gmail.com>
> > > >> >> wrote:
> > > >> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com>
> > > wrote:
> > > >> >> >
> > > >> >> >> I don't follow what here makes you say they are "cut down"
> > > releases?
> > > >> >> >>
> > > >> >> >
> > > >> >> > meaning it seems to be pretty much 2 releases behind the
> > official.
> > > >> But i
> > > >> >> > definitely don't follow CDH developments in this department,
> you
> > > >> seem in
> > > >> >> a
> > > >> >> > better position to explain the existing patchlevel there so I
> > defer
> > > >> to
> > > >> >> you
> > > >> >> > to explain why this patchlevel is not there.
> > > >> >> >
> > > >> >> > -d
> > > >> >>
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Kévin Moulart
> > GSM France : +33 7 81 06 10 10
> > GSM Belgique : +32 473 85 23 85
> > Téléphone fixe : +32 2 771 88 45
> >
>



-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Gokhan Capan <gk...@gmail.com>.
Kevin,


From trunk, can you build mahout for hadoop2 using this command:

mvn clean package -DskipTests=true -Dhadoop2.version=<YOUR_HADOOP2_VERSION>


Then can you verify that you have the right hadoop jars with the following
command:

find . -name hadoop*.jar
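
For instance, with the CDH4 version mentioned earlier in the thread (a sketch;
substitute whatever Hadoop 2 version your cluster actually runs, and note the
quoted pattern so the shell doesn't expand it first):

mvn clean package -DskipTests=true -Dhadoop2.version=2.0.0-cdh4.6.0
find . -name 'hadoop*.jar'

The jars listed should then carry that version string rather than the default
Hadoop 1 version.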



Gokhan


On Thu, Mar 6, 2014 at 3:26 PM, Kevin Moulart <ke...@gmail.com> wrote:

> Hi again, and thanks for the enthousiasm !
>
> I did compile the trunk with the hadoop2 profile and, althoug it didn't
> work at first because of some Canopy tests not passing, when I skipped the
> tests it compiled and when I tested it afterward it passed.
> I used the version I have isntalled, so I just added the line :
>     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
> To the pom.xml and type :
> mvn -DskipTests clean install -Phadoop2
> Then :
> mvn test
>
> Then I tried it with these settings :
> export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
> export
>
> HADOOP_CLASSPATH=/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> export MAHOUT_HOME=/home/myCompany/Downloads/mahout9
>
> And the command gives me this :
> [root@node01 mahout9]# bin/mahout
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB:
>
> /home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>  at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
>  at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> I even tried with :
> export HADOOP_HOME=/.../hadoop,
> export HADOOP_HOME=/.../hadoop-0.20-mapreduce
> export HADOOP_HOME=/.../hadoop-mapreduce
>
> And it still gives me the same result.
>
> And recompiling with  <hadoop2.version>2.0.0</hadoop2.version> or
>  <hadoop2.version>2.0.0-mr1-cdh4.6.0</hadoop2.version> didn't work.
>
> Any idea ?
>
>
>
> 2014-03-05 22:42 GMT+01:00 Andrew Musselman <an...@gmail.com>:
>
> > I mean "balance the risk aversion against the value of new features" duh.
> >
> >
> > On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <
> > andrew.musselman@gmail.com
> > > wrote:
> >
> > > Yeah, for sure; balancing clients' risk aversion to technical features
> is
> > > why we often recommend vendor solutions.
> > >
> > > Having a little button to choose a newer version of a component in the
> > > Manager UI (even with a confirmation dialog that said "Are you sure?
> Are
> > > you crazy?") would be more palatable to some teams than installing
> > > tarballs, is what I'm getting at.
> > >
> > >
> > > On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:
> > >
> > >> You can always install whatever version of anything on your cluster
> > >> that you want. It may or may not work, but often happens to, at least
> > >> for whatever you need it to do.
> > >>
> > >> It's just the same as it is without a packaged distribution -- dump
> > >> new tarballs and cross your fingers. Nothing is weird or different
> > >> about the setup or layout. That is the "here be dragons" solution,
> > >> already
> > >>
> > >> You go with support from a packaged distribution when you want a "here
> > >> be no dragons" solution. Everything else is by definition already
> > >> something you can and should do yourself outside of a packaged
> > >> distribution. And really -- you freely can, and it's not hard, if you
> > >> know what you are doing.
> > >>
> > >> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
> > >> <an...@gmail.com> wrote:
> > >> > Feels like just yesterday :)
> > >> >
> > >> > Consider this a feature request to have more flexible component
> > >> versioning,
> > >> > even with a caveat/"here be dragons" warning.  I know that
> complicates
> > >> > things but people do use your releases a long time.  I personally
> > >> wished I
> > >> > could upgrade Pig on CDH 4 for new features but there was no simple
> > way
> > >> on
> > >> > a managed cluster.
> > >> >
> > >> >
> > >> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com>
> wrote:
> > >> >
> > >> >> I don't understand this -- CDH always bundles the latest release.
> > >> >>
> > >> >> You know that CDH4 was released in July 2012, right? So it included
> > >> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> > >> >> month after it began beta 2.
> > >> >>
> > >> >> CDH follows semantic versioning and won't introduce changes that
> are
> > >> >> not backwards-compatible in a minor version update. 0.x releases of
> > >> >> Mahout act like major version changes -- not backwards compatible.
> So
> > >> >> 4.x will always be 0.7 and 5.x will always be 0.8.
> > >> >>
> > >> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <
> dlieu.7@gmail.com>
> > >> >> wrote:
> > >> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com>
> > wrote:
> > >> >> >
> > >> >> >> I don't follow what here makes you say they are "cut down"
> > releases?
> > >> >> >>
> > >> >> >
> > >> >> > meaning it seems to be pretty much 2 releases behind the
> official.
> > >> But i
> > >> >> > definitely don't follow CDH developments in this department, you
> > >> seem in
> > >> >> a
> > >> >> > better position to explain the existing patchlevel there so I
> defer
> > >> to
> > >> >> you
> > >> >> > to explain why this patchlevel is not there.
> > >> >> >
> > >> >> > -d
> > >> >>
> > >>
> > >
> > >
> >
>
>
>
> --
> Kévin Moulart
> GSM France : +33 7 81 06 10 10
> GSM Belgique : +32 473 85 23 85
> Téléphone fixe : +32 2 771 88 45
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Hi again, and thanks for the enthusiasm!

I did compile the trunk with the hadoop2 profile. Although it didn't work at
first because some Canopy tests were failing, it compiled once I skipped the
tests, and the test suite passed when I ran it afterward.
I used the version I have installed, so I just added the line:
    <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
to the pom.xml and typed:
mvn -DskipTests clean install -Phadoop2
Then:
mvn test

Then I tried it with these settings:
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CLASSPATH=/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
export MAHOUT_HOME=/home/myCompany/Downloads/mahout9

And the command gives me this:
[root@node01 mahout9]# bin/mahout
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB:
/home/myCompany/Downloads/mahout9/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

I even tried with:
export HADOOP_HOME=/.../hadoop,
export HADOOP_HOME=/.../hadoop-0.20-mapreduce
export HADOOP_HOME=/.../hadoop-mapreduce

And it still gives me the same result.

And recompiling with <hadoop2.version>2.0.0</hadoop2.version> or
<hadoop2.version>2.0.0-mr1-cdh4.6.0</hadoop2.version> didn't work.
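
One way to see which ProgramDriver bin/mahout actually picks up (a sketch,
assuming the CDH parcel paths used above and the unversioned hadoop-common.jar
symlink CDH creates; adjust to your layout):

javap -cp /opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar org.apache.hadoop.util.ProgramDriver
# MahoutDriver expects void driver(java.lang.String[]) -- the ...)V in the
# NoSuchMethodError -- so a different signature here means the Mahout jar and
# the cluster's Hadoop disagree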

Any idea?



2014-03-05 22:42 GMT+01:00 Andrew Musselman <an...@gmail.com>:

> I mean "balance the risk aversion against the value of new features" duh.
>
>
> On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <
> andrew.musselman@gmail.com
> > wrote:
>
> > Yeah, for sure; balancing clients' risk aversion to technical features is
> > why we often recommend vendor solutions.
> >
> > Having a little button to choose a newer version of a component in the
> > Manager UI (even with a confirmation dialog that said "Are you sure? Are
> > you crazy?") would be more palatable to some teams than installing
> > tarballs, is what I'm getting at.
> >
> >
> > On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> You can always install whatever version of anything on your cluster
> >> that you want. It may or may not work, but often happens to, at least
> >> for whatever you need it to do.
> >>
> >> It's just the same as it is without a packaged distribution -- dump
> >> new tarballs and cross your fingers. Nothing is weird or different
> >> about the setup or layout. That is the "here be dragons" solution,
> >> already
> >>
> >> You go with support from a packaged distribution when you want a "here
> >> be no dragons" solution. Everything else is by definition already
> >> something you can and should do yourself outside of a packaged
> >> distribution. And really -- you freely can, and it's not hard, if you
> >> know what you are doing.
> >>
> >> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
> >> <an...@gmail.com> wrote:
> >> > Feels like just yesterday :)
> >> >
> >> > Consider this a feature request to have more flexible component
> >> versioning,
> >> > even with a caveat/"here be dragons" warning.  I know that complicates
> >> > things but people do use your releases a long time.  I personally
> >> wished I
> >> > could upgrade Pig on CDH 4 for new features but there was no simple
> way
> >> on
> >> > a managed cluster.
> >> >
> >> >
> >> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com> wrote:
> >> >
> >> >> I don't understand this -- CDH always bundles the latest release.
> >> >>
> >> >> You know that CDH4 was released in July 2012, right? So it included
> >> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> >> >> month after it began beta 2.
> >> >>
> >> >> CDH follows semantic versioning and won't introduce changes that are
> >> >> not backwards-compatible in a minor version update. 0.x releases of
> >> >> Mahout act like major version changes -- not backwards compatible. So
> >> >> 4.x will always be 0.7 and 5.x will always be 0.8.
> >> >>
> >> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com>
> >> >> wrote:
> >> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com>
> wrote:
> >> >> >
> >> >> >> I don't follow what here makes you say they are "cut down"
> releases?
> >> >> >>
> >> >> >
> >> >> > meaning it seems to be pretty much 2 releases behind the official.
> >> But i
> >> >> > definitely don't follow CDH developments in this department, you
> >> seem in
> >> >> a
> >> >> > better position to explain the existing patchlevel there so I defer
> >> to
> >> >> you
> >> >> > to explain why this patchlevel is not there.
> >> >> >
> >> >> > -d
> >> >>
> >>
> >
> >
>



-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Andrew Musselman <an...@gmail.com>.
I mean "balance the risk aversion against the value of new features" duh.


On Wed, Mar 5, 2014 at 1:39 PM, Andrew Musselman <andrew.musselman@gmail.com
> wrote:

> Yeah, for sure; balancing clients' risk aversion to technical features is
> why we often recommend vendor solutions.
>
> Having a little button to choose a newer version of a component in the
> Manager UI (even with a confirmation dialog that said "Are you sure? Are
> you crazy?") would be more palatable to some teams than installing
> tarballs, is what I'm getting at.
>
>
> On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> You can always install whatever version of anything on your cluster
>> that you want. It may or may not work, but often happens to, at least
>> for whatever you need it to do.
>>
>> It's just the same as it is without a packaged distribution -- dump
>> new tarballs and cross your fingers. Nothing is weird or different
>> about the setup or layout. That is the "here be dragons" solution,
>> already
>>
>> You go with support from a packaged distribution when you want a "here
>> be no dragons" solution. Everything else is by definition already
>> something you can and should do yourself outside of a packaged
>> distribution. And really -- you freely can, and it's not hard, if you
>> know what you are doing.
>>
>> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
>> <an...@gmail.com> wrote:
>> > Feels like just yesterday :)
>> >
>> > Consider this a feature request to have more flexible component
>> versioning,
>> > even with a caveat/"here be dragons" warning.  I know that complicates
>> > things but people do use your releases a long time.  I personally
>> wished I
>> > could upgrade Pig on CDH 4 for new features but there was no simple way
>> on
>> > a managed cluster.
>> >
>> >
>> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com> wrote:
>> >
>> >> I don't understand this -- CDH always bundles the latest release.
>> >>
>> >> You know that CDH4 was released in July 2012, right? So it included
>> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
>> >> month after it began beta 2.
>> >>
>> >> CDH follows semantic versioning and won't introduce changes that are
>> >> not backwards-compatible in a minor version update. 0.x releases of
>> >> Mahout act like major version changes -- not backwards compatible. So
>> >> 4.x will always be 0.7 and 5.x will always be 0.8.
>> >>
>> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> >> wrote:
>> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:
>> >> >
>> >> >> I don't follow what here makes you say they are "cut down" releases?
>> >> >>
>> >> >
>> >> > meaning it seems to be pretty much 2 releases behind the official.
>> But i
>> >> > definitely don't follow CDH developments in this department, you
>> seem in
>> >> a
>> >> > better position to explain the existing patchlevel there so I defer
>> to
>> >> you
>> >> > to explain why this patchlevel is not there.
>> >> >
>> >> > -d
>> >>
>>
>
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Andrew Musselman <an...@gmail.com>.
Yeah, for sure; balancing clients' risk aversion to technical features is
why we often recommend vendor solutions.

Having a little button to choose a newer version of a component in the
Manager UI (even with a confirmation dialog that said "Are you sure? Are
you crazy?") would be more palatable to some teams than installing
tarballs, is what I'm getting at.


On Wed, Mar 5, 2014 at 1:30 PM, Sean Owen <sr...@gmail.com> wrote:

> You can always install whatever version of anything on your cluster
> that you want. It may or may not work, but often happens to, at least
> for whatever you need it to do.
>
> It's just the same as it is without a packaged distribution -- dump
> new tarballs and cross your fingers. Nothing is weird or different
> about the setup or layout. That is the "here be dragons" solution,
> already
>
> You go with support from a packaged distribution when you want a "here
> be no dragons" solution. Everything else is by definition already
> something you can and should do yourself outside of a packaged
> distribution. And really -- you freely can, and it's not hard, if you
> know what you are doing.
>
> On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
> <an...@gmail.com> wrote:
> > Feels like just yesterday :)
> >
> > Consider this a feature request to have more flexible component
> versioning,
> > even with a caveat/"here be dragons" warning.  I know that complicates
> > things but people do use your releases a long time.  I personally wished
> I
> > could upgrade Pig on CDH 4 for new features but there was no simple way
> on
> > a managed cluster.
> >
> >
> > On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> I don't understand this -- CDH always bundles the latest release.
> >>
> >> You know that CDH4 was released in July 2012, right? So it included
> >> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> >> month after it began beta 2.
> >>
> >> CDH follows semantic versioning and won't introduce changes that are
> >> not backwards-compatible in a minor version update. 0.x releases of
> >> Mahout act like major version changes -- not backwards compatible. So
> >> 4.x will always be 0.7 and 5.x will always be 0.8.
> >>
> >> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com>
> >> wrote:
> >> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:
> >> >
> >> >> I don't follow what here makes you say they are "cut down" releases?
> >> >>
> >> >
> >> > meaning it seems to be pretty much 2 releases behind the official.
> But i
> >> > definitely don't follow CDH developments in this department, you seem
> in
> >> a
> >> > better position to explain the existing patchlevel there so I defer to
> >> you
> >> > to explain why this patchlevel is not there.
> >> >
> >> > -d
> >>
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
You can always install whatever version of anything on your cluster
that you want. It may or may not work, but often happens to, at least
for whatever you need it to do.

It's just the same as it is without a packaged distribution -- dump
new tarballs and cross your fingers. Nothing is weird or different
about the setup or layout. That is the "here be dragons" solution,
already.

You go with support from a packaged distribution when you want a "here
be no dragons" solution. Everything else is by definition already
something you can and should do yourself outside of a packaged
distribution. And really -- you freely can, and it's not hard, if you
know what you are doing.

On Wed, Mar 5, 2014 at 9:15 PM, Andrew Musselman
<an...@gmail.com> wrote:
> Feels like just yesterday :)
>
> Consider this a feature request to have more flexible component versioning,
> even with a caveat/"here be dragons" warning.  I know that complicates
> things but people do use your releases a long time.  I personally wished I
> could upgrade Pig on CDH 4 for new features but there was no simple way on
> a managed cluster.
>
>
> On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com> wrote:
>
>> I don't understand this -- CDH always bundles the latest release.
>>
>> You know that CDH4 was released in July 2012, right? So it included
>> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
>> month after it began beta 2.
>>
>> CDH follows semantic versioning and won't introduce changes that are
>> not backwards-compatible in a minor version update. 0.x releases of
>> Mahout act like major version changes -- not backwards compatible. So
>> 4.x will always be 0.7 and 5.x will always be 0.8.
>>
>> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com>
>> wrote:
>> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:
>> >
>> >> I don't follow what here makes you say they are "cut down" releases?
>> >>
>> >
>> > meaning it seems to be pretty much 2 releases behind the official. But i
>> > definitely don't follow CDH developments in this department, you seem in
>> a
>> > better position to explain the existing patchlevel there so I defer to
>> you
>> > to explain why this patchlevel is not there.
>> >
>> > -d
>>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Andrew Musselman <an...@gmail.com>.
Feels like just yesterday :)

Consider this a feature request to have more flexible component versioning,
even with a caveat/"here be dragons" warning. I know that complicates
things, but people do use your releases for a long time. I personally wished
I could upgrade Pig on CDH 4 for new features, but there was no simple way
on a managed cluster.


On Wed, Mar 5, 2014 at 12:12 PM, Sean Owen <sr...@gmail.com> wrote:

> I don't understand this -- CDH always bundles the latest release.
>
> You know that CDH4 was released in July 2012, right? So it included
> 0.7 + patches. CDH5 includes 0.8 because 0.9 was released about a
> month after it began beta 2.
>
> CDH follows semantic versioning and won't introduce changes that are
> not backwards-compatible in a minor version update. 0.x releases of
> Mahout act like major version changes -- not backwards compatible. So
> 4.x will always be 0.7 and 5.x will always be 0.8.
>
> On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com>
> wrote:
> > On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:
> >
> >> I don't follow what here makes you say they are "cut down" releases?
> >>
> >
> > meaning it seems to be pretty much 2 releases behind the official. But i
> > definitely don't follow CDH developments in this department, you seem in
> a
> > better position to explain the existing patchlevel there so I defer to
> you
> > to explain why this patchlevel is not there.
> >
> > -d
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
I don't understand this -- CDH always bundles the latest release.

You know that CDH4 was released in July 2012, right? So it included
0.7 + patches. CDH5 includes 0.8, because 0.9 was released about a
month after CDH5 began beta 2.

CDH follows semantic versioning and won't introduce changes that are
not backwards-compatible in a minor version update. 0.x releases of
Mahout act like major version changes -- not backwards compatible. So
4.x will always be 0.7 and 5.x will always be 0.8.

On Wed, Mar 5, 2014 at 5:34 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> I don't follow what here makes you say they are "cut down" releases?
>>
>
> meaning it seems to be pretty much 2 releases behind the official. But i
> definitely don't follow CDH developments in this department, you seem in a
> better position to explain the existing patchlevel there so I defer to you
> to explain why this patchlevel is not there.
>
> -d

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
On Wed, Mar 5, 2014 at 9:08 AM, Sean Owen <sr...@gmail.com> wrote:

> I don't follow what here makes you say they are "cut down" releases?
>

meaning it seems to be pretty much 2 releases behind the official. But I
definitely don't follow CDH developments in this department; you seem to be
in a better position to explain the existing patch level there, so I defer
to you to explain why this patch level is not there.

-d

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
I don't follow what here makes you say they are "cut down" releases?
They are the release plus patches, not the release minus patches.

The question is not about how to use 0.7, but how to use 1.0-SNAPSHOT.
Why would switching to the "official" 0.7 release help?

I think the answer is "you build Mahout for Hadoop 2", right? This has
always been the case. Mahout has always been Hadoop 1, with 2 support
"on the side".

On Wed, Mar 5, 2014 at 5:04 PM, Dmitriy Lyubimov <dl...@gmail.com> wrote:
> Yeah. it would seem CDH releases of Mahout produce some sort of cut-down
> version of such. I suggest to switch to official release tarbal (or write
> to Cloudera support about it).
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Dmitriy Lyubimov <dl...@gmail.com>.
Yeah, it would seem CDH releases of Mahout produce some sort of cut-down
version. I suggest switching to the official release tarball (or writing
to Cloudera support about it).
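
If it helps, pulling down the official tarball is just (a sketch, assuming the
0.9 artifact name on the Apache archive):

wget http://archive.apache.org/dist/mahout/0.9/mahout-distribution-0.9.tar.gz
tar xzf mahout-distribution-0.9.tar.gz
export MAHOUT_HOME=$PWD/mahout-distribution-0.9
export PATH=$MAHOUT_HOME/bin:$PATH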


On Wed, Mar 5, 2014 at 8:38 AM, Andrew Musselman <andrew.musselman@gmail.com
> wrote:

> I'm not sure about this either but I think these are all the changes to
> Mahout in CDH 4.6.0:
> http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt
>
> MAHOUT-1291
>
> MAHOUT-1033
>
> MAHOUT-1142
>
>
>
> On Wed, Mar 5, 2014 at 8:30 AM, Suneel Marthi <suneel_marthi@yahoo.com
> >wrote:
>
> > Not sure if the CDH4 patches on top of 0.7 has fixes for M-1067 and
> M-1098
> > which address the issues u r seeing.
> >
> >
> >
> > The second part of the issue u r seeing with Mahout 0.9 distro seems to
> be
> > related to how u set it up on CDH4. I apologize for not being helpful
> here
> > as I am not a CDH4 user or expert.
> >
> > Sean?
> >
> >
> >
> >
> > On Wednesday, March 5, 2014 10:23 AM, Kevin Moulart <
> > kevinmoulart@gmail.com> wrote:
> >
> > Previous mail sent only to Suneel : (my bad sorry)
> >
> > According to my stacktrace it seems that I am running mahout 0.7 indeed.
> > > That's the version provided by Cloudera when I install mahout using
> yum.
> > > But according to Sean Owen, it really is a 0.8 inside...
> > > Anyway I tried with the compiled version and it didn't work :
> > > Running on hadoop, using
> /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > > and HADOOP_CONF_DIR=
> > > Exception in thread "main" java.lang.NoSuchMethodError:
> > > org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> > >  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> > > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >  at
> > >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > > at
> > >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > >  at java.lang.reflect.Method.invoke(Method.java:606)
> > > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >
> > MAHOUT-JOB:
> > >
> /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
> > >
> >
> > And now I changed the conf directory of mahout 0.9 to be linked to the
> one
> > used by the existing working mahout and the trace changes :
> >
> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > and HADOOP_CONF_DIR=/etc/hadoop/conf
> > MAHOUT-JOB:
> >
> >
> /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
> > 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> > org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
> > java.lang.ClassNotFoundException:
> > org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:190)
> > at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> > org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
> > java.lang.ClassNotFoundException:
> > org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:190)
> > at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> > org.apache.mahout.clustering.minhash.MinHashDriver
> > java.lang.ClassNotFoundException:
> > org.apache.mahout.clustering.minhash.MinHashDriver
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:190)
> > at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> > org.apache.mahout.clustering.dirichlet.DirichletDriver
> > java.lang.ClassNotFoundException:
> > org.apache.mahout.clustering.dirichlet.DirichletDriver
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:190)
> > at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> > Exception in thread "main" java.lang.NoSuchMethodError:
> > org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> > at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> >
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> >
> > Changing the hadoop home to
> > /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-mapreduce doesn't change
> > the output, nor does
> > /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-0.20-mapreduce
> >
> > Any idea now ?
> >
> >
> >
> > 2014-03-05 15:45 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
> >
> > Are u using Mahout 0.7 ?
> > >
> > > From this line in ur stacktrace that seems to be the case:
> > > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> > >
> > > You could build Mahout outside of CDH from Mahout trunk and put the
> jars
> > > onto CDH5.
> > > I am no Cloudera expert or CDH5 user to help with CDHx build.
> > >
> > >
> > >
> > >
> > >
> > >
> > >   On Wednesday, March 5, 2014 9:30 AM, Kevin Moulart <
> > > kevinmoulart@gmail.com> wrote:
> > >  Hi and thanks for your help!
> > >
> > > I had been told that the version of mahout used by Cloudera (CDH 4.6)
> was
> > > in fact 0.8 with a patch for mr2 support.
> > > (
> > >
> >
> http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E
> > )
> > >
> > > But I tried to install 0.9 on my own, by compiling it with mvn after I
> > > changed the pom.xml :
> > >
> > > - Added cloudera repository :
> > >
> > >     <repository>
> > >       <id>cloudera-repo</id>
> > >       <name>Cloudera Repository</name>
> > >        <url>https://repository.cloudera.com/artifactory/cloudera-repos
> > > </url>
> > >     </repository>
> > >
> > > - Changed the version of hadoop to use :
> > >     <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
> > > - I tried adding this one too :
> > >     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
> > >
> > > But then I get a lot of errors when Maven begins to compile the core
> > > package :
> > > https://gist.github.com/kmoulart/9368193
> > >
> > > Could you tell me what I did wrong ?
> > >
> > >
> > > 2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
> > >
> > > The -us option was fixed for Mahout 0.8, seems like u r using Mahout
> 0.7
> > > which had this issue (from ur stacktrace, its apparent u r using Mahout
> > > 0.7).  Please upgrade to the latest mahout version.
> > >
> > >
> > >
> > >
> > >
> > > On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <
> kevinmoulart@gmail.com
> > >
> > > wrote:
> > >
> > > Hi,
> > >
> > > I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
> > > columns and 100.000 to 30.000.000 lines using ssvd with the pca option,
> > and
> > > I always get a StackOverflowError :
> > >
> > > Here is my command line :
> > > mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k
> > 100
> > > -pca "true" -U "false" -V "false" -t 3 -ow
> > >
> > > I also tried to put "-us true" as mentionned in
> > >
> > >
> >
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2but
> > > the option is not available anymore.
> > >
> > > The output of the previous command is :
> > > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > > Running on hadoop, using
> /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > > and HADOOP_CONF_DIR=/etc/hadoop/conf
> > > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> > > 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
> > > {--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
> > > --computeU=[false], --computeV=[false], --endPhase=[2147483647],
> > > --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
> > > --outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
> > > --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
> > > --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
> > > --uHalfSigma=[false], --vHalfSigma=[false]}
> > > Exception in thread "main" java.lang.StackOverflowError
> > > at
> > >
> > >
> >
> org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> > > at
> > >
> > >
> >
> org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> > > at
> > >
> > >
> >
> org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> > > ...
> > >
> > > I search online and didn't find a solution to my problem.
> > >
> > > Can you help me ?
> > >
> > > Thanks in advance,
> > >
> > > --
> > > Kévin Moulart
> > >
> > >
> > >
> > >
> > > --
> > > Kévin Moulart
> > > GSM France : +33 7 81 06 10 10
> > > GSM Belgique : +32 473 85 23 85
> > > Téléphone fixe : +32 2 771 88 45
> > >
> > >
> > >
> >
> >
> > --
> > Kévin Moulart
> > GSM France : +33 7 81 06 10 10
> > GSM Belgique : +32 473 85 23 85
> > Téléphone fixe : +32 2 771 88 45
> >
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Andrew Musselman <an...@gmail.com>.
I'm not sure about this either, but I think these are all the changes to
Mahout in CDH 4.6.0:
http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt

MAHOUT-1291

MAHOUT-1033

MAHOUT-1142



On Wed, Mar 5, 2014 at 8:30 AM, Suneel Marthi <su...@yahoo.com> wrote:

> Not sure if the CDH4 patches on top of 0.7 has fixes for M-1067 and M-1098
> which address the issues u r seeing.
>
>
>
> The second part of the issue u r seeing with Mahout 0.9 distro seems to be
> related to how u set it up on CDH4. I apologize for not being helpful here
> as I am not a CDH4 user or expert.
>
> Sean?
>
>
>
>
> On Wednesday, March 5, 2014 10:23 AM, Kevin Moulart <
> kevinmoulart@gmail.com> wrote:
>
> Previous mail sent only to Suneel : (my bad sorry)
>
> According to my stacktrace it seems that I am running mahout 0.7 indeed.
> > That's the version provided by Cloudera when I install mahout using yum.
> > But according to Sean Owen, it really is a 0.8 inside...
> > Anyway I tried with the compiled version and it didn't work :
> > Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > and HADOOP_CONF_DIR=
> > Exception in thread "main" java.lang.NoSuchMethodError:
> > org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> >  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >  at
> >
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> >
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> >  at java.lang.reflect.Method.invoke(Method.java:606)
> > at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> MAHOUT-JOB:
> > /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
> >
>
> And now I changed the conf directory of mahout 0.9 to be linked to the one
> used by the existing working mahout and the trace changes :
>
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB:
>
> /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
> 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
> java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
> java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> org.apache.mahout.clustering.minhash.MinHashDriver
> java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.minhash.MinHashDriver
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> 14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
> org.apache.mahout.clustering.dirichlet.DirichletDriver
> java.lang.ClassNotFoundException:
> org.apache.mahout.clustering.dirichlet.DirichletDriver
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
> at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
>
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
>
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
> Changing the hadoop home to
> /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-mapreduce doesn't change
> the output, nor does
> /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-0.20-mapreduce
>
> Any ideas now?
>
>
>
> 2014-03-05 15:45 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
>
> Are you using Mahout 0.7?
> >
> > From this line in your stack trace, that seems to be the case:
> > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> >
> > You could build Mahout outside of CDH from Mahout trunk and put the jars
> > onto CDH5.
> > I am no Cloudera expert or CDH5 user, so I can't help with a CDHx build.
> >
> >
> >
> >
> >
> >
> >   On Wednesday, March 5, 2014 9:30 AM, Kevin Moulart <
> > kevinmoulart@gmail.com> wrote:
> >  Hi and thanks for your help!
> >
> > I had been told that the version of mahout used by Cloudera (CDH 4.6) was
> > in fact 0.8 with a patch for mr2 support.
> > (
> >
> http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E
> )
> >
> > But I tried to install 0.9 on my own, by compiling it with mvn after I
> > changed the pom.xml :
> >
> > - Added cloudera repository :
> >
> >     <repository>
> >       <id>cloudera-repo</id>
> >       <name>Cloudera Repository</name>
> >        <url>https://repository.cloudera.com/artifactory/cloudera-repos
> > </url>
> >     </repository>
> >
> > - Changed the version of hadoop to use :
> >     <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
> > - I tried adding this one too :
> >     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
> >
> > But then I get a lot of errors when Maven begins to compile the core
> > package :
> > https://gist.github.com/kmoulart/9368193
> >
> Could you tell me what I did wrong?
> >
> >
> > 2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
> >
> The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7,
> which had this issue (from your stack trace, it's apparent you are using Mahout
> 0.7). Please upgrade to the latest Mahout version.
> >
> >
> >
> >
> >
> > On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <kevinmoulart@gmail.com
> >
> > wrote:
> >
> > Hi,
> >
> I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
> columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
> I always get a StackOverflowError:
> >
> Here is my command line:
> mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
> -pca "true" -U "false" -V "false" -t 3 -ow
> >
> I also tried to put "-us true" as mentioned in
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
> but the option is not available anymore.
> >
> > The output of the previous command is :
> > MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> > Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> > and HADOOP_CONF_DIR=/etc/hadoop/conf
> > MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> > 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
> > {--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
> > --computeU=[false], --computeV=[false], --endPhase=[2147483647],
> > --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
> > --outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
> > --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
> > --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
> > --uHalfSigma=[false], --vHalfSigma=[false]}
> > Exception in thread "main" java.lang.StackOverflowError
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> > ...
> >
> I searched online and didn't find a solution to my problem.
>
> Can you help me?
> >
> > Thanks in advance,
> >
> > --
> > Kévin Moulart
> >
> >
> >
> >
> > --
> > Kévin Moulart
> > GSM France : +33 7 81 06 10 10
> > GSM Belgique : +32 473 85 23 85
> > Téléphone fixe : +32 2 771 88 45
> >
> >
> >
>
>
> --
> Kévin Moulart
> GSM France : +33 7 81 06 10 10
> GSM Belgique : +32 473 85 23 85
> Téléphone fixe : +32 2 771 88 45
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Suneel Marthi <su...@yahoo.com>.
I apologize, Sean; I wasn't aware of the complete history in this thread. I didn't know Hadoop 2.x was involved here. If so, then yes, Mahout needs to be built from HEAD with the Hadoop 2 profile to get it working.







On Wednesday, March 5, 2014 12:04 PM, Sean Owen <sr...@gmail.com> wrote:
 
CDH 4.5 and 4.6 are both 0.7 + patches. Neither contains 0.8, since it
has (tiny) breaking changes vs 0.7 and this is a minor version update.
CDH5 contains 0.8 + patches. I did not say CDH4 has 0.8 -- re-read the
message of mine that was quoted.

http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.5.0.CHANGES.txt
http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt

Those two patches are not in CDH 4.x, no.

The non-upstream changes are basically all internal packaging stuff,
and that can include modifying dependency versions to harmonize with
the rest of the platform. That's the sense in which it works with
Hadoop 2.

I don't think the change you cite is sufficient to work with Hadoop 2.
You also, for example, must build against the Hadoop 2 profile in
Mahout in Maven. For that you do not need the CDH repo even, just
point to the Hadoop 2.x release if you like.

I know there has been a patch in even just the past few weeks that
makes it work even better with 2.x. So I suppose I would build from
HEAD if possible to take advantage.


On Wed, Mar 5, 2014 at 4:30 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098, which address the issues you are seeing.
>
>
>
> The second part of the issue you are seeing with the Mahout 0.9 distro seems to be related to how you set it up on CDH4. I apologize for not being helpful here, as I am not a CDH4 user or expert.
>
> Sean?
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Sean Owen <sr...@gmail.com>.
CDH 4.5 and 4.6 are both 0.7 + patches. Neither contains 0.8, since it
has (tiny) breaking changes vs 0.7 and this is a minor version update.
CDH5 contains 0.8 + patches. I did not say CDH4 has 0.8 -- re-read the
message of mine that was quoted.

http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.5.0.CHANGES.txt
http://archive.cloudera.com/cdh4/cdh/4/mahout-0.7-cdh4.6.0.CHANGES.txt

Those two patches are not in CDH 4.x, no.

The non-upstream changes are basically all internal packaging stuff,
and that can include modifying dependency versions to harmonize with
the rest of the platform. That's the sense in which it works with
Hadoop 2.

I don't think the change you cite is sufficient to work with Hadoop 2.
You also, for example, must build against the Hadoop 2 profile in
Mahout in Maven. For that you do not need the CDH repo even, just
point to the Hadoop 2.x release if you like.

I know there has been a patch in even just the past few weeks that
makes it work even better with 2.x. So I suppose I would build from
HEAD if possible to take advantage.
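
For concreteness, a from-HEAD build along those lines might look like this. This is a sketch only: the checkout URL, the hadoop2.version property, and the Hadoop version shown are assumptions to verify against the top-level pom of the revision actually being built:

  # check out trunk (HEAD) and build against a stock Hadoop 2.x release
  svn checkout http://svn.apache.org/repos/asf/mahout/trunk mahout-trunk
  cd mahout-trunk
  mvn clean install -DskipTests -Dhadoop2.version=2.2.0

The key point is that the Hadoop 2 dependency set is chosen at build time; hand-editing version properties in a release pom leaves the rest of the build wired for Hadoop 1.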

On Wed, Mar 5, 2014 at 4:30 PM, Suneel Marthi <su...@yahoo.com> wrote:
> Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098, which address the issues you are seeing.
>
>
>
> The second part of the issue you are seeing with the Mahout 0.9 distro seems to be related to how you set it up on CDH4. I apologize for not being helpful here, as I am not a CDH4 user or expert.
>
> Sean?
>

Re: Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Suneel Marthi <su...@yahoo.com>.
Not sure if the CDH4 patches on top of 0.7 have fixes for M-1067 and M-1098, which address the issues you are seeing.



The second part of the issue you are seeing with the Mahout 0.9 distro seems to be related to how you set it up on CDH4. I apologize for not being helpful here, as I am not a CDH4 user or expert.

Sean?




On Wednesday, March 5, 2014 10:23 AM, Kevin Moulart <ke...@gmail.com> wrote:
 
Previous mail sent only to Suneel (my bad, sorry):

According to my stacktrace it seems that I am running mahout 0.7 indeed.
> That's the version provided by Cloudera when I install mahout using yum.
> But according to Sean Owen, it really is a 0.8 inside...
> Anyway I tried with the compiled version and it didn't work :
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
>  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

> MAHOUT-JOB: /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
>

And now I changed the conf directory of Mahout 0.9 to be linked to the one
used by the existing working Mahout, and the trace changes:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.minhash.MinHashDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.minhash.MinHashDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.dirichlet.DirichletDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.dirichlet.DirichletDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Changing the hadoop home to
/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-mapreduce doesn't change
the output, nor does
/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-0.20-mapreduce

Any ideas now?



2014-03-05 15:45 GMT+01:00 Suneel Marthi <su...@yahoo.com>:

Are you using Mahout 0.7?
>
> From this line in your stack trace, that seems to be the case:
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
>
> You could build Mahout outside of CDH from Mahout trunk and put the jars
> onto CDH5.
> I am no Cloudera expert or CDH5 user, so I can't help with a CDHx build.
>
>
>
>
>
>
>   On Wednesday, March 5, 2014 9:30 AM, Kevin Moulart <
> kevinmoulart@gmail.com> wrote:
>  Hi and thanks for your help!
>
> I had been told that the version of mahout used by Cloudera (CDH 4.6) was
> in fact 0.8 with a patch for mr2 support.
> (
> http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E)
>
> But I tried to install 0.9 on my own, by compiling it with mvn after I
> changed the pom.xml :
>
> - Added cloudera repository :
>
>     <repository>
>       <id>cloudera-repo</id>
>       <name>Cloudera Repository</name>
>        <url>https://repository.cloudera.com/artifactory/cloudera-repos
> </url>
>     </repository>
>
> - Changed the version of hadoop to use :
>     <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
> - I tried adding this one too :
>     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
>
> But then I get a lot of errors when Maven begins to compile the core
> package :
> https://gist.github.com/kmoulart/9368193
>
> Could you tell me what I did wrong?
>
>
> 2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
>
> The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7,
> which had this issue (from your stack trace, it's apparent you are using Mahout
> 0.7). Please upgrade to the latest Mahout version.
>
>
>
>
>
> On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <ke...@gmail.com>
> wrote:
>
> Hi,
>
> I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
> columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
> I always get a StackOverflowError :
>
> Here is my command line :
> mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
> -pca "true" -U "false" -V "false" -t 3 -ow
>
> I also tried to put "-us true" as mentioned in
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
> but the option is not available anymore.
>
> The output of the previous command is :
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
> {--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
> --computeU=[false], --computeV=[false], --endPhase=[2147483647],
> --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
> --outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
> --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
> --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
> --uHalfSigma=[false], --vHalfSigma=[false]}
> Exception in thread "main" java.lang.StackOverflowError
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> ...
>
> I searched online and didn't find a solution to my problem.
>
> Can you help me?
>
> Thanks in advance,
>
> --
> Kévin Moulart
>
>
>
>
> --
> Kévin Moulart
> GSM France : +33 7 81 06 10 10
> GSM Belgique : +32 473 85 23 85
> Téléphone fixe : +32 2 771 88 45
>
>
>


-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Fwd: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Previous mail sent only to Suneel (my bad, sorry):

According to my stacktrace it seems that I am running mahout 0.7 indeed.
> That's the version provided by Cloudera when I install mahout using yum.
> But according to Sean Owen, it really is a 0.8 inside...
> Anyway I tried with the compiled version and it didn't work :
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
>  at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

> MAHOUT-JOB: /home/cacf/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
>

And now I changed the conf directory of Mahout 0.9 to be linked to the one
used by the existing working Mahout, and the trace changes:

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /home/myCompany/Downloads/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.meanshift.MeanShiftCanopyDriver
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
 at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.spectral.eigencuts.EigencutsDriver
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
 at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.minhash.MinHashDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.minhash.MinHashDriver
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
 at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
14/03/05 16:16:23 WARN driver.MahoutDriver: Unable to add class:
org.apache.mahout.clustering.dirichlet.DirichletDriver
java.lang.ClassNotFoundException:
org.apache.mahout.clustering.dirichlet.DirichletDriver
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
 at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at org.apache.mahout.driver.MahoutDriver.addClass(MahoutDriver.java:237)
 at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:118)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Exception in thread "main" java.lang.NoSuchMethodError:
org.apache.hadoop.util.ProgramDriver.driver([Ljava/lang/String;)V
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:122)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

Changing the hadoop home to
/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-mapreduce doesn't change
the output, nor does
/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop-0.20-mapreduce

Any ideas now?
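
(For anyone else who lands here: a NoSuchMethodError whose signature ends in )V means the calling code was compiled against a ProgramDriver.driver(String[]) that returns void, while the ProgramDriver class actually loaded at runtime declares something else. That is a compile-time/runtime Hadoop mismatch, not a missing jar. One way to see what the runtime really provides is javap; a sketch, with an illustrative jar path that will differ per install:

  javap -classpath /opt/cloudera/parcels/CDH/lib/hadoop/hadoop-common.jar org.apache.hadoop.util.ProgramDriver

If the printed driver method does not return void, the Mahout jar in use was built against a different Hadoop line than the one on the cluster.)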


2014-03-05 15:45 GMT+01:00 Suneel Marthi <su...@yahoo.com>:

Are you using Mahout 0.7?
>
> From this line in your stack trace, that seems to be the case:
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
>
> You could build Mahout outside of CDH from Mahout trunk and put the jars
> onto CDH5.
> I am no Cloudera expert or CDH5 user, so I can't help with a CDHx build.
>
>
>
>
>
>
>   On Wednesday, March 5, 2014 9:30 AM, Kevin Moulart <
> kevinmoulart@gmail.com> wrote:
>  Hi and thanks for your help!
>
> I had been told that the version of mahout used by Cloudera (CDH 4.6) was
> in fact 0.8 with a patch for mr2 support.
> (
> http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E)
>
> But I tried to install 0.9 on my own, by compiling it with mvn after I
> changed the pom.xml :
>
> - Added cloudera repository :
>
>     <repository>
>       <id>cloudera-repo</id>
>       <name>Cloudera Repository</name>
>        <url>https://repository.cloudera.com/artifactory/cloudera-repos
> </url>
>     </repository>
>
> - Changed the version of hadoop to use :
>     <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
> - I tried adding this one too :
>     <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>
>
> But then I get a lot of errors when Maven begins to compile the core
> package :
> https://gist.github.com/kmoulart/9368193
>
> Could you tell me what I did wrong?
>
>
> 2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:
>
> The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7,
> which had this issue (from your stack trace, it's apparent you are using Mahout
> 0.7). Please upgrade to the latest Mahout version.
>
>
>
>
>
> On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <ke...@gmail.com>
> wrote:
>
> Hi,
>
> I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
> columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
> I always get a StackOverflowError :
>
> Here is my command line :
> mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
> -pca "true" -U "false" -V "false" -t 3 -ow
>
> I also tried to put "-us true" as mentioned in
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
> but the option is not available anymore.
>
> The output of the previous command is :
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
> {--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
> --computeU=[false], --computeV=[false], --endPhase=[2147483647],
> --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
> --outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
> --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
> --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
> --uHalfSigma=[false], --vHalfSigma=[false]}
> Exception in thread "main" java.lang.StackOverflowError
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> ...
>
> I searched online and didn't find a solution to my problem.
>
> Can you help me?
>
> Thanks in advance,
>
> --
> Kévin Moulart
>
>
>
>
> --
> Kévin Moulart
> GSM France : +33 7 81 06 10 10
> GSM Belgique : +32 473 85 23 85
> Téléphone fixe : +32 2 771 88 45
>
>
>


-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Re: PCA with ssvd leads to StackOverFlowError

Posted by Suneel Marthi <su...@yahoo.com>.
Are you using Mahout 0.7?

From this line in your stack trace, that seems to be the case:
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar

You could build Mahout outside of CDH from Mahout trunk and put the jars onto CDH5.
I am no Cloudera expert or CDH5 user, so I can't help with a CDHx build.
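
A sketch of that "build outside and drop it on" route, with illustrative paths only (the tarball name and module layout depend on the version built; nothing here is specific to CDH):

  # from a Mahout source checkout
  mvn clean package -DskipTests
  # copy the resulting artifacts to a cluster node,
  # alongside (not over) the vendor-installed Mahout
  scp distribution/target/mahout-distribution-*.tar.gz somenode:/opt/mahout-custom/

The custom bin/mahout can then be pointed at the cluster's HADOOP_CONF_DIR, as is done later in this thread.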







On Wednesday, March 5, 2014 9:30 AM, Kevin Moulart <ke...@gmail.com> wrote:
 
Hi and thanks for your help!

I had been told that the version of mahout used by Cloudera (CDH 4.6) was in fact 0.8 with a patch for mr2 support.
( http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E )

But I tried to install 0.9 on my own, by compiling it with mvn after I changed the pom.xml :

- Added cloudera repository :


    <repository>
      <id>cloudera-repo</id>
      <name>Cloudera Repository</name>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </repository>

- Changed the version of hadoop to use :
    <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>

- I tried adding this one too :
    <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>

But then I get a lot of errors when Maven begins to compile the core package :
https://gist.github.com/kmoulart/9368193


Could you tell me what I did wrong?



2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:

The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7, which had this issue (from your stack trace, it's apparent you are using Mahout 0.7). Please upgrade to the latest Mahout version.
>
>
>
>
>
>
>On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <ke...@gmail.com> wrote:
>
>Hi,
>
>I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
>columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
>I always get a StackOverflowError :
>
>Here is my command line :
>mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
>-pca "true" -U "false" -V "false" -t 3 -ow
>
>I also tried to put "-us true" as mentioned in
>https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
>but the option is not available anymore.
>
>The output of the previous command is :
>MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
>Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
>and HADOOP_CONF_DIR=/etc/hadoop/conf
>MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
>14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
>{--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
>--computeU=[false], --computeV=[false], --endPhase=[2147483647],
>--input=[/user/myUser/Echant100k], --minSplitSize=[-1],
>--outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
>--oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
>--rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
>--uHalfSigma=[false], --vHalfSigma=[false]}
>Exception in thread "main" java.lang.StackOverflowError
>at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
>...
>
>I searched online and didn't find a solution to my problem.
>
>Can you help me?
>
>Thanks in advance,
>
>--
>Kévin Moulart


-- 

Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Re: PCA with ssvd leads to StackOverFlowError

Posted by Kevin Moulart <ke...@gmail.com>.
Hi and thanks for your help!

I had been told that the version of mahout used by Cloudera (CDH 4.6) was
in fact 0.8 with a patch for mr2 support.
(
http://mail-archives.apache.org/mod_mbox/mahout-user/201402.mbox/%3CCAEccTywqSAKA_HeX4vTZ-5XPmKtj5b8zMGQUfn5qRsiq=7o=ug@mail.gmail.com%3E)

But I tried to install 0.9 on my own, by compiling it with mvn after I
changed the pom.xml :

- Added cloudera repository :

    <repository>
      <id>cloudera-repo</id>
      <name>Cloudera Repository</name>
       <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
    </repository>

- Changed the version of hadoop to use :
    <hadoop.1.version>2.0.0-mr1-cdh4.6.0</hadoop.1.version>
- I tried adding this one too :
    <hadoop2.version>2.0.0-cdh4.6.0</hadoop2.version>

But then I get a lot of errors when Maven begins to compile the core
package :
https://gist.github.com/kmoulart/9368193

Could you tell me what I did wrong?
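
One note that may save later readers some pom surgery: Maven lets you override a version property from the command line, so the repository block can stay in the pom while the property edits are dropped. A sketch; the property name must be checked against the pom of the release actually being built, and the CDH version is just the one from this thread:

  mvn clean install -DskipTests -Dhadoop.version=2.0.0-mr1-cdh4.6.0

Whether the sources then compile is a separate question: mixing an MR1 (Hadoop 1 API) artifact into a build that elsewhere expects a different Hadoop line is exactly the kind of thing that produces walls of compile errors like the gist above.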


2014-03-04 19:02 GMT+01:00 Suneel Marthi <su...@yahoo.com>:

> The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7,
> which had this issue (from your stack trace, it's apparent you are using Mahout
> 0.7). Please upgrade to the latest Mahout version.
>
>
>
>
>
> On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <ke...@gmail.com>
> wrote:
>
> Hi,
>
> I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
> columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
> I always get a StackOverflowError :
>
> Here is my command line :
> mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
> -pca "true" -U "false" -V "false" -t 3 -ow
>
> I also tried to put "-us true" as mentioned in
> https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
> but the option is not available anymore.
>
> The output of the previous command is :
> MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
> Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
> and HADOOP_CONF_DIR=/etc/hadoop/conf
> MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
> 14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
> {--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
> --computeU=[false], --computeV=[false], --endPhase=[2147483647],
> --input=[/user/myUser/Echant100k], --minSplitSize=[-1],
> --outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
> --oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
> --rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
> --uHalfSigma=[false], --vHalfSigma=[false]}
> Exception in thread "main" java.lang.StackOverflowError
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
> ...
>
> I searched online and didn't find a solution to my problem.
>
> Can you help me?
>
> Thanks in advance,
>
> --
> Kévin Moulart
>



-- 
Kévin Moulart
GSM France : +33 7 81 06 10 10
GSM Belgique : +32 473 85 23 85
Téléphone fixe : +32 2 771 88 45

Re: PCA with ssvd leads to StackOverFlowError

Posted by Suneel Marthi <su...@yahoo.com>.
The -us option was fixed for Mahout 0.8; it seems you are using Mahout 0.7, which had this issue (from your stack trace, it's apparent you are using Mahout 0.7). Please upgrade to the latest Mahout version.
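
For the record, on 0.8 or later the run quoted below would gain the U*Sigma output roughly like this; a sketch assembled from the flags already in this thread, not a tested invocation:

  mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100 -pca true -us true -U false -V false -t 3 -ow

Here -us requests the U * Sigma product, which is the usual PCA-space representation of the input rows.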





On Tuesday, March 4, 2014 8:54 AM, Kevin Moulart <ke...@gmail.com> wrote:
 
Hi,

I'm trying to apply a PCA to reduce the dimension of a matrix of 1603
columns and 100,000 to 30,000,000 rows using ssvd with the pca option, and
I always get a StackOverflowError :

Here is my command line :
mahout ssvd -i /user/myUser/Echant100k -o /user/myUser/Echant/SVD100 -k 100
-pca "true" -U "false" -V "false" -t 3 -ow

I also tried to put "-us true" as mentioned in
https://cwiki.apache.org/confluence/download/attachments/27832158/SSVD-CLI.pdf?version=18&modificationDate=1381347063000&api=v2
but the option is not available anymore.

The output of the previous command is :
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop
and HADOOP_CONF_DIR=/etc/hadoop/conf
MAHOUT-JOB: /usr/lib/mahout/mahout-examples-0.7-cdh4.5.0-job.jar
14/03/04 14:45:16 INFO common.AbstractJob: Command line arguments:
{--abtBlockHeight=[200000], --blockHeight=[10000], --broadcast=[true],
--computeU=[false], --computeV=[false], --endPhase=[2147483647],
--input=[/user/myUser/Echant100k], --minSplitSize=[-1],
--outerProdBlockHeight=[30000], --output=[/user/myUser/Echant/SVD100],
--oversampling=[15], --overwrite=null, --pca=[true], --powerIter=[0],
--rank=[100], --reduceTasks=[3], --startPhase=[0], --tempDir=[temp],
--uHalfSigma=[false], --vHalfSigma=[false]}
Exception in thread "main" java.lang.StackOverflowError
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
at org.apache.mahout.math.hadoop.MatrixColumnMeansJob.run(MatrixColumnMeansJob.java:55)
...

I searched online and didn't find a solution to my problem.

Can you help me?

Thanks in advance,

-- 
Kévin Moulart